Construction method of deep visual question answering system for the visually impaired

A deep visual question answering technology for the visually impaired, applied to the construction of deep visual question answering systems

Active Publication Date: 2019-11-26
ZHEJIANG UNIV

AI Technical Summary

Problems solved by technology

There is also a product called Third Eye, which mainly takes a picture to identify objects and then reports the result by voice output. It skips the stage of interacting with the user and is therefore severely limited.



Examples


Embodiment approach

[0119] 1. The application Deep Ask, built on the algorithm described above, was developed for Android 5.0 and above.

[0120] 2. Taking into account the operational limitations of visually impaired users, we designed a simple and practical interaction method. The details are as follows:

[0121] a. Since blind users cannot accurately locate individual parts of the phone screen for precise operations, the entire screen serves as the reaction area for every operation.

[0122] b. When a blind user taps anywhere on the screen, the application's capture routine starts. It uses the system's underlying camera module to call the phone's camera directly, takes a picture, and stores it as an ordinary JPEG file in the phone's file system cache. At the appropriate time, the image file is transferred to the server through the RESTful API so that it can ...
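The upload step can be illustrated with a minimal sketch. It is not the Deep Ask client itself, which is an Android application; it only shows, in Python, how a cached JPEG and a question string could be packed into a multipart/form-data POST request for a RESTful endpoint. The field names `question` and `image`, the file name `photo.jpg`, and the URL in the usage note are assumptions made for illustration and do not come from the patent text.

```python
import urllib.request
import uuid


def build_multipart(image_bytes, question, boundary=None):
    """Encode a JPEG and a question string as a multipart/form-data body.

    The field names "question" and "image" are hypothetical.
    """
    boundary = boundary or uuid.uuid4().hex
    delimiter = b"--" + boundary.encode("ascii")
    parts = [
        delimiter,
        b'Content-Disposition: form-data; name="question"',
        b"",
        question.encode("utf-8"),
        delimiter,
        b'Content-Disposition: form-data; name="image"; filename="photo.jpg"',
        b"Content-Type: image/jpeg",
        b"",
        image_bytes,
        delimiter + b"--",  # closing delimiter
        b"",
    ]
    body = b"\r\n".join(parts)
    content_type = "multipart/form-data; boundary=" + boundary
    return body, content_type


def make_upload_request(url, image_bytes, question):
    """Build (but do not send) the POST request that carries the photo."""
    body, content_type = build_multipart(image_bytes, question)
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

A caller would read the cached JPEG into `image_bytes`, build the request with a placeholder address such as `make_upload_request("http://server.example/ask", image_bytes, "What is in front of me?")`, and pass it to `urllib.request.urlopen` once the network is available.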


Abstract

The invention discloses a method for constructing a deep visual question answering system for visually impaired persons. In the training stage, collected pictures and their corresponding question and answer texts first form a training set. For each picture, a convolutional neural network extracts the image features. For each question text, word vector technology converts the question into a list of word vectors, which serves as the input of an LSTM that extracts the question features. Finally, the picture and question features are multiplied element-wise and passed to a classification operation to obtain the predicted answer, which is compared with the answer labels in the training set to calculate the loss; the model is then optimized with the backpropagation algorithm. In the running stage, the client obtains the photo taken by the user and the question text and uploads them to the server. The server feeds the uploaded photo and question text into the trained model, extracts features in the same way, and the classifier outputs the corresponding predicted answer, which is returned to the client. The client then delivers the answer to the user as voice output.
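The fusion and classification step of the abstract can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the image and question feature vectors are taken as already computed (the CNN and LSTM are not reproduced), the classifier is assumed to be a single linear layer followed by softmax, the loss is assumed to be cross-entropy, and all function names are hypothetical. The backpropagation update is omitted.

```python
import math


def elementwise_product(image_feat, question_feat):
    """Fuse the two modalities by multiplying matching components."""
    if len(image_feat) != len(question_feat):
        raise ValueError("feature vectors must have the same length")
    return [a * b for a, b in zip(image_feat, question_feat)]


def linear(weights, bias, x):
    """One logit per candidate answer: weights is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over the answer logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, label):
    """Loss for one example whose correct answer index is `label`."""
    return -math.log(probs[label])


def predict(image_feat, question_feat, weights, bias):
    """Return a probability for each candidate answer."""
    fused = elementwise_product(image_feat, question_feat)
    return softmax(linear(weights, bias, fused))
```

During training, `cross_entropy(predict(...), label)` would be the quantity minimized over the training set. At run time the server would only call `predict` and return the highest-scoring answer to the client.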

Description

Technical field

[0001] The present invention relates to the field of visual question answering (Visual Question Answering), a cross-disciplinary field involving both Natural Language Processing (NLP) and Computer Vision (CV), and in particular to a method for constructing a deep visual question answering system.

Background

[0002] Visually impaired people make up a large share of the world's population. According to the National Bureau of Statistics, in 2014 there were about 6-7 million blind people in China, and another 12 million people with low vision in both eyes. These people encounter many problems in their daily lives. Take travel as an example. Although there are tactile paths for the blind, many blind people dare not walk on them at all, because the paths are often occupied, not to mention the danger of walking on the road. They therefore really need some auxiliary equipment to help them "restore the light". The current voice assistants generally only ...


Application Information

Patent Type & Authority: Patents (China)
IPC (8): G06F16/583, G06F16/332, G06F16/33, G06K9/00, G06N3/04, G06N3/08
CPC: G06F16/3329, G06F16/3343, G06F16/3344, G06F16/583, G06N3/049, G06N3/084, G06V20/10
Inventor: 潘浩杰, 刘洋, 周君沛, 陆家林
Owner: ZHEJIANG UNIV