Method for constructing deep visual Q&A system for visually impaired persons

AI technical title: Visual impairment, deep vision technology, applied in the field of deep visual Q&A system construction

Active Publication Date: 2017-07-14

ZHEJIANG UNIV

Cites: 6 | Cited by: 18


AI Technical Summary

Problems solved by technology

There is also a product called Third Eye, which mainly takes pictures to identify objects and then reports the result by voice. It skips any interaction with the user and is therefore very limited.


Embodiment approach

[0119] 1. An application, Deep Ask, was developed on the basis of the above algorithm for Android 5.0 (Lollipop) and above.

[0120] 2. Considering the operational limitations of visually impaired users, we designed a simple and practical interaction method, as follows:

[0121] a. Because blind users cannot reliably locate specific parts of the phone screen for precise operations, the entire screen serves as the touch-responsive area for each operation.

[0122] b. When a blind user taps anywhere on the screen, the application's capture routine starts. It uses the system's low-level camera module to call the phone's camera directly, takes a picture, and stores it in the phone's file-system cache as an ordinary JPEG file. At the appropriate time, the image file is uploaded to the server through a RESTful API so that it can ...
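Item b and the Abstract describe the client capturing a JPEG and then uploading it, together with the question text, to a server over a RESTful API. The sketch below shows one way such an upload could be packaged on the client and unpacked on the server. It is an illustration under assumptions, not the Deep Ask implementation: the endpoint URL, the JSON field names `image` and `question`, and base64 encoding of the photo are all hypothetical, because the text specifies only that a RESTful API is used.

```python
import base64
import json
import urllib.request

# Hypothetical endpoint: the text only says a RESTful API is used.
# The URL, route and JSON field names below are illustrative assumptions.
API_URL = "http://example.com/api/vqa"


def build_upload_request(jpeg_bytes: bytes, question: str,
                         url: str = API_URL) -> urllib.request.Request:
    """Client side: package a cached JPEG and the question text as a POST request."""
    payload = {
        # JSON cannot carry raw bytes, so the JPEG is base64-encoded (assumption).
        "image": base64.b64encode(jpeg_bytes).decode("ascii"),
        "question": question,
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_upload(body: bytes) -> tuple[bytes, str]:
    """Server side: recover the JPEG bytes and question text from the request body."""
    payload = json.loads(body.decode("utf-8"))
    return base64.b64decode(payload["image"]), payload["question"]


if __name__ == "__main__":
    fake_jpeg = b"\xff\xd8\xff\xe0" + b"\x00" * 16  # placeholder bytes, not a real photo
    req = build_upload_request(fake_jpeg, "What is in front of me?")
    image, question = parse_upload(req.data)
    print(req.get_method(), req.full_url, len(image), question)
```

Actually sending the request would be a call to `urllib.request.urlopen(req)`; it is omitted because the endpoint is hypothetical. A multipart/form-data upload would avoid the roughly 33% size overhead of base64, at the cost of more encoding code.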


Abstract

The present invention discloses a method for constructing a deep visual Q&A system for visually impaired persons. In the training phase, the method comprises: forming a training set from collected pictures and the corresponding question-and-answer texts; extracting picture features with a convolutional neural network (CNN); converting each question text into a list of word vectors with a word-embedding technique and feeding that list into an LSTM to extract question features; and finally, taking the element-wise product of the picture features and the question features, classifying the result to obtain an answer prediction, comparing the prediction with the answer label in the training set to compute the loss, and optimizing the model with the backpropagation algorithm. In the running phase, a client obtains a photo taken by the user and the question text and uploads both to a server; the server feeds the uploaded photo and question text into the trained model, extracts features in the same manner, outputs the corresponding answer prediction with the classifier, and returns it to the client; the client then delivers the answer to the user by voice.
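The training step in the Abstract fuses picture and question features by an element-wise product, classifies the fused vector, computes a loss against the answer label, and optimizes with backpropagation. The pure-Python sketch below shows that fusion-and-classification stage on toy vectors. Several details are assumptions, since the Abstract does not specify them: the classifier is taken to be a single linear layer with softmax, the loss is taken to be cross-entropy, and the CNN and LSTM feature extractors are replaced by fixed stand-in vectors `v` and `q` of equal length.

```python
import math
import random


def elementwise_product(v, q):
    """Fuse image features v and question features q (Hadamard product)."""
    assert len(v) == len(q)
    return [vi * qi for vi, qi in zip(v, q)]


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(zi - m) for zi in z]
    s = sum(exps)
    return [e / s for e in exps]


def forward(v, q, W, b):
    """Return the fused features and answer probabilities of a linear classifier."""
    f = elementwise_product(v, q)
    logits = [sum(wkj * fj for wkj, fj in zip(Wk, f)) + bk for Wk, bk in zip(W, b)]
    return f, softmax(logits)


def cross_entropy(p, y):
    """Loss against the answer label y (an index into the answer vocabulary)."""
    return -math.log(p[y])


def sgd_step(v, q, y, W, b, lr=0.1):
    """Backpropagate the loss and take one gradient-descent step on (W, b) in place.

    Also returns dL/dv and dL/dq, which a full model would propagate further
    back into the CNN and the LSTM respectively.
    """
    f, p = forward(v, q, W, b)
    dz = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]  # dL/dlogits
    # dL/df must be computed with the weights *before* they are updated.
    df = [sum(W[k][j] * dz[k] for k in range(len(W))) for j in range(len(f))]
    dv = [dfj * qj for dfj, qj in zip(df, q)]  # chain rule through f_j = v_j * q_j
    dq = [dfj * vj for dfj, vj in zip(df, v)]
    for k in range(len(W)):
        for j in range(len(f)):
            W[k][j] -= lr * dz[k] * f[j]
        b[k] -= lr * dz[k]
    return cross_entropy(p, y), dv, dq


if __name__ == "__main__":
    random.seed(0)
    d, n_answers = 8, 4  # toy sizes; the real dimensions are not given in the text
    v = [random.uniform(-1, 1) for _ in range(d)]  # stand-in for CNN picture features
    q = [random.uniform(-1, 1) for _ in range(d)]  # stand-in for LSTM question features
    W = [[random.gauss(0, 0.1) for _ in range(d)] for _ in range(n_answers)]
    b = [0.0] * n_answers
    y = 2  # index of the ground-truth answer label
    for step in range(5):
        loss, _, _ = sgd_step(v, q, y, W, b)
        print(f"step {step}: loss {loss:.4f}")
```

The hand-written gradients only make the backpropagation step visible; a framework with automatic differentiation would normally compute them, including the parts that flow back through `dv` and `dq` into the feature extractors.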

Description

Technical field

[0001] The present invention relates to the field of visual question answering (VQA), an interdisciplinary area spanning natural language processing (NLP) and computer vision (CV), and in particular to a method for constructing a deep visual Q&A system.

Background

[0002] Visually impaired people make up a large share of the world's population. According to China's National Bureau of Statistics, in 2014 China alone had about 6 to 7 million blind people and another 12 million people with low vision in both eyes. These people face many problems in daily life. Take travel as an example: although tactile paving ("blind roads") exists, many blind people dare not walk on it at all because it is often obstructed, not to mention the other dangers of the road. They therefore need assistive devices to help them "see the light again". Current voice assistants generally only ...


Application Information


IPC(8): G06F17/30, G06K9/00, G06N3/04, G06N3/08

CPC: G06F16/3329, G06F16/3343, G06F16/3344, G06F16/583, G06N3/049, G06N3/084, G06V20/10

Inventors: 潘浩杰 (Pan Haojie), 刘洋 (Liu Yang), 周君沛 (Zhou Junpei), 陆家林 (Lu Jialin)

Owner: ZHEJIANG UNIV
