67 results about "Semantic alignment" patented technology

Attention mechanism-based multi-modal emotion feature learning and recognition method

The invention relates to an attention mechanism-based multi-modal emotion feature learning and recognition method. The method comprises: extracting features from audio and text samples to obtain FBank acoustic features and word vector features; feeding these features as the original inputs to an audio emotion feature encoder and a text emotion feature encoder, respectively, and extracting the emotion semantic features of each modality through the encoders; applying audio attention, modal jump attention, and text attention learning to the resulting emotion semantic features to extract four complementary emotion features: an emotionally salient audio feature, a semantically aligned audio feature, a semantically aligned text feature, and an emotionally salient text feature; and fusing the four features and classifying the result to obtain the corresponding emotion category. The invention addresses the low recognition rate that traditional multi-modal emotion recognition suffers because of emotion-irrelevant factors within a modality and inconsistent emotion semantics across modalities, and can effectively improve multi-modal emotion recognition accuracy.
Owner:JIANGSU UNIV

Multi-modal emotion recognition method based on attention enhancing mechanism

The invention belongs to the technical field of affective computing and relates to a multi-modal emotion recognition method based on an attention enhancement mechanism. The method comprises: obtaining a speech encoding matrix through a multi-head attention mechanism, and obtaining a text encoding matrix through a pre-trained BERT model; taking the dot product (point multiplication) of the speech and text encoding matrices to obtain alignment matrices for speech and text, and calibrating the alignment matrices against the original modal encoding information to obtain more local interaction information; concatenating the encoding information, the semantic alignment matrix, and the interaction information of each modality as features to obtain a feature matrix for each modality; aggregating the speech feature matrix and the text feature matrix with a multi-head attention mechanism; converting the aggregated feature matrices into vector representations through an attention mechanism; and concatenating the speech and text vector representations and obtaining the final emotion classification result with a fully connected network. The method addresses the problem of multi-modal interaction and improves the accuracy of multi-modal emotion recognition.
Owner:HANGZHOU DIANZI UNIV
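
The dot-product alignment step in the abstract above can be illustrated with a minimal sketch. This is not the patented implementation: the softmax normalization, the function names (`align`, `softmax`, `dot`), and the toy vectors are all assumptions made for illustration.

```python
import math

def softmax(row):
    """Normalize a list of scores into weights that sum to 1."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def align(speech, text):
    """Score every speech frame against every text token by dot product.

    Returns (weights, aligned): weights[i][j] is how strongly speech frame i
    attends to text token j (softmax is an assumption, not from the abstract),
    and aligned[i] is the weighted sum of text vectors for frame i.
    """
    weights = [softmax([dot(s, t) for t in text]) for s in speech]
    dim = len(text[0])
    aligned = [
        [sum(w * t[d] for w, t in zip(row, text)) for d in range(dim)]
        for row in weights
    ]
    return weights, aligned

# Toy encodings (hypothetical): 3 speech frames, 2 text tokens, dimension 2.
speech = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text = [[2.0, 0.0], [0.0, 2.0]]
weights, aligned = align(speech, text)
```

Each row of `weights` is one row of a speech-to-text alignment matrix; swapping the arguments gives the text-to-speech direction.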

Knowledge graph question and answer training and application service system with automatically generated template

The invention discloses a knowledge graph question and answer training system with an automatic template generation function. The system comprises: a predicate dictionary and category dictionary construction module, which constructs a predicate dictionary and a category dictionary by remote supervision; a backbone query generation module, which obtains the sub-graphs of the theme entity and the answer entity of each training question and answer pair in the knowledge graph and replaces the answer nodes in the sub-graphs with variables to form backbone queries; a semantic alignment module, which aligns question phrases with the semantic elements of the backbone query using dependency syntactic analysis and a shaping linear alignment technology; a template ubiquitous module, which stores the dependency syntax tree, the backbone query, and their correspondence in a template library as templates; and a sorting model training module, which uses a machine learning binary classifier to perform pairwise classification learning on matching templates according to their matching degree, yielding a question template sorting model. The system thereby addresses the high labor cost and low question coverage rate of the prior art.
Owner:来康生命科技有限公司
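
The backbone query step described above, replacing answer nodes in a sub-graph with variables, can be sketched as follows. The triple representation, the function name `backbone_query`, and the `?x` variable syntax are assumptions chosen for illustration, not details from the patent.

```python
def backbone_query(subgraph, answer, var="?x"):
    """Replace every occurrence of the answer entity with a query variable.

    subgraph is a list of (subject, predicate, object) triples; predicates
    are left untouched, only subject and object nodes are replaced.
    """
    return [
        (var if s == answer else s, p, var if o == answer else o)
        for s, p, o in subgraph
    ]

# Hypothetical training pair: "Where was Marie Curie born?" -> "Warsaw"
subgraph = [
    ("Marie Curie", "bornIn", "Warsaw"),
    ("Warsaw", "capitalOf", "Poland"),
]
query = backbone_query(subgraph, "Warsaw")
# query == [("Marie Curie", "bornIn", "?x"), ("?x", "capitalOf", "Poland")]
```

The resulting pattern keeps the theme entity and predicates intact, so question phrases can later be aligned against its elements.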

Cross-language text representation method and device

The invention provides a cross-language text representation method and device. The method comprises: obtaining a first training text and a first cross-language representation model corresponding to a first language, the first cross-language representation model comprising a first universal vector sub-model and a text representation sub-model; obtaining a second training text in the target language of the to-be-processed text; training the first universal vector sub-model on the first training text and the second training text to obtain a second universal vector sub-model; and obtaining a second cross-language representation model for the target language from the second universal vector sub-model and the text representation sub-model. Universal vectors shared among different languages are thus mined through semantic alignment processing, and cross-language text processing is performed on the basis of these universal vectors, which ensures the representation effect of the cross-language processing model. This addresses the technical problem in the prior art that cross-language processing models have difficulty crossing the barriers between languages and consequently give poor representations.
Owner:BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

Training method and device for multi-language semantic representation model, equipment and storage medium

The invention discloses a training method and device for a multi-language semantic representation model, electronic equipment, and a storage medium, and relates to the field of natural language processing based on artificial intelligence. In the specific implementation scheme, the method comprises: training a multi-language semantic representation model on a plurality of training corpora covering multiple languages, so that the model learns the semantic representation capacity of each language; for each training corpus in the plurality of training corpora, generating a corresponding hybrid language corpus that contains text in at least two languages; and training the multi-language semantic representation model on each hybrid language corpus and its corresponding training corpus, so that the model learns semantic alignment information across languages. With this technical scheme, the multi-language semantic representation model can learn semantic alignment information between different languages, semantic interaction between different languages can be achieved on the basis of the model, and practicability is very high.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD
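
The abstract above does not say how a hybrid language corpus is generated. One plausible reading, sketched here purely as an assumption, is to swap selected tokens for their translations from a bilingual lexicon; `make_hybrid`, the lexicon, and the chosen positions are all hypothetical.

```python
def make_hybrid(tokens, lexicon, positions):
    """Build a mixed-language sentence from a monolingual one.

    Tokens at the given positions are replaced by their translation when the
    lexicon has one; every other token is kept unchanged.
    """
    return [
        lexicon.get(tok, tok) if i in positions else tok
        for i, tok in enumerate(tokens)
    ]

# Hypothetical English sentence and English-to-French lexicon.
tokens = ["the", "cat", "drinks", "milk"]
lexicon = {"cat": "chat", "milk": "lait"}
hybrid = make_hybrid(tokens, lexicon, positions={1})
# hybrid == ["the", "chat", "drinks", "milk"]
```

Pairing `hybrid` with the original `tokens` gives a training example in which the model sees the same meaning expressed across two languages.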

Visual question-answering method and system based on semantic alignment and storage medium

The invention provides a visual question-answering method and system based on semantic alignment and a storage medium, and relates to the technical field of visual question-answering. According to the embodiment of the invention, the method comprises: obtaining and preprocessing a data set; extracting original image features and target position features from an original image; generating an image description statement from the target position features; obtaining image description words, question features, and image description statement features; semantically aligning the original image features with the image description words to obtain a first image feature; obtaining a second image feature from the original image features and the image description statement features; obtaining a third image feature from the original image features and the question features; fusing the three image features, the image description statement features, and the question features into a comprehensive feature; and predicting the final answer. The importance of the image information is thereby highlighted, the information involved in feature fusion is made more complete, and the generated answer is more accurate.
Owner:HEFEI UNIV OF TECH

Dynamic knowledge graph representation learning method and system based on anchor points

The invention provides a dynamic knowledge graph representation learning method and system based on anchor points. The method comprises: first, finding the key entities that support global information in an existing knowledge graph and building a base coordinate system from the vectors of these entities; second, performing semantic alignment, including entity alignment and relationship fusion, between newly added knowledge and the existing knowledge graph; and finally, carrying out representation learning under the base coordinate system, so that training only needs to combine the newly added knowledge with the related local knowledge of the existing knowledge graph, each new knowledge entity is placed at a proper position in the knowledge space, and self-adaptive growth of the dynamic knowledge graph is achieved. The method has the following beneficial effects: the text information of entities and relationships serves as a semantic basis and provides an information basis for knowledge fusion, so that entity alignment and relationship fusion are more comprehensive and sufficient; and a word2vec vector generation model converts the text information of entities and relationships into vector form, so that the text information can be used in mathematical operations.
Owner:CHINA UNIV OF GEOSCIENCES (WUHAN)

Fine-grained image weak supervision target positioning method based on deep learning

The invention relates to a fine-grained image weak supervision target positioning method based on deep learning. The method addresses the problem of recognizing and positioning targets in fine-grained images using only weakly supervised language description information, which is easy to collect. Inter-modal fine-grained semantic alignment is carried out directly between the pixels of the image and the words of the language description: the image is input into a convolutional neural network to extract a feature vector, and the language description is encoded to extract its feature vector; feature matching is then performed between the convolutional feature map and the language description feature vector, and the resulting feature matching map is processed to obtain a saliency map of the target, from which the final positioning result is derived. The method thus achieves weakly supervised target positioning for fine-grained images without needing strongly supervised annotated bounding boxes.
Owner:BEIJING UNIV OF TECH
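
The feature matching and positioning steps above can be sketched roughly as follows. Cosine similarity as the matching score, the fixed threshold, and the names `matching_map` and `bounding_box` are assumptions for illustration; the abstract does not specify them.

```python
import math

def cosine(u, v):
    """Cosine similarity, defined as 0.0 when either vector is all zeros."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def matching_map(feature_map, text_vec):
    """Match every spatial cell of the feature map against the text vector."""
    return [[cosine(cell, text_vec) for cell in row] for row in feature_map]

def bounding_box(match, threshold=0.5):
    """Return (top, left, bottom, right) around cells scoring >= threshold."""
    hits = [
        (r, c)
        for r, row in enumerate(match)
        for c, score in enumerate(row)
        if score >= threshold
    ]
    if not hits:
        return None
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    return (min(rows), min(cols), max(rows), max(cols))

# Toy 3x3 feature map (hypothetical): two cells resemble the text vector.
bg, fg = [0.0, 1.0], [1.0, 0.0]
feature_map = [[bg, bg, bg], [bg, fg, fg], [bg, bg, bg]]
match = matching_map(feature_map, text_vec=[1.0, 0.0])
box = bounding_box(match)
```

Here the box is expressed in feature-map cells; mapping it back to image pixels would require the network's stride, which is outside this sketch.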

Video description method based on target space semantic alignment

The invention discloses a video description method based on target space semantic alignment. The method comprises: first, extracting appearance features and action features from sampled video frames that have text descriptions, concatenating the appearance and action features, and inputting them into a time sequence Gaussian mixture cavity (dilated) convolution encoder to obtain time sequence Gaussian features; constructing a decoder from two layers of long short-term memory (LSTM) neural networks to obtain the probability distribution and hidden vectors of the generated statements; establishing a semantic reconstruction network and calculating the semantic reconstruction loss; and optimizing the model with a stochastic gradient descent algorithm. For a new video, the same steps are carried out in sequence to obtain the statement generation probability distribution, and a video description statement is obtained with a greedy search algorithm. The method models the long-term time sequence relations of the video with the time sequence Gaussian mixture cavity convolution and obtains the statement-level probability distribution difference through the semantic reconstruction network, so that the semantic gap between the generated statement and the video content is reduced and a natural statement that describes the video content more accurately is generated.
Owner:HANGZHOU DIANZI UNIV
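
The final greedy search step above can be illustrated with a minimal sketch. The decoder is replaced by a hypothetical lookup table (`TABLE`, `toy_step`), so only the search procedure itself is shown; the token names and probabilities are invented for the example.

```python
def greedy_decode(step, start, end, max_len=20):
    """Pick the most probable next token at every step until `end` appears.

    step(tokens) must return a dict mapping candidate next tokens to
    probabilities given the tokens generated so far.
    """
    tokens = [start]
    while len(tokens) < max_len:
        dist = step(tokens)
        best = max(dist, key=dist.get)
        if best == end:
            break
        tokens.append(best)
    return tokens[1:]  # drop the start marker

# Hypothetical stand-in for a trained decoder: next-token distributions
# that depend only on the previous token.
TABLE = {
    "<s>": {"a": 0.6, "the": 0.4},
    "a": {"dog": 0.7, "runs": 0.2, "</s>": 0.1},
    "dog": {"runs": 0.8, "</s>": 0.2},
    "runs": {"</s>": 0.9, "a": 0.1},
}

def toy_step(tokens):
    return TABLE[tokens[-1]]

sentence = greedy_decode(toy_step, "<s>", "</s>")
# sentence == ["a", "dog", "runs"]
```

Greedy search commits to one token per step, which is cheap but can miss sentences whose overall probability is higher; beam search is the usual alternative.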

Three-dimensional face recognition method based on semantic alignment multi-region template fusion

The invention provides a three-dimensional face recognition method based on semantic alignment multi-region template fusion, which comprises the following steps: 1, determining the data sets of registered faces and test faces in a three-dimensional face database; 2, preprocessing all registered and to-be-identified three-dimensional face models, and densely aligning the preprocessed models with a reference model; 3, pre-dividing the face region into a plurality of template regions that are free of expression influence and may overlap; 4, for each template region, directly calculating a similarity value between corresponding template regions on the three-dimensional structure of the face; and 5, letting each region vote independently according to its similarity value, combining the matching results of the regions, and determining the final matching result by majority voting. The method predicts similarity using the mutual independence of the template regions, which reduces the algorithm's dependence on accurate division of any single region, and its multi-region joint voting strategy also gives a degree of robustness to expressions and other factors that affect individual regions.
Owner:JIAXING UNIV
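
Steps 4 and 5 above, per-region similarity followed by majority voting, can be sketched as follows. The region names, the negative squared distance used as similarity, and the function names are assumptions for illustration; the patent computes similarity on the three-dimensional face structure itself.

```python
from collections import Counter

def neg_distance(a, b):
    """Toy similarity: higher (closer to 0) means more alike."""
    return -sum((x - y) ** 2 for x, y in zip(a, b))

def region_vote(probe, gallery, similarity):
    """Let each template region vote for its best-matching registered face.

    probe maps region name -> feature vector; gallery maps identity ->
    {region name -> feature vector}. Returns (winner, votes).
    """
    votes = []
    for region, features in probe.items():
        best = max(
            gallery,
            key=lambda identity: similarity(features, gallery[identity][region]),
        )
        votes.append(best)
    winner, _ = Counter(votes).most_common(1)[0]
    return winner, votes

# Hypothetical per-region descriptors for two registered faces and one probe.
gallery = {
    "alice": {"nose": [1.0, 1.0], "forehead": [2.0, 0.0], "chin": [0.0, 3.0]},
    "bob": {"nose": [4.0, 4.0], "forehead": [5.0, 1.0], "chin": [1.0, 0.0]},
}
probe = {"nose": [1.1, 0.9], "forehead": [2.2, 0.1], "chin": [0.9, 0.2]}
winner, votes = region_vote(probe, gallery, neg_distance)
# votes == ["alice", "alice", "bob"], winner == "alice"
```

The chin region votes for the wrong identity, yet the majority still recovers the right one, which is the robustness to locally disturbed regions that the abstract describes.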