51 results about "Perplexity" patented technology

In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. A low perplexity indicates the probability distribution is good at predicting the sample.
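As a minimal sketch of this definition, perplexity can be computed as the exponential of the average negative log-probability that a model assigns to each item in the sample. The function name `perplexity` and the example probabilities are illustrative assumptions, not taken from any of the patents listed here.

```python
import math

def perplexity(probabilities):
    """Perplexity of a model on a sample, given the probability the model
    assigned to each observed item (hypothetical helper for illustration)."""
    if not probabilities:
        raise ValueError("need at least one probability")
    # Average negative log-likelihood per item (natural log).
    avg_nll = -sum(math.log(p) for p in probabilities) / len(probabilities)
    return math.exp(avg_nll)

# A model that assigns higher probability to what was observed
# gets a lower perplexity.
confident = perplexity([0.5, 0.5, 0.5, 0.5])
uncertain = perplexity([0.1, 0.1, 0.1, 0.1])
```

Under this definition, a model that assigns probability 1/k to every observed item has perplexity k, which is why perplexity is often read as an effective number of equally likely choices.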

Chinese image semantic description method combined with multilayer GRU based on residual error connection Inception network

The invention discloses a Chinese image semantic description method combined with multilayer GRU based on a residual error connection Inception network, and belongs to the field of computer vision and natural language processing. The method comprises the following steps: preprocessing the AI Challenger image Chinese description training set and evaluation set with open-source TensorFlow to generate files in the tfrecord format for training; pre-training an Inception_ResNet_v2 network on the ImageNet data set to obtain a pre-trained convolutional network model; loading the pre-trained parameters into the Inception_ResNet_v2 network and extracting image feature descriptors for the AI Challenger image set; building a single-hidden-layer neural network model that maps the image feature descriptors to a word embedding space; taking the word embedding feature matrix and the image feature descriptors after the second feature mapping as the input of a double-layer GRU network; inputting an original image into the description model to generate a Chinese description sentence; and evaluating the trained model on the evaluation data set, with the perplexity index as the evaluation standard. The method solves the technical problem of describing an image in Chinese, and improves the continuity and readability of the generated sentences.
Owner:HARBIN UNIV OF SCI & TECH

Information interactive network-based criminal individual recognition method

The invention belongs to the field of data mining, and relates to an information interactive network-based criminal individual recognition method. The method comprises the following steps: (1) obtaining a data set which comprises criminal activity contents, and pre-processing the data set; (2) extracting keyword descriptions of criminal topics; (3) determining the number of topics of the LDA topic model on the basis of perplexity; (4) extracting the interaction topics between individuals in the pre-processed data set on the basis of LDA, wherein the output comprises an association probability matrix of the interaction topics and keywords, and an association probability matrix of the interaction topics and interaction edges; (5) calculating the weights of the interaction edges; (6) calculating the local criminal suspicion of each individual on the basis of the structure of the weighted information interaction network; and (7) calculating the global criminal suspicion of each individual by combining fuzzy K-means clustering with distance-density clustering, and recognizing the criminal individuals. The method does not depend on prior information, and can identify the most likely suspects from communication contents, so that case handling efficiency is improved.
Owner:NAT UNIV OF DEFENSE TECH
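Step (3) of the method above, choosing the number of LDA topics by perplexity, can be sketched as follows. This is a generic illustration and not the patent's implementation: `heldout_perplexity` is a hypothetical callback that would train a topic model with `k` topics and return its perplexity on held-out documents, and the scores below are placeholder numbers invented for the example.

```python
def select_num_topics(candidate_ks, heldout_perplexity):
    """Return the candidate topic count with the lowest held-out perplexity.

    heldout_perplexity is a hypothetical callback: given k, it should train
    a topic model with k topics and return its held-out perplexity.
    """
    scores = {k: heldout_perplexity(k) for k in candidate_ks}
    best_k = min(scores, key=scores.get)
    return best_k, scores

# Placeholder scores for illustration only (not results from any patent).
toy_scores = {5: 812.0, 10: 640.5, 20: 598.2, 40: 631.7}
best_k, scores = select_num_topics(toy_scores, toy_scores.__getitem__)
```

In practice the callback would wrap whichever LDA implementation is in use; the selection logic itself only needs one perplexity value per candidate.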

Error correction method, device and equipment and storage medium

The invention relates to the technical field of artificial intelligence, and discloses an error correction method. The method comprises the following steps: detecting that a to-be-corrected object exists in a text; extracting the context content of the to-be-corrected object based on its position; inputting the context content and the corresponding similar objects into an error correction model to obtain a candidate probability for each similar object; and selecting one similar object as the replacement object based on the candidate probabilities, and replacing the to-be-corrected object with it. The invention further provides an error correction device, equipment and a storage medium. Because the to-be-corrected object is predicted from both the object itself and its context content, the perplexity of the language model during semantic recognition can be reduced, so that relatively accurate similar objects are extracted. The candidate probability of each similar object is then calculated by the error correction model in combination with the context content, and an object with a relatively high candidate probability is selected, so that the probability of each character or word is improved, and the final error correction accuracy is also improved.
Owner:PING AN TECH (SHENZHEN) CO LTD

Label classification method and device of corpora, computer equipment and storage medium

The embodiment of the invention relates to the field of artificial intelligence, and provides a label classification method for corpora, which comprises the following steps: carrying out word segmentation on multiple segments of text data in the corpus data to obtain the corresponding word segmentation results; inputting the word segmentation results into a probability model, and analyzing them through modeling with the probability model to obtain a plurality of K values; calculating the perplexity for each of the K values, and taking the K value with the minimum perplexity to obtain the corresponding first-level label; and inputting the corresponding word segmentation results into the Bidirectional Encoder Representations from Transformers (BERT) model corresponding to the first-level label, and obtaining the sub-labels under the first-level label through that model. In addition, the invention further relates to blockchain technology, and the multiple segments of text data can be stored in a blockchain. The invention further provides a label classification device for the corpora, computer equipment and a storage medium. The label classification accuracy of the corpora is improved.
Owner:CHINA PING AN PROPERTY INSURANCE CO LTD

Text generation model training method, target corpus expansion method and related device

The invention discloses a text generation model training method, a target corpus expansion method and a related device. The training method of the text generation model comprises the following steps: acquiring a sample corpus; performing word segmentation on the sample corpus, and generating a statistical language model from the word segmentation result; generating a target text by using the generator of the text generation model; using the discriminator of the text generation model to discriminate the target text against the sample corpus, outputting a discrimination result, and obtaining an adversarial loss function from the discrimination result; acquiring the perplexity of the target text by using the statistical language model, and determining a penalty term according to the perplexity; and superposing the adversarial loss function and the penalty term to obtain the target loss function of the text generation model, and training the text generation model with the target loss function to obtain a trained text generation model. According to the scheme, the training of the text generation model can be guided by the existing corpus, and the performance of the text generation model is improved.
Owner:ZHEJIANG DAHUA TECH CO LTD
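The combination of an adversarial loss with a perplexity-based penalty described above can be sketched as follows. The abstract does not say which statistical language model is used or how the penalty is derived from the perplexity, so this sketch assumes an add-one smoothed bigram model and a penalty equal to a weight times the log-perplexity. `BigramModel`, `target_loss` and `weight` are hypothetical names, and the adversarial loss is passed in as a plain number.

```python
import math
from collections import Counter

class BigramModel:
    """Add-one smoothed bigram language model over pre-segmented sentences
    (an assumed stand-in for the statistical language model)."""

    def __init__(self, sentences):
        self.contexts = Counter()
        self.bigrams = Counter()
        self.vocab = {"<s>", "</s>"}
        for words in sentences:
            tokens = ["<s>"] + list(words) + ["</s>"]
            self.vocab.update(tokens)
            self.contexts.update(tokens[:-1])            # left-hand words
            self.bigrams.update(zip(tokens, tokens[1:]))  # adjacent pairs

    def perplexity(self, words):
        tokens = ["<s>"] + list(words) + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            # Add-one smoothing keeps unseen bigrams at nonzero probability.
            p = (self.bigrams[(prev, cur)] + 1) / (self.contexts[prev] + len(self.vocab))
            log_prob += math.log(p)
        return math.exp(-log_prob / (len(tokens) - 1))

def target_loss(adversarial_loss, generated_words, lm, weight=0.1):
    """Adversarial loss plus an assumed penalty: weight * log(perplexity)."""
    penalty = weight * math.log(lm.perplexity(generated_words))
    return adversarial_loss + penalty

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
lm = BigramModel(corpus)
fluent_loss = target_loss(1.0, ["the", "cat", "sat"], lm)
scrambled_loss = target_loss(1.0, ["sat", "cat", "the"], lm)
```

With the same adversarial loss, generated text that resembles the sample corpus receives a smaller penalty than text that does not, which is the guiding effect the abstract attributes to the existing corpus.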

Pre-training language model-oriented privacy disclosure risk assessment method and system

The invention relates to the field of privacy security, and aims to provide a pre-training language model-oriented privacy disclosure risk assessment method and system. The method comprises the following steps: adding forged data into a pre-training data set; inputting the pre-training data set into the initialized neural network model, and calculating the loss according to a set pre-training task and loss function; continuously updating the parameters of the model during training, which increases the privacy leakage risk of the model; inputting the fine-tuning data set into the pre-trained neural network model, and fine-tuning the feature extraction capability of the model; inputting privacy prefix content into the model, and outputting text information as the prediction result; and calculating, counting and sorting the perplexity of the output information, and evaluating the risk of privacy data leakage by comparing the proportion of generated privacy information. The method can effectively improve the accuracy of evaluating privacy data leakage risk, exposes the privacy data leakage risk existing in pre-trained language models, and provides a basis for the subsequent development of related defense methods.
Owner:ZHEJIANG UNIV
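The final step of the method above (sorting model outputs by perplexity and measuring the proportion that contain private information) might look like the sketch below. It assumes each generated text comes with the per-token probabilities the model assigned to it, and that leakage is detected by simple substring matching against the forged data planted in the pre-training set. The helper names, the `top_n` cut-off and the toy data are all hypothetical.

```python
import math

def sequence_perplexity(token_probs):
    """Perplexity of one generated sequence from its per-token probabilities."""
    return math.exp(-sum(math.log(p) for p in token_probs) / len(token_probs))

def leakage_report(generations, planted_secrets, top_n):
    """Rank generations by ascending perplexity and return the ranking plus
    the fraction of the top_n outputs that contain a planted secret.

    generations: list of (text, token_probs) pairs.
    """
    ranked = sorted(generations, key=lambda g: sequence_perplexity(g[1]))
    top = ranked[:top_n]
    if not top:
        return ranked, 0.0
    leaked = [text for text, _ in top if any(s in text for s in planted_secrets)]
    return ranked, len(leaked) / len(top)

# Toy outputs and probabilities invented for illustration only.
generations = [
    ("my id is 12345", [0.9, 0.8, 0.9, 0.95]),
    ("the weather is nice", [0.2, 0.3, 0.1, 0.2]),
    ("call 555-0100 now", [0.7, 0.9, 0.8]),
]
planted = ["12345", "555-0100"]
ranked, leak_rate = leakage_report(generations, planted, top_n=2)
```

The intuition behind ranking by perplexity is that text the model reproduces with unusually low perplexity is more likely to have been memorized from the training data.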