145 results about "Polysemy" patented technology

Polysemy (/pəˈlɪsɪmi/ or /ˈpɒlɪsiːmi/; from Greek: πολύ-, polý-, "many" and σῆμα, sêma, "sign") is the capacity for a sign (such as a word, phrase, or symbol) to have multiple meanings (that is, multiple semes or sememes and thus multiple senses), usually related by contiguity of meaning within a semantic field. Polysemy is thus distinct from homonymy—or homophony—which is an accidental similarity between two words (such as bear the animal, and the verb to bear); while homonymy is often a mere linguistic coincidence, polysemy is not.

Probabilistic information retrieval based on differential latent semantic space

A computer-based information search and retrieval system and method for retrieving textual digital objects that makes full use of the projections of the documents onto both the reduced document space characterized by the singular value decomposition-based latent semantic structure and its orthogonal space. The resulting system and method is more robust, reducing the instability that traditional keyword search engines suffer because of synonymy and/or polysemy in natural language, and is therefore particularly suitable for web document searching over a distributed computer network such as the Internet.
Owner:SUNFLARE CO LTD
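
The two projections mentioned in this abstract can be written out in standard latent semantic indexing notation. This is a sketch under assumptions, not the patent's own formulation: A is taken to be a term-document matrix, U_k holds its first k left singular vectors, and d is a document vector; all symbols are introduced here for illustration.

```latex
A \approx U_k \Sigma_k V_k^{\top}, \qquad
d_{\parallel} = U_k U_k^{\top} d, \qquad
d_{\perp} = \left(I - U_k U_k^{\top}\right) d, \qquad
\lVert d \rVert^2 = \lVert d_{\parallel} \rVert^2 + \lVert d_{\perp} \rVert^2
```

Here d_parallel is the part of the document captured by the reduced latent semantic space and d_perp is the residual in the orthogonal space; the last identity holds because the columns of U_k are orthonormal.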

System and method for automatically discovering a hierarchy of concepts from a corpus of documents

The invention is a method, system and computer program for automatically discovering concepts from a corpus of documents and automatically generating a labeled concept hierarchy. The method involves extraction of signatures from the corpus of documents. The similarity between signatures is computed using a statistical measure. The frequency distribution of signatures is refined to alleviate any inaccuracy in the similarity measure. The signatures are also disambiguated to address the polysemy problem. The similarity measure is recomputed based on the refined frequency distribution and disambiguated signatures. The recomputed similarity measure reflects actual similarity between signatures. The recomputed similarity measure is then used for clustering related signatures. The signatures are clustered to generate concepts and concepts are arranged in a concept hierarchy. The concept hierarchy automatically generates query for a particular concept and retrieves relevant documents associated with the concept.
Owner:MICRO FOCUS LLC

Inverse inference engine for high performance web search

An information retrieval system that deals with the problems of synonymy, polysemy, and retrieval by concept by allowing for a wide margin of uncertainty in the initial choice of keywords in a query. For each input query vector and an information matrix, the disclosed system solves an optimization problem which maximizes the stability of a solution at a given level of misfit. The disclosed system may include a decomposition of the information matrix in terms of orthogonal basis functions. Each basis encodes groups of conceptually related keywords. The bases are arranged in order of decreasing statistical relevance to a query. The disclosed search engine approximates the input query with a weighted sum of the first few bases. Other commercial applications than the disclosed search engine can also be built on the disclosed techniques.
Owner:FIVER LLC

Ontology mapper

Systems, methods and computer-readable media are provided for facilitating patient health care by providing discovery, validation, and quality assurance of nomenclatural linkages between pairs of terms or combinations of terms in databases extant on multiple different health information systems that do not share a set of unified codesets, nomenclatures, or ontologies, or that may in part rely upon unstructured free-text narrative content instead of codes or standardized tags. Embodiments discover semantic structures existing naturally in documents and records, including relationships of synonymy and polysemy between terms arising from disparate processes, and maintained by different information systems. In some embodiments, this process is facilitated by applying Latent Semantic Analysis in concert with decision-tree induction and similarity metrics. In some embodiments, data is re-mined and regression testing is applied to new mappings against an existing mapping base, thereby permitting these embodiments to “learn” ontology mappings as clinical, operational, or financial patterns evolve.
Owner:CERNER INNOVATION

Theme map expansion based knowledge resource organizing method

The invention discloses a theme map expansion based knowledge resource organizing method, which is characterized by including: organizing resources to build resource indexes, and providing a search architecture based on themes on the basis of a map architecture formed by knowledge resources and theme incidence relations therein. In the search architecture, themes of documents are organized as index entries, the document collection is divided by combining the internal relations of succession, correlation and polysemy among the document themes with multi-label classifying technology, and a proper partition is selected by a threshold-based partition selecting method so as to guide building and inquiring of the indexes. During inquiring, inquiry results are acquired from a route to the relevant index partition and are gathered and organized according to the theme relations, and resource utilization rate and inquiring efficiency are improved sufficiently on the premise of guaranteeing the quality of the inquiry results. Further, by the index partition gathering and breaking technology, the index structure is further optimized and quality of the inquiry results and inquiring efficiency are improved.
Owner:XI AN JIAOTONG UNIV

Chinese relationship extraction method

The invention provides a Chinese relationship extraction method, which comprises the following steps of S1, data preprocessing: performing pre-training processing of multi-granularity information on a text of input data to extract distributed vectors of three levels of characters, words and word meanings in the text; S2, feature coding: taking a bidirectional long-short-term memory network as a basic framework, obtaining hidden state vectors of the characters and hidden state vectors of the words through the distributed vectors of the three levels of the characters, the words and the word meanings, and then obtaining final hidden state vectors of the character level; and S3, relationship classification: learning the final hidden state vector of the word level, and fusing the hidden state vector of the word level into a sentence-level hidden state vector by adopting an attention mechanism of the word level. The problems of word segmentation ambiguity and polysemy ambiguity are effectively solved, the performance of the model on a relation extraction task is greatly improved, and the accuracy and robustness of Chinese relation extraction are improved.
Owner:SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV

Word multi-prototype vector representation and word sense disambiguation method based on CRP clustering

The invention discloses a word multi-prototype vector representation and word sense disambiguation method based on CRP clustering, which comprises the following steps: 1, the text in a massive text corpus is cleaned and preprocessed to obtain plain text, and the CRP algorithm is used to cluster the context window representations of the target polysemous words in the text corpus set. The target polysemous words in the text corpus set are marked according to the clustering classification, and the polysemous words are trained on the marked text corpus set to obtain the multi-prototype vector representation of the polysemous words; 2, the target short text is preprocessed to obtain a short text word sequence, a target polysemous word in the word sequence is identified, the context window representation of the target polysemous word is used to compute the similarity to the centroids of the clusters corresponding to that word in the text corpus, and the word vector corresponding to the cluster with maximum similarity is used as the word vector representation of the specific meaning of the polysemous word in the context, thereby disambiguating its meaning. The invention solves the problem of polysemy expression in word representation and the problem of ambiguity identification in word meaning expression.
Owner:NORTH CHINA UNIV OF WATER RESOURCES & ELECTRIC POWER
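
The disambiguation step in this abstract (compare a context representation with the cluster centroids and keep the most similar one) can be sketched as follows. This is a minimal illustration that assumes cosine similarity as the measure and uses made-up three-dimensional centroids for a hypothetical word "bank"; the clustering step itself (CRP, presumably the Chinese restaurant process) is not shown, and none of the names come from the patent.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pick_sense(context_vec, centroids):
    """Return the sense label whose cluster centroid is most similar to the context."""
    return max(centroids, key=lambda label: cosine(context_vec, centroids[label]))


# Hypothetical centroids for two senses of "bank" (illustrative values only).
centroids = {
    "bank_finance": [0.9, 0.1, 0.0],
    "bank_river": [0.0, 0.2, 0.9],
}
# Hypothetical context-window representation of "bank" in a short text.
context = [0.1, 0.3, 0.8]
sense = pick_sense(context, centroids)
print(sense)
```

In a full pipeline each sense label would map to its own trained word vector, and that vector would stand in for the polysemous word in the given context.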

Online public opinion text information sentiment polarity classification processing system and method

The invention belongs to the technical field of computer science, and discloses an online public opinion text information emotion polarity classification processing system and method. Emotion polarity classification of online public opinion text is widely applied in public opinion monitoring systems; however, the feature engineering extraction module of a traditional machine learning method loses much text information, and the accuracy of the classification model is not high enough. The method comprises the following steps: preprocessing data; constructing a word vector by fine-tuning a pre-trained BERT model, in which the BERT model calculates the correlation between each character in the sentence and each of the other characters; the constructed word vector can better solve the problems of 'one-word polysemy' and 'synonym' in Chinese, and the loss of word vector representation is greatly reduced; in the classification model, firstly Bi-LSTM is used for effectively learning context information, then Attention is used for capturing main semantic information and effectively filtering valuable public opinion information, and finally softmax classification is used. The performance of the obtained public opinion text emotion polarity classification result is better than that of current mainstream algorithms.
Owner:XIDIAN UNIV

Improved text similarity solving algorithm based on semantic analysis

The invention discloses an improved text similarity solving algorithm based on semantic analysis. The algorithm comprises the steps of performing word segmentation and stop word removing processing on two texts; computing weights of the words in the texts based on an improved information theory method; acquiring weights of positions and properties of the words according to the positions and the properties of the words; constructing a target function shown in the description of the extracted text words according to the abovementioned three factors; and at last respectively reducing dimensions of the two feature words according to a semantic similarity, thus acquiring two feature word vectors, and then solving the text similarity sim (W1, W2) between the texts (W1, W2) according to a Pearson correlation coefficient. Compared with traditional text similarity computing method, the algorithm provided by the invention has higher accuracy, wider application range and higher application value, can accurately compute contribution degrees of the different words to a text thought and solve the problems of polysemy and synonym, is more accordant with an empirical value, and meanwhile provides a good theoretical basis for subsequent text clustering.
Owner:SICHUAN YONGLIAN INFORMATION TECH CO LTD
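
The final step of this abstract scores two texts with a Pearson correlation coefficient over their feature word vectors. A minimal sketch of that coefficient follows; the weight vectors are invented for illustration, and the patent's word weighting and dimension reduction steps are not reproduced.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("vectors must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0  # a constant vector has no defined correlation; treat as 0
    return cov / math.sqrt(var_x * var_y)


# Hypothetical feature word weights for two texts over the same four features.
w1 = [0.8, 0.1, 0.4, 0.0]
w2 = [0.7, 0.2, 0.5, 0.1]
print(pearson(w1, w2))
```

The result lies between -1 and 1; values near 1 indicate that the two texts weight the shared features in a similar pattern.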

Text classification method, server and computer readable medium

The embodiment of the invention discloses a text classification method, a server and a computer readable medium. The method comprises the steps that a to-be-classified text is acquired, and the to-be-classified text comprises M words, wherein M is a positive integer; according to N words of the to-be-classified text, a topic corresponding to each of the N words is obtained through a topic model, wherein N is a positive integer not greater than M; according to each of the N words and the topic corresponding to each word, a topic word vector corresponding to each of the N words is obtained through a topic word vector model, wherein the topic word vectors are common vector representations of the words and the topics corresponding to the words; according to the topic word vector corresponding to each of the N words, the category of the to-be-classified text is obtained through a classification model; and the topic model, the topic word vector model and the classification model are trained models. Through the text classification method, the server and the computer readable medium, under the condition of considering polysemy of the words, classification accuracy is enhanced, and text classification is performed efficiently.
Owner:SHENZHEN GIONEE COMM EQUIP

Method and system for sequencing search entries

The invention provides a method and a system for sequencing search entries. The method comprises the following steps: expressing an inquiry text into a vector according a word order through a neural network; computing sequencing scores between the inquiry text and the search entries according to the expressed vector through the neural network; sequencing the search entries according to the computed sequencing scores. By adopting the method and the system, search sequencing can be performed under the consideration of matching of polysemy and synonyms and the word order of words, and more accurate sequencing results can be given.
Owner:BAIDU ONLINE NETWORK TECH (BEIJING) CO LTD

Deep information based sign language recognition method

The invention discloses a deep information based sign language recognition method. The method comprises steps of: (1) identification of a single gesture: dividing a sign language into a hand shape and a motion track; using deep information based multi-threshold hand gesture segmentation, and obtaining a feature value of the hand shape by using an improved SURF algorithm; obtaining the feature value of the motion track by using angular velocity and distance based motion characteristics, and performing gesture identification by using extracted feature value of the hand shape and the feature value of the motion track as an input of BP neural network; and (2) correction of a gesture sequence: according to the recognized gesture, performing automatic reasoning correction on gestures that have not been correctly recognized or that have polysemy by using a Bayesian algorithm. According to the method provided by the invention, the hand gesture segmentation is performed by using the deep information obtained by a Kinect camera, thereby overcoming the interference caused by illumination in the conventional vision based hand gesture segmentation, and improving naturality of human-computer interaction. The use of improved SURF algorithm reduces the calculation amount and improves the identification speed.
Owner:SHANDONG UNIV

Chinese POI matching method based on natural language understanding

The invention discloses a Chinese POI (Point Of Interest) matching method based on natural language understanding, which is used for solving the problem of low Chinese POI matching precision caused by frequent incomplete Chinese addresses, inconsistent Chinese character expressions and one-word polysemy of multi-source Chinese POI data. According to the method, six similarity measurement characteristics of geometric positions, category structures, POI names and address attributes are integrated and input into the deep neural network to calculate the matching probability, so that the influence of subjective factors caused by artificial weight allocation is avoided; a natural language understanding technology is introduced to fully consider the semantic relationship between address attributes and name attributes in the similarity calculation process of the six attribute characteristics to overcome the defects of a traditional algorithm; short text similarity calculation is improved by using a neural network of a twin attention mechanism, and POI data accurate matching is achieved in combination with grammar, semantics and spatial similarity calculation.
Owner:安徽迪科数金科技有限公司

Emotion classification method based on part-of-speech combination and feature selection

The invention discloses an emotion classification method based on part-of-speech combination and feature selection. The method comprises the following steps of firstly, initializing word-part-of-speech Word2vec model; secondly, carrying out preprocessing operation on data, and selecting feature words with emotion information from preprocessed data based on an emotion dictionary; thirdly, combining the feature words and the part-of-speech of texts, and converting the texts into word and part-of-speech pair sequence texts; fourthly, obtaining vectors of the feature words of the word and part-of-speech pair sequence texts through the word-part-of-speech Word2vec model, and performing addition and averaging on the vectors of the words according to the dimensions for the texts to represent the texts, thereby obtaining eigenvectors of the texts; and finally, obtaining an emotion classification model by utilizing an SVM classifier. The method has the beneficial effects that the emotion dictionary is used for extracting the feature words, and the feature words with the single emotion information are highlighted; and on the other hand, a phrase structure of emotional tendency is extracted based on phrase structure optimization and word segmentation, and the words and the part-of-speech are combined to solve the problem that one word has multiple meanings.
Owner:NANTONG UNIVERSITY
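
The third and fourth steps of this abstract (pair each feature word with its part of speech, then average the pair vectors dimension by dimension) can be sketched as below. The `word_POS` token format, the tag names, and the two-dimensional embedding values are assumptions made for illustration; a real system would look the pairs up in a trained word-part-of-speech Word2vec model and pass the result to an SVM classifier.

```python
def pair_tokens(tagged_words):
    """Join each (word, part_of_speech) pair into a single token such as 'like_VERB'."""
    return [f"{word}_{pos}" for word, pos in tagged_words]


def average_vectors(tokens, embeddings):
    """Average the vectors of known tokens dimension by dimension ([] if none known)."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return []
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Hypothetical embeddings: the same surface word gets one vector per part of speech.
embeddings = {
    "like_VERB": [1.0, 0.0],
    "like_ADP": [0.0, 1.0],
    "film_NOUN": [0.5, 0.5],
}
tagged = [("like", "VERB"), ("film", "NOUN")]
text_vec = average_vectors(pair_tokens(tagged), embeddings)
print(text_vec)  # [0.75, 0.25]
```

Because "like_VERB" and "like_ADP" are separate tokens, the two uses of "like" no longer share one vector, which is how the pairing addresses one word having multiple meanings.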

A multimedia resource recommendation method and system

The embodiments of the invention provide a multimedia resource recommendation method and system. The method comprises the steps of establishing a text vector matrix and a tag vector matrix corresponding to a multimedia library; acquiring a text reduction matrix of the text vector matrix and a tag reduction matrix of the tag vector matrix; when determining a current multimedia file played by a user, determining the multimedia similarity degrees between the current multimedia file and other multimedia files in the multimedia library according to the text reduction matrix and the tag reduction matrix; according to the multimedia similarity degrees corresponding to the multimedia files in the multimedia library, determining the multimedia files to be recommended to users as to-be-recommended resources. The method solves the problem of cold boot in a multimedia resource recommendation process and solves the problem of influence of text synonyms and polysemy on multimedia file similarity degree calculation in a multimedia resource recommendation process, so that the multimedia file matching degree is increased and the accuracy of multimedia resource recommendation is improved.
Owner:GUANGZHOU SHIYUAN ELECTRONICS CO LTD

Case-related news viewpoint sentence recognition method based on BERT and BiLSTM-Attention

The invention discloses a case-related news viewpoint sentence recognition method based on BERT and a BiLSTM-Attention model, and the method comprises the steps: firstly, carrying out the preprocessing of a news text, including word segmentation and duplication removal; then, coding words in the text into vectors through BERT to obtain text features, and coding case elements corresponding to all sentences into vectors to obtain case information; splicing the feature vectors, and inputting the spliced feature vectors into the BiLSTM to obtain past and future features and time sequence information; enabling the output of the layer to be related to case elements through Attention to pay attention to important information, and finally judging whether sentences are viewpoint sentences or not through a softmax classifier. According to the method, BiLSTM is added, so more sentence semantic information can be obtained. Meanwhile, case elements are fused to obtain more case domain information, an Attention mechanism is introduced to associate the case elements, and more important information for a viewpoint sentence recognition task is paid attention to. The word vector generated by using the BERT is dynamic, and compared with a general word2vec word vector, the word vector generated by using the BERT can solve the problem of one word with multiple meanings.
Owner:KUNMING UNIV OF SCI & TECH

Reaction type search method and contents correlation technique based on contents relativity

The invention discloses a reaction type search method and a content correlation technique based on content relativity. The method comprises the following steps: when a query request is received, an original query result set is generated by using a main search engine; after the current user views the query results and clicks a target web page, the ID of the target web page is obtained, and the relatedness between every web page in the original query result set and the target web page is looked up in a web page relatedness matrix K; and the web page with the greatest relatedness to the target web page is submitted to the user as a new query result. Compared with the prior art, the invention avoids learning a complex query-sensitive ranking function, and drops the concept of search categories, replacing it with page-level relatedness analysis to solve the granularity and attribution problem of category classification; unlike profile-tracking methods based on the individual user, the method does not need to track a particular user's actions over the long term; and compared with methods that directly optimize search results based on click data, the method can effectively solve problems such as one meaning expressed by several words and one word having several meanings.
Owner:TIANJIN UNIV
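
The lookup described in this abstract (take the clicked page, read its row of the relatedness matrix K, and surface the most related pages from the original result set) can be sketched as follows. Representing K as a nested dictionary, the page IDs, and the scores are all assumptions for illustration; how K is built is not covered by the abstract and is not shown.

```python
def rerank_after_click(result_ids, clicked_id, K):
    """Order the remaining results by relatedness to the clicked page, highest first."""
    row = K.get(clicked_id, {})
    others = [page for page in result_ids if page != clicked_id]
    return sorted(others, key=lambda page: row.get(page, 0.0), reverse=True)


# Hypothetical relatedness scores between pages in one original result set.
K = {
    "p1": {"p2": 0.2, "p3": 0.9, "p4": 0.5},
}
original_results = ["p1", "p2", "p3", "p4"]
new_results = rerank_after_click(original_results, "p1", K)
print(new_results)  # ['p3', 'p4', 'p2']
```

The first element of the returned list is the page with the greatest relatedness to the clicked page, which the abstract describes as the new query result.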

Automatic extraction method for text document theme word meaning

The invention relates to an automatic extraction method for a text document theme word meaning, which comprises the following steps of: firstly, performing text document preprocessing on a training text document set and a testing text document set to obtain a candidate theme word meaning set of text documents in the training text document set and the testing text document set; then, calculating a characteristic attribute value of each candidate theme word meaning; and finally, extracting a final theme word meaning of each text document in the testing text document set by using a Bayesian model. The whole process for extracting the theme meaning by using word-meaning substituting words avoids inaccuracy caused by polysemy, and the method can improve the extraction precision of the theme meaning.
Owner:COMTEC SOLAR JIANGSU +1

Word definition generation method based on recurrent neural network and latent variable structure

The invention relates to a word definition generation method based on a recurrent neural network and a latent variable structure in the field of natural language processing. On the basis of a recurrent neural network, a variational auto-encoder (VAE) is used for modeling paraphrases, and latent variable characteristics are combined. The paraphrases are extracted according to context information of defined words to generate paraphrases of the words, and the method specifically comprises the steps of establishing and arranging a basic corpus; selecting a synonym set of the defined words, and expanding the basic corpus to form a final corpus; carrying out expansion reconstruction on the word vectors of the defined words; constructing a structure model based on the recurrent neural network and the latent variable; training a latent variable structure model based on a recurrent neural network; and inputting the to-be-paraphrased words and the context information thereof into the trained model to realize semantic paraphrasing of the to-be-paraphrased words in a specific context, thereby handling polysemy.
Owner:BEIJING UNIV OF TECH

Translation providing method, device and system

Active; CN103699528A; Meet translation needs; Improve translation experience; Special data processing applications; Mutual information; Language type
The invention provides a translation providing method, device and system. The method includes: receiving a translation request transmitted by a client, and acquiring the current position information of the client, wherein the translation request includes to-be-translated contents and the target language type; acquiring the map data and a preset mutual information set corresponding to the target language type; acquiring the position features of the to-be-translated contents according to the current position information, the map data and the preset mutual information set; acquiring the translation of the to-be-translated contents according to the position features and a preset translation model, and transmitting the translation to the client. By the method, the acquired translation can satisfy the translation requirements, at specific positions, of a user, the translation better satisfies the expectation of the user, especially for polysemy, accurate translation can be provided for the user fast, and the translation experience of the user is improved greatly.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD
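
The abstract does not say how the "preset mutual information set" is used, so the following is only one plausible reading: score each candidate translation of a polysemous word by its pointwise mutual information (PMI) with the kind of place the user is in, and keep the highest-scoring candidate. The place categories, the German candidates for the English word "spring", and all counts are hypothetical.

```python
import math
from collections import Counter


def pmi_table(cooc):
    """Build PMI scores from co-occurrence counts keyed by (place, candidate)."""
    total = sum(cooc.values())
    place_totals = Counter()
    cand_totals = Counter()
    for (place, cand), n in cooc.items():
        place_totals[place] += n
        cand_totals[cand] += n
    return {
        (place, cand): math.log(n * total / (place_totals[place] * cand_totals[cand]))
        for (place, cand), n in cooc.items()
        if n > 0
    }


def best_translation(place, candidates, table):
    """Pick the candidate with the highest PMI for this place (unseen pairs rank last)."""
    return max(candidates, key=lambda c: table.get((place, c), float("-inf")))


# Hypothetical counts of how often each translation of "spring" was used near each place type.
cooc = {
    ("spa", "Quelle"): 30,
    ("spa", "Frühling"): 5,
    ("garden_centre", "Frühling"): 25,
    ("garden_centre", "Quelle"): 2,
}
table = pmi_table(cooc)
print(best_translation("spa", ["Quelle", "Frühling"], table))
print(best_translation("garden_centre", ["Quelle", "Frühling"], table))
```

With these counts the water-source sense wins at a spa and the season sense wins at a garden centre, which illustrates how a position feature could steer the translation of a polysemous word.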

Two-stage semantic word vector generation method

Active; CN111027595A; High quality vector; Solve the defect that only corresponds to one word vector; Semantic analysis; Character and pattern recognition; Feature extraction; Semantics
The invention provides a two-stage semantic word vector generation method. The two-stage semantic word vector generation method comprises the following five steps of: performing text matrixizing; constructing a feature extractor; performing semantic recognition; constructing a neural language model; and generating a semantic term vector. According to the method, the corresponding word vectors are generated for the different senses of a polysemous word by using the plurality of neural networks, the defect that a polysemous word only corresponds to one word vector in a traditional word-level embedding mode is overcome, and the size of the used corpus is within an acceptable range; meanwhile, a mode of combining a convolutional neural network (CNN) and a support vector machine (SVM) is adopted, on one hand, the feature extraction capability of the convolutional neural network is utilized, and on the other hand, the generalization and robustness of the SVM are utilized, so that the word meaning recognition effect is better, and the generated semantic word vector quality is higher.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Chinese entity relationship extraction method based on character and word feature fusion of entity meaning items

The invention relates to a Chinese entity relationship extraction method based on character and word feature fusion of entity meaning items. The method comprises the following steps of introducing entity meaning items to expand sentences into triples (sentences, entity 1 meaning items and entity 2 meaning items), enriching the input granularity and mapping the three sequences in the triples into word vector matrixes respectively; inputting statements in the triples into the two models in parallel, wherein one model is a two-way long and short-term memory network (Att-BLSTM) based on an attention mechanism to learn character features and the other model is one to learn partial features through a convolutional neural network (CNN) and learn word features through Att-BLSTM; respectively using Att-BLSTM to learn character-based entity 1 semantic item features and word-based entity 2 semantic item features and fusing four features into one feature that can fully characterize semantic information, which is used for relation extraction. According to the method, word segmentation errors can be avoided, the problem that one word has multiple meanings is solved, the Chinese entity relationship extraction accuracy is effectively improved, and the method can be widely applied to knowledge graph construction.
Owner:DONGHUA UNIV

Dynamic construction method for Web service component library and service search method thereof

The invention provides a dynamic construction method for a Web service component library, which realizes the search based on potential semantic matching by semantically marking a Web service description document. At the same time, the invention also provides a service search method designed according to the construction method. Because the comparison obtained by the construction method for the Web service component library and the search method thereof is based on the semantic similarity of words and phrases, the problems such as polysemy, synonyms, singular and plural words, misspelling, etc. are solved to a certain extent. Thus, all the technical targets of the Web service search method are improved.
Owner:北京赛柏科技有限责任公司

Document-level sentiment classification method based on dynamic word vectors and hierarchical neural network

Active; CN110765269A; Sentiment Classification Method Optimization; Enhance semantic expression ability; Semantic analysis; Neural architectures; Linguistic model; Document model
The invention discloses a document-level sentiment classification method based on dynamic word vectors and a hierarchical neural network. The method comprises the following steps: obtaining a high-quality dynamic word vector by constructing and training a bidirectional language model; and inputting the obtained dynamic word vector into a hierarchical neural network to model the document, thereby obtaining a vector representation containing rich semantic information, and inputting the vector into a softmax function to classify the document. According to the sentiment classification method, the high-quality dynamic word vector is generated by adopting the bidirectional language model, and the hierarchical neural network is provided for modeling the document, so that the problem of insufficient semantic expression of the static word vector to the polysemy is solved, and the document modeling capability in the sentiment classification task is further improved.
Owner:SOUTH CHINA UNIV OF TECH

Deep representation learning method based on feature controllable fusion

Active; CN110866542A; Excavate accurately; Solve the problem that the embedded representation is not rich enough to solve polysemy; Character and pattern recognition; Neural architectures; Algorithm; Engineering
The invention discloses a deep representation learning method based on feature controllable fusion. On the basis of obtaining word contextualized embedded representations in a pre-trained multilayer language model, feature representations of different scales are obtained from local and sequence perspectives, and a multi-head interactive linear attention mechanism is proposed to extract context abstracts to realize context information representation of words. According to the deep representation learning method, the words are subjected to embedded representation by using the pre-trained multi-layer language model, so that more contextualized representation of the words is obtained, and the problems that word embedded representation is not rich enough and one word has multiple meanings in the conventional method are solved; a context abstract is provided, and the specific representation of the current word under the influence of the whole sentence is calculated by using multi-head interactive linear attention to discover the difference between the words so as to assist evaluation object mining; and finally, a gate mechanism is used for feature screening, and weights are allocated to different features, and the influence of useful features is enhanced.
Owner:XI AN JIAOTONG UNIV
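
The gate mechanism mentioned at the end of this abstract can be sketched in a few lines. The abstract does not give the exact formulation, so this assumes a common one: a sigmoid gate computed from the concatenated features, mixing two feature vectors element-wise. The names `gated_fusion`, `gate_weights`, and `gate_bias` are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(local_feat, seq_feat, gate_weights, gate_bias):
    """Fuse two equal-length feature vectors with an element-wise gate.

    gate_weights has one row per output dimension; each row has
    len(local_feat) + len(seq_feat) entries (assumed layout).
    """
    concat = local_feat + seq_feat
    fused = []
    for i, (a, b) in enumerate(zip(local_feat, seq_feat)):
        z = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(z)                      # weight given to the local feature
        fused.append(g * a + (1.0 - g) * b) # remainder goes to the sequence feature
    return fused

# With zero weights and bias the gate is 0.5, so the result is a plain average.
local = [1.0, 0.0]
seq = [0.0, 1.0]
zero_w = [[0.0] * 4, [0.0] * 4]
print(gated_fusion(local, seq, zero_w, [0.0, 0.0]))
```

During training the gate parameters would be learned, so that informative features receive a gate value near 1 and uninformative ones are suppressed.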

Large-scale ontology mapping method for Chinese languages

The invention provides a mapping method for large-scale Chinese ontologies. The method comprises the following steps: initializing a concept-based correlation degree computation that integrates a Chinese thesaurus with an edit distance similarity algorithm; compressing the scale of the large-scale ontology mapping with a pseudo-nuclear-force field potential function that integrates concept similarity and dissimilarity and is improved using the initial correlation degree; and measuring the similarity of complex concepts in the Chinese ontology by introducing a global sequence alignment algorithm. Chinese words exhibit polysemy and sensitivity to word order, and large-scale ontology mapping is computationally expensive. To address this, the method first improves the existing pseudo-nuclear-force field potential function so that both the similarity measurement among concepts and the scale compression of the ontologies to be mapped are more reasonable. Second, it adopts global sequence alignment to map complex Chinese concepts, which overcomes further defects of traditional Chinese ontology mapping systems. As a result, mapping efficiency improves and both precision and recall increase.
Owner:CAPITAL UNIV OF ECONOMICS & BUSINESS
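
To make the global sequence alignment step concrete, here is a minimal sketch using the standard Needleman-Wunsch recurrence over token sequences. The abstract does not specify the scoring scheme or normalisation, so the `match`, `mismatch`, and `gap` values and the `alignment_similarity` helper are assumptions, and English tokens stand in for segmented Chinese concept names.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score between two token sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against nothing
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against nothing
    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    return score[-1][-1]

def alignment_similarity(a, b):
    """Map the raw score into [0, 1]; valid for the default match score of 1."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return max(0.0, global_alignment_score(a, b) / longest)

# Same tokens, different order: the alignment penalises the reordering.
print(alignment_similarity(["information", "retrieval"], ["information", "retrieval"]))
print(alignment_similarity(["information", "retrieval"], ["retrieval", "information"]))
```

Because the alignment is order-preserving, two concepts built from the same tokens in a different order score low, which is one way to respect the word-order sensitivity the abstract mentions.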

A subject-class-based cross-lingual biomedical research paper information recommendation method

The present invention relates to the technical field of information retrieval and recommendation systems, and more particularly to a subject-class-based cross-lingual biomedical research paper information recommendation method. The method mainly comprises the following steps: preprocessing the text data; applying the PLSA model to text clustering; calculating the word vector information of each subject grouping; obtaining the most relevant cross-language subject number for each subject; reading the retrieval word group entered by the user; judging the user's retrieval word group; and obtaining the recommendation results for Chinese articles and English literature. The invention reduces the dimensionality of text analysis by moving from the word-frequency space to the subject space. This dimensionality reduction effectively lowers the model's dependence on translation methods, which benefits cross-lingual analysis of literature features. At the same time, the topic model effectively mines the semantic information in documents, discovers latent associations between documents, and addresses the problems of polysemy (one word with several meanings) and synonymy (one meaning expressed by several words).
Owner:SUN YAT SEN UNIV
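
One way to read the step "obtaining the most relevant cross-language subject number of each subject" is as a nearest-neighbour match between subject vectors. The sketch below makes two assumptions that the abstract does not confirm: each subject is summarised by a single vector in a space shared by both languages, and relevance is measured by cosine similarity. The names `match_topics`, `zh_topics`, and `en_topics` are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def match_topics(zh_topics, en_topics):
    """For each Chinese subject id, pick the most similar English subject id."""
    mapping = {}
    for zh_id, zh_vec in zh_topics.items():
        mapping[zh_id] = max(
            en_topics, key=lambda en_id: cosine(zh_vec, en_topics[en_id])
        )
    return mapping

# Toy subject vectors, assumed to live in a shared embedding space.
zh_topics = {0: [1.0, 0.0], 1: [0.0, 1.0]}
en_topics = {0: [0.1, 0.9], 1: [0.9, 0.2]}
print(match_topics(zh_topics, en_topics))
```

With such a mapping, a query judged to belong to a Chinese subject could be routed to the paired English subject without translating the query word by word.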

Word vector generation method based on Gaussian distribution

Active | CN108733647A | Avoid Point Estimation Features | Fix the problem of assuming a fixed number of senses | Semantic analysis | Character and pattern recognition | Inclusion relation | Algorithm
The present invention discloses a word vector generation method based on the Gaussian distribution. The method comprises: first, preprocessing the corpus; second, dividing the corpus text at punctuation marks; then combining local and global information to infer word meanings and determine the mapping between words and word meanings; and finally, obtaining the word vectors by optimizing the objective function. The innovations and beneficial effects of the technical scheme are as follows: 1, words are represented by Gaussian distributions, which avoids the point-estimate nature of traditional word vectors and brings richer information to the word vectors, such as probability mass, meaning connotation, and inclusion relationships; 2, multiple Gaussian distributions are used to represent each word, which accommodates the fact that a word in natural language can have several meanings; and 3, the similarity between Gaussian distributions is defined by the Hellinger distance, and by combining parameter updating with word meaning discrimination the number of word meanings can be inferred adaptively, which solves the prior-art problem of models assuming a fixed number of word meanings.
Owner:SUN YAT SEN UNIV
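
The Hellinger distance between two Gaussians has a closed form, which the sketch below evaluates for the diagonal-covariance case. The abstract does not say which covariance structure the method uses, so the diagonal assumption and the function name `hellinger_diag_gaussians` are illustrative only. For each dimension the Bhattacharyya coefficient is sqrt(2*s1*s2 / (s1^2 + s2^2)) * exp(-(m1 - m2)^2 / (4 * (s1^2 + s2^2))), and the distance is sqrt(1 - product of the coefficients).

```python
import math

def hellinger_diag_gaussians(mu1, var1, mu2, var2):
    """Hellinger distance between two Gaussians with diagonal covariance.

    mu1, mu2: lists of means; var1, var2: lists of positive variances.
    Returns a value in [0, 1]; 0 means the distributions are identical.
    """
    log_bc = 0.0  # log of the Bhattacharyya coefficient, summed per dimension
    for m1, v1, m2, v2 in zip(mu1, var1, mu2, var2):
        s = v1 + v2
        log_bc += 0.5 * math.log(2.0 * math.sqrt(v1 * v2) / s)
        log_bc -= (m1 - m2) ** 2 / (4.0 * s)
    bc = math.exp(log_bc)
    return math.sqrt(max(0.0, 1.0 - bc))  # clamp guards against rounding error

# Two hypothetical senses of one word, each a 1-D Gaussian with unit variance.
print(hellinger_diag_gaussians([0.0], [1.0], [0.0], [1.0]))
print(hellinger_diag_gaussians([0.0], [1.0], [2.0], [1.0]))
```

A distance near 0 suggests that two candidate senses describe the same meaning, while a distance near 1 suggests distinct meanings; a threshold on this value is one plausible way to decide adaptively whether a new sense is needed, though the abstract does not state the actual criterion.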