188 results about "Lexical item" patented technology

In lexicography, a lexical item (or lexical unit / LU, lexical entry) is a single word, a part of a word, or a chain of words (catena) that forms the basic elements of a language's lexicon (≈ vocabulary). Examples are cat, traffic light, take care of, by the way, and it's raining cats and dogs. Lexical items can generally be understood to convey a single meaning, much as a lexeme does, but are not limited to single words. Lexical items are like semes in that they are "natural units" for translating between languages or for learning a new language. In this last sense, it is sometimes said that language consists of grammaticalized lexis, and not lexicalized grammar. The entire store of lexical items in a language is called its lexis.

Domain-knowledge-based short text classification method and text classification system

The invention discloses a domain-knowledge-based short text classification method and system for use in the technical field of information. The method addresses the inability of traditional text classification methods to classify short texts well. Because short texts carry relatively weak concept signals and have severely insufficient text features, the invention provides a short text classification method and system suited to commodity web page data. According to the embodiment, a commodity classifier with a strong classification effect is obtained by adapting a traditional classifier, introducing new elements, and focusing on matching the algorithm to the data. Two new elements are introduced. First, the concept of domain words is added to the classifier to increase the information content of the short texts. Second, semantic analysis based on different lexical item sets is performed on the short text data, particularly web page commodity data, and the result is fed into the classifier to supply new information about the commodity data and improve the accuracy of text classification.
Owner:SHANGHAI BIJIA DATA

LDA (latent Dirichlet allocation) and VSM (vector space model) based similar Chinese herb literature recommendation method

Active | CN103823848A | Fast and efficient similar recommendation | Robust | Special data processing applications | Lexical item | Vector space model
The invention discloses an LDA (latent Dirichlet allocation) and VSM (vector space model) based similar Chinese herb literature recommendation method. The method uses IKAnalyzer, with a terminological dictionary for Chinese herbs, to perform word segmentation on the topics and summary information of the literature. It then constructs a vector space, reduces its dimensionality, constructs a semantic dictionary, and numbers all lexical items in the dictionary in sequence. Each document is vectorized on the basis of the semantic dictionary to build its term vector. LDA is trained with a Gibbs sampling algorithm to obtain the probability distribution of each document over themes, and a similarity value between every two documents is computed from the KL divergence. The cosine similarity of the term vectors of each document is computed on the basis of term frequency. The two kinds of similarity are jointly weighted, the results are sorted by similarity, and recommendations are made. With this method, Chinese herb literature that is similar in both content and theme can be recommended to users, and the recommendation results are closer to user requirements.
Owner:ZHEJIANG UNIV
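
The joint weighting of the two similarities described in this abstract can be sketched as follows. This is a minimal illustration, not the patented implementation: the abstract does not say how the KL divergence is turned into a similarity or how the two scores are weighted, so the symmetrized divergence, the `1 / (1 + KL)` mapping, and the `alpha` parameter are all assumptions made here. The theme distributions are assumed to come from an already trained LDA model.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def symmetric_kl(p, q):
    """Average of KL(p || q) and KL(q || p); symmetrizing is an assumption."""
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


def cosine_similarity(a, b):
    """Cosine similarity of two term-frequency vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def joint_similarity(themes_a, themes_b, tf_a, tf_b, alpha=0.5):
    """Combine theme-level and term-level similarity.

    alpha is a hypothetical mixing weight; 1 / (1 + KL) is one possible
    way to map a divergence into (0, 1], chosen here for illustration.
    """
    theme_sim = 1.0 / (1.0 + symmetric_kl(themes_a, themes_b))
    term_sim = cosine_similarity(tf_a, tf_b)
    return alpha * theme_sim + (1 - alpha) * term_sim
```

Candidate documents would then be sorted by `joint_similarity` in descending order, and the top entries recommended.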

Method and system for advertisement recommendation based on microblog

The invention belongs to the field of data mining and provides a method and system for advertisement recommendation based on a microblog. The method comprises the following steps: microblog data are read; the microblog data are initialized and a microblog text lexical item set is obtained; stop words are deleted from the microblog text lexical item set to obtain a microblog text original feature lexical item set; the original feature lexical item set is mapped against a feature lexical item dictionary, and for each lexical item of the original set that exists in the dictionary, its tf-idf value is calculated and used as its feature value; for each lexical item of the dictionary that does not appear in the original set, the feature value is set to zero; the feature vectors formed from the calculated feature values are automatically classified into predefined categories; and advertisements are recommended to the user according to the classification result. The advertisements recommended by the method and system are accurate and the effect is good.
Owner:SHENZHEN INST OF ADVANCED TECH CHINESE ACAD OF SCI
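
The feature-vector construction in this abstract (stop-word removal, mapping against a feature dictionary, tf-idf for items that appear, zero for items that do not) might look roughly like the sketch below. It assumes the microblog text has already been segmented into tokens. The smoothed idf formula and all function names are illustrative choices, since the abstract does not specify them, and the final classification step is not shown.

```python
import math
from collections import Counter


def build_idf(corpus, dictionary):
    """Smoothed inverse document frequency for each dictionary term.

    corpus is a list of token lists; the smoothing variant is an assumption.
    """
    n_docs = len(corpus)
    idf = {}
    for term in dictionary:
        df = sum(1 for doc in corpus if term in doc)
        idf[term] = math.log((1 + n_docs) / (1 + df)) + 1.0
    return idf


def feature_vector(tokens, stop_words, dictionary, idf):
    """Return one tf-idf value per dictionary term, in dictionary order."""
    # Drop stop words to get the original feature lexical item set.
    kept = [t for t in tokens if t not in stop_words]
    counts = Counter(kept)
    total = len(kept)
    vector = []
    for term in dictionary:
        if total and counts[term]:
            # Term appears in the post: tf-idf is its feature value.
            vector.append(counts[term] / total * idf[term])
        else:
            # Dictionary term absent from the post: feature value is zero.
            vector.append(0.0)
    return vector
```

The resulting vectors have a fixed length equal to the dictionary size, so they can be passed to any classifier trained on the predefined categories.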

Software defect positioning method based on text part of speech and program call relation

Active | CN105159822A | Improve defect localization accuracy | Software testing/debugging | Part of speech | Source code file
The invention discloses a software defect positioning method based on text part of speech and a program call relation. The method comprises: (1) extracting text messages of summaries and descriptions in a defect report, and increasing weights of noun lexical items and weights of all lexical items of a summary module in the defect report according to part of speech tags; (2) filtering out elements not required by a source code file according to a demand parameter ran of a developer, and preprocessing the text messages of the defect report and the filtered source code file; (3) generating a suspicious defect source code file list; (4) finding out a called source file through character string retrieval, and increasing a similarity value to correct an original rank; and (5) outputting a defect source code file or a defect source code file list corresponding to the defect report according to the demand parameter ran of the developer. According to the software defect positioning method based on the text part of speech and the program call relation, the text part of speech is utilized to adjust the weights of the lexical items, the program call relation is utilized to correct the similarity value, and the source code file is filtered and a final result is output according to the demand of a programmer, so that the purpose of improving the accuracy of defect positioning is achieved.
Owner:NANJING UNIV OF AERONAUTICS & ASTRONAUTICS
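
Step (1) of this abstract, raising the weight of noun lexical items and of all lexical items in the summary, could be sketched as below. The input is assumed to be already part-of-speech tagged with Penn Treebank style tags (nouns start with `NN`). The boost values and the choice to multiply the two boosts together are hypothetical; the abstract gives no numbers.

```python
from collections import Counter


def weighted_term_counts(summary_tagged, description_tagged,
                         noun_boost=2.0, summary_boost=2.0):
    """Accumulate a weight per lexical item from (token, pos_tag) pairs.

    Every summary token is scaled by summary_boost, and every noun
    (tag starting with "NN") is additionally scaled by noun_boost.
    Both boost values are illustrative assumptions.
    """
    weights = Counter()
    fields = ((summary_tagged, summary_boost), (description_tagged, 1.0))
    for tagged, field_boost in fields:
        for token, pos in tagged:
            weight = field_boost
            if pos.startswith("NN"):
                weight *= noun_boost
            weights[token.lower()] += weight
    return dict(weights)
```

For example, with the default boosts a noun in the summary counts 4.0 per occurrence, a noun in the description 2.0, and a non-noun in the description 1.0. These weights would then feed the similarity computation between the defect report and each source code file.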

Pseudo-correlation feedback model information retrieval method and system based on semantic similarity

The invention provides a pseudo-correlation feedback model information retrieval method and system based on semantic similarity. The method comprises the following steps: carrying out a first query on a target document set according to a query keyword to extract a pseudo-related document set, carrying out query expansion with the Rocchio algorithm, carrying out query expansion according to the semantic similarity of sentences, fusing the results of the two query expansion methods, and carrying out a second query to realize the final information retrieval. According to the invention, when an extended lexical item is selected, the importance relationship between the query lexical item and the extension word used in the traditional method is highlighted, and the semantic correlation of the sentences in which the lexical items occur is also taken into account. This matches the real-world condition that lexical items are associated when their sentences are semantically similar, and it represents the cases in which the semantics are related even though the lexical items differ. As a result, the query words have better regional indexing in a multi-semantic environment, a large amount of useless and irrelevant information can be removed from mass information, more accurate candidate words can be obtained, and the precision of the expanded query and the final retrieval can be improved.
Owner:HUAZHONG NORMAL UNIV
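
The Rocchio step of the query expansion can be illustrated with the sketch below, in the form commonly used for pseudo-relevance feedback: the original query vector is combined with the centroid of the pseudo-related documents, with no non-relevant term. The `alpha`, `beta`, and `top_k` values and the sparse-dictionary representation are assumptions for illustration. The sentence-similarity expansion and the fusion of the two results described in the abstract are not shown.

```python
from collections import defaultdict


def rocchio_expand(query_vec, feedback_docs, alpha=1.0, beta=0.75, top_k=3):
    """Expand a query using the centroid of pseudo-related documents.

    query_vec and each feedback document are sparse {term: weight} dicts.
    Returns the reweighted original terms plus the top_k new terms.
    """
    expanded = defaultdict(float)
    for term, weight in query_vec.items():
        expanded[term] += alpha * weight
    for doc in feedback_docs:
        for term, weight in doc.items():
            # Adding beta * weight / N per document sums to beta * centroid.
            expanded[term] += beta * weight / len(feedback_docs)
    # Candidate expansion terms: highest weight first, ties broken by name.
    new_terms = sorted(
        (t for t in expanded if t not in query_vec),
        key=lambda t: (-expanded[t], t),
    )[:top_k]
    return {t: expanded[t] for t in list(query_vec) + new_terms}
```

The expanded query returned here would be used for the second retrieval pass, after fusion with the terms proposed by the sentence-level semantic similarity method.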

Financial public opinion perception method based on weighted LDA (latent Dirichlet allocation) topic model

The invention discloses a financial public opinion perception method based on a weighted LDA (latent Dirichlet allocation) topic model and belongs to the technical field of natural language understanding and processing as well as network public opinion. Everyday financial public opinions are perceived on the basis of microblog data related to everyday finance and are quantified as an 'everyday financial public opinion comprehensive index'. This index is a weighted average of the emotion values of all finance-related blog posts on the day, and each blog emotion value is the result of text emotion classification of the blog content. An SVM (support vector machine) classification model based on weighted LDA is adopted for text emotion classification, and the weighted LDA is used to establish the hidden topic space in which the texts are represented. Objective data that indirectly embody investor sentiment and subjective data that directly embody investor sentiment are combined through a new lexical item weight calculation method, which greatly improves the understanding of texts at the semantic level, so that the text emotion classification effect is better.
Owner:BEIJING INSTITUTE OF TECHNOLOGY
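
The 'everyday financial public opinion comprehensive index' is described as a weighted average of per-post emotion values. A minimal sketch of that arithmetic follows. The abstract does not say what the weights are, so the `(emotion_value, weight)` pair format is an assumption, and the emotion values are taken as given (in the patent they come from the SVM classifier).

```python
def daily_sentiment_index(posts):
    """Weighted average of emotion values for one day.

    posts is a list of (emotion_value, weight) pairs; how the weights
    are derived is not stated in the abstract and is left to the caller.
    """
    total_weight = sum(weight for _, weight in posts)
    if total_weight == 0:
        return 0.0
    return sum(value * weight for value, weight in posts) / total_weight
```

For example, one positive post (`+1`) with weight 3 and one negative post (`-1`) with weight 1 give an index of `(3 - 1) / 4 = 0.5`.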