
980 results about "Chinese word" patented technology

Chinese text classification method based on super-deep convolution neural network structure model

The invention provides a Chinese text classification method based on a super-deep convolution neural network structure model. The method comprises the steps of: collecting a training corpus for word vectors from the internet, segmenting the training corpus with a Chinese word segmentation algorithm, and obtaining a word vector model; collecting news from multiple Chinese news websites and labelling each item with its category to form a corpus set for text classification, the corpus set being divided into a training set corpus and a test set corpus; segmenting the training set corpus and the test set corpus respectively, and obtaining the corresponding word vectors by means of the word vector model; establishing the super-deep convolution neural network structure model; inputting the word vectors corresponding to the training set corpus into the super-deep convolution neural network structure model and training it to obtain a text classification model; and inputting the Chinese text to be classified into the word vector model, obtaining its word vectors, and inputting those word vectors into the text classification model to complete the Chinese text classification.
Owner:HEBEI UNIV OF TECH

Dependency semantic-based Chinese unsupervised open entity relationship extraction method

The invention relates to a dependency semantic-based Chinese unsupervised open entity relationship extraction method. The method comprises the following steps: preprocessing an input text by performing Chinese word segmentation, part-of-speech tagging and dependency grammar analysis; performing named entity identification on the input text; arbitrarily selecting two of the identified entities to form a candidate entity pair; searching for a dependency path between the two entities of the candidate entity pair; analyzing whether the syntactic structure mapped by the dependency path matches a normal form in a dependency semantic normal form set; if it does, extracting words or phrases from the residual part of the input text according to the matched normal form to serve as relational words, and forming a relational triple from the extracted relational words and the candidate entity pair; if it does not, performing normal form matching on the next candidate entity pair; and outputting the relational triples. Compared with the prior art, the method has low computational complexity and high extraction efficiency, overcomes distance and position limitations, and can also extract relations from simple sentences.
Owner:TONGJI UNIV
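
The "searching for a dependency path between two entities" step in the abstract above can be illustrated with a minimal sketch. It assumes the parser output is a list of head indices (one per token, -1 for the root); the function name, the example sentence and its parse are hypothetical and are not taken from the patent.

```python
from collections import deque


def dependency_path(heads, start, end):
    """Return the token indices on the path from start to end.

    heads[i] is the index of token i's head, or -1 for the root.
    The tree is treated as an undirected graph and searched breadth-first.
    """
    neighbours = {i: set() for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            neighbours[child].add(head)
            neighbours[head].add(child)

    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt in sorted(neighbours[node]):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)

    if end not in previous:
        return []
    path = []
    node = end
    while node is not None:
        path.append(node)
        node = previous[node]
    return path[::-1]


# Hypothetical parse: 毕业 is the root, 小明 and 于 depend on it,
# and 同济大学 depends on 于.
tokens = ["小明", "毕业", "于", "同济大学"]
heads = [1, -1, 1, 2]
path = dependency_path(heads, 0, 3)
between = [tokens[i] for i in path[1:-1]]  # words linking the two entities
```

The words lying between the two entities on the path are what a later normal-form matching step would inspect; that matching step is not sketched here.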

Industry comment data fine grain sentiment analysis method

The invention relates to a fine-grained sentiment analysis method for industry comment data, applied to Internet data analysis. The method comprises: obtaining comment data on e-commerce goods and preprocessing it; establishing initial industry sentiment word libraries and computing the distribution of words under different sentiment polarities through 1-grams and 2-grams; performing Chinese word segmentation on the comment data; based on the sentiment word libraries established through the 1-grams and 2-grams, using combined sentiment models to model words and obtain the probability distribution of words belonging to different topics under different sentiment distributions; using context information to re-determine the sentiment alignment of sentiment words in sentences; and performing named entity identification and extracting comment characteristics through conditional random fields to compute the sentiment alignment of the comment words of those characteristics. By computing the sentiment of comment words along the two dimensions of topic and sentiment, the method achieves fine-grained sentiment analysis of industry comment data, with high precision and interpretable results.
Owner:中科嘉速(北京)信息技术有限公司
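
As a rough illustration of "computing distribution of words under different sentiment polarities through 1-gram and 2-gram" in the abstract above, the sketch below estimates the relative frequency of each n-gram within each polarity from already segmented, labelled comments. The function name and the toy data are hypothetical, and the patent may define the distribution differently.

```python
from collections import Counter, defaultdict


def ngram_distribution_by_polarity(labelled_comments, n):
    """Estimate P(n-gram | polarity) from (tokens, polarity) pairs."""
    counts = defaultdict(Counter)
    for tokens, polarity in labelled_comments:
        for i in range(len(tokens) - n + 1):
            counts[polarity][tuple(tokens[i:i + n])] += 1
    distribution = {}
    for polarity, grams in counts.items():
        total = sum(grams.values())
        distribution[polarity] = {g: c / total for g, c in grams.items()}
    return distribution


# Toy, hand-labelled comments (hypothetical data).
comments = [
    (["质量", "很", "好"], "pos"),
    (["质量", "不", "好"], "neg"),
    (["物流", "很", "好"], "pos"),
]
unigrams = ngram_distribution_by_polarity(comments, 1)
bigrams = ngram_distribution_by_polarity(comments, 2)
```

In this toy data the unigram 好 is equally likely under both polarities, while the bigrams 很 好 and 不 好 each occur under only one polarity, which is one reason to count 2-grams as well as 1-grams.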

Chinese network review emotion classification method based on integrated study frame

The invention discloses a Chinese network review emotion classification method based on an integrated study frame (that is, an ensemble learning framework). The method takes a part-of-speech combination mode, an order-preserving sub-matrix mode and a frequent word sequence mode as input characteristics. At the characteristic level, it takes into account the influence of Chinese word order information, interval phrase characteristics and sentence length, and it addresses the sparsity of characteristic vectors through semantic similarity. It handles the large number of review text characteristics, keeps the base classifiers independent of one another, and improves the classification performance of the base classifiers as far as possible. A base classifier algorithm constructed on product attributes reviews the emotion information of each attribute in a text, and the sentence-level emotional tendency of the review is then judged, so that the final classification result is more accurate. The method is applicable to e-commerce network review emotion classification in various fields; it lets a potential consumer see evaluation information about a commodity before purchase and lets a merchant better understand consumers' opinions, so that service quality can be improved.
Owner:NANJING SILICON INTELLIGENCE TECH CO LTD

Method and system for extracting Chinese event

The invention provides a method and a system for extracting Chinese events. The method comprises the following steps: performing, in turn, phrasing, word segmentation, entity identification, and syntactic and dependency analysis on a text from which events are to be extracted; marking words that meet an extraction condition as candidate trigger words, according to the internal structures of the words; filtering out trigger words that meet a filtering condition according to their probability, word class and internal structure; extracting trigger words with a maximum entropy identification model and obtaining the reliability of each trigger word; dividing the trigger words into a consistency processing training set and a consistency processing test set according to their reliability; using a maximum entropy classifier to extract trigger words from the consistency processing test set; and using a maximum entropy classification model to classify the trigger words, thereby obtaining an event set. Starting from the characteristics of Chinese, the method and system consider both the internal structures of Chinese words and their semantic consistency across sections and chapters, so that the performance of Chinese event extraction is improved.
Owner:平江县鑫晟信息科技有限公司

Construction and utilization method for context-aware dynamic word or character vector on the basis of deep learning

The invention belongs to the technical field of natural language processing, and in particular concerns a method for constructing and using context-aware dynamic word or character vectors on the basis of deep learning. The construction method comprises: learning from massive texts, by an unsupervised learning method, both a global feature vector of a word or character and a feature vector representation of that word or character when it appears in a specific context; and combining the global feature vector with the context feature vector to dynamically generate the word or character vector representation. The word or character vectors constructed dynamically from context in this way can be applied in a natural language processing system. The method mainly addresses the problem that a word or character expresses different meanings in different contexts, that is, the problem that one word or character has multiple meanings. The dynamic word or character vectors can markedly improve the performance of various natural language processing tasks in different languages, including Chinese word segmentation, part-of-speech tagging, named entity recognition, grammatical analysis, semantic role tagging, sentiment analysis, text classification and machine translation.
Owner:FUDAN UNIV

Naive Bayesian classification based mobile phone spam short message filtering method and system

The invention provides a Naive Bayesian classification based mobile phone spam short message filtering method and system. The system comprises a message intercepting module, a cache, a blacklist filtering module, a keyword filtering module and an intelligent Naive Bayesian classification filtering module. The message intercepting module intercepts newly received short messages; the blacklist filtering module filters the new short messages according to a preset blacklist; the keyword filtering module filters the new short messages on the basis of preset keyword pairs; and the intelligent Naive Bayesian classification filtering module uses a Naive Bayesian algorithm and a pre-trained feature word bank to calculate the probability that a new short message is spam, judging the message to be spam if the probability ratio exceeds a preset threshold and normal otherwise. By combining the blacklist, the keywords, Naive Bayesian classification and Chinese word segmentation, the method and system judge whether short messages are spam and filter out those that are.
Owner:青岛腾信汽车网络科技服务有限公司
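
The three filtering stages in the abstract above (blacklist, keyword pairs, Naive Bayes probability ratio) can be sketched as follows. This is a minimal illustration under stated assumptions: messages are already segmented into words, a keyword pair matches when both words occur in the message, and Laplace smoothing is used. The class name, parameters and training data are all hypothetical.

```python
import math
from collections import Counter


class SpamFilter:
    """Blacklist, then keyword pairs, then a Naive Bayes log-ratio test."""

    def __init__(self, blacklist, keyword_pairs, spam_docs, ham_docs,
                 ratio_threshold=1.0):
        self.blacklist = set(blacklist)
        self.keyword_pairs = list(keyword_pairs)
        self.log_threshold = math.log(ratio_threshold)
        self.spam_counts = Counter(w for doc in spam_docs for w in doc)
        self.ham_counts = Counter(w for doc in ham_docs for w in doc)
        self.vocab = set(self.spam_counts) | set(self.ham_counts)
        self.log_prior_ratio = math.log(len(spam_docs) / len(ham_docs))

    def _log_likelihood(self, word, counts):
        # Laplace (add-one) smoothing over the shared vocabulary.
        total = sum(counts.values())
        return math.log((counts[word] + 1) / (total + len(self.vocab)))

    def log_ratio(self, words):
        """log P(spam | words) - log P(normal | words), unknown words skipped."""
        score = self.log_prior_ratio
        for word in words:
            if word in self.vocab:
                score += (self._log_likelihood(word, self.spam_counts)
                          - self._log_likelihood(word, self.ham_counts))
        return score

    def is_spam(self, sender, words):
        if sender in self.blacklist:
            return True
        present = set(words)
        if any(a in present and b in present for a, b in self.keyword_pairs):
            return True
        return self.log_ratio(words) > self.log_threshold


# Hypothetical training data and settings.
spam_filter = SpamFilter(
    blacklist={"10690000"},
    keyword_pairs=[("贷款", "利息")],
    spam_docs=[["免费", "中奖", "点击"], ["中奖", "领取", "点击"]],
    ham_docs=[["今晚", "开会"], ["明天", "吃饭", "开会"]],
)
```

Working in log space avoids underflow on long messages, and comparing the log ratio with the log of the threshold is equivalent to comparing the probability ratio with the threshold itself.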

Self-adaptive Chinese word segmentation method based on embedded representation

The embodiment of the invention discloses a self-adaptive Chinese word segmentation method based on embedded representation and belongs to the field of information processing. The method is characterized in that a word segmentation network and a character language model share the embedded representation layer of a character. On the one hand, hidden multi-granularity local features of the text to be segmented are obtained from the embedded representation by a word segmentation network based on a convolutional neural network; the label probability of each character is then obtained through a forward network layer; finally, label inference is used to obtain the optimal segmentation result at the sentence level. On the other hand, unlabelled text is randomly sampled, a character language model based on a long short-term memory (LSTM) recurrent neural network predicts the next character, and the word segmentation network is thereby constrained. By modelling the character co-representing relationship in texts from different fields with the character language model and transferring that information to the word segmentation network through the embedded representation, the method enhances the ability of word segmentation to transfer across fields and has considerable practical value.
Owner:BEIJING UNIV OF POSTS & TELECOMM
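
The "label inference" step in the abstract above is not specified in detail. One common way to turn per-character label probabilities into a sentence-level segmentation is Viterbi decoding over B/M/E/S labels with hard transition constraints; the sketch below assumes that scheme. The label set, the transition table and the probabilities are assumptions, not details from the patent.

```python
import math

LABELS = "BMES"  # Begin, Middle, End of a word, Single-character word
ALLOWED_NEXT = {"B": "ME", "M": "ME", "E": "BS", "S": "BS"}


def viterbi_segment(chars, label_probs):
    """Segment chars given label_probs[i] = {label: probability}."""
    n = len(chars)
    if n == 0:
        return []

    def logp(p):
        return math.log(max(p, 1e-12))  # floor avoids log(0)

    neg_inf = float("-inf")
    score = [{label: neg_inf for label in LABELS} for _ in range(n)]
    back = [{label: None for label in LABELS} for _ in range(n)]
    for label in "BS":  # a sentence can only start with B or S
        score[0][label] = logp(label_probs[0][label])

    for i in range(1, n):
        for prev in LABELS:
            if score[i - 1][prev] == neg_inf:
                continue
            for cur in ALLOWED_NEXT[prev]:
                candidate = score[i - 1][prev] + logp(label_probs[i][cur])
                if candidate > score[i][cur]:
                    score[i][cur] = candidate
                    back[i][cur] = prev

    last = max("ES", key=lambda label: score[n - 1][label])  # must end a word
    labels = [last]
    for i in range(n - 1, 0, -1):
        labels.append(back[i][labels[-1]])
    labels.reverse()

    words, current = [], ""
    for ch, label in zip(chars, labels):
        current += ch
        if label in "ES":
            words.append(current)
            current = ""
    return words


# Hypothetical network outputs for the four characters of 我爱北京.
probs = [
    {"B": 0.20, "M": 0.05, "E": 0.05, "S": 0.70},
    {"B": 0.20, "M": 0.10, "E": 0.10, "S": 0.60},
    {"B": 0.80, "M": 0.05, "E": 0.05, "S": 0.10},
    {"B": 0.05, "M": 0.05, "E": 0.80, "S": 0.10},
]
words = viterbi_segment("我爱北京", probs)
```

The transition constraints are what make the result a valid segmentation: picking the most probable label for each character independently could produce sequences such as B followed by S, which do not correspond to any word boundaries.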

Dictionary-based lucene Chinese word segmentation method

The invention discloses a dictionary-based Chinese word segmentation method. The method comprises the steps of: collecting linguistic data; establishing a terminological dictionary by first removing stop words, dividing the linguistic data into text fragments, extracting candidate words from the text fragments, obtaining by statistics the probability with which each candidate word and each individual character appears in all the text fragments, calculating the mutual information of the two Chinese characters in each candidate word, keeping a candidate word if its mutual information is larger than a preset mutual information threshold and deleting it otherwise, combining the candidate words that remain after screening, matching and filtering the combined candidate words against a general dictionary, and adding the candidate words that pass the filter to the terminological dictionary; and segmenting the text to be segmented first with the terminological dictionary and then segmenting the rest of the text with the general dictionary. Because the terminological dictionary is built by extracting terminology from the linguistic data through statistics, the method is highly general, and segmenting with the terminological dictionary effectively meets the requirements of professional fields.
Owner:成都天府云数信息技术有限公司
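
The mutual information screening described in the abstract above can be sketched for two-character candidates as follows. The sketch assumes pointwise mutual information, log2(P(xy) / (P(x) P(y))), with probabilities estimated from raw counts over the fragments; the function name, the fragments and the threshold are hypothetical.

```python
import math
from collections import Counter


def filter_by_mutual_information(fragments, threshold):
    """Keep two-character candidates whose mutual information exceeds threshold."""
    char_counts = Counter(ch for frag in fragments for ch in frag)
    pair_counts = Counter(
        frag[i:i + 2] for frag in fragments for i in range(len(frag) - 1)
    )
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    kept = {}
    for pair, count in pair_counts.items():
        p_xy = count / total_pairs
        p_x = char_counts[pair[0]] / total_chars
        p_y = char_counts[pair[1]] / total_chars
        mutual_information = math.log2(p_xy / (p_x * p_y))
        if mutual_information > threshold:
            kept[pair] = mutual_information
    return kept


# Hypothetical text fragments with stop words already removed.
fragments = ["区块链技术", "区块链应用", "技术应用", "应用技术"]
candidates = filter_by_mutual_information(fragments, threshold=2.5)
```

With these fragments, pairs that recur as units (区块, 块链, 技术, 应用) score above the threshold, while pairs that only straddle word boundaries (such as 链技 or 术应) fall below it. The later steps of combining candidates and filtering against a general dictionary are not shown.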

Microblog-based neologism emotional tendency judgment method

The invention relates to a microblog-based neologism emotional tendency judgment method, belonging to the field of natural language processing. The method comprises the following steps: segmenting the microblog corpora with a Chinese word segmentation tool; dividing the segmented corpora into blocks, taking the stop words in the segmentation result as division points; combining adjacent word strings in each block pairwise, calculating the frequency of each combined word string, and taking the word strings whose frequency is higher than a threshold value as neologism candidate strings; filtering the neologism candidate strings according to a word formation rule of Chinese linguistics and an adjacent change number rule to obtain neologisms; calculating the similarity between co-occurrence words and HowNet emotional words by means of the HowNet emotional dictionary; calculating the relevancy between the neologisms and the co-occurrence words; constructing a graph model; and obtaining the emotional polarity distribution of the neologisms with a label propagation algorithm, and obtaining the emotional tendency of the neologisms by constructing a linear classifier. Judging the emotional tendency of neologisms helps a blogger express views better and helps users understand the blogger's emotional tendency accurately.
Owner:KUNMING UNIV OF SCI & TECH
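
The candidate generation step in the abstract above (split at stop words, join adjacent word strings pairwise, keep the frequent ones) can be sketched as below. The input is assumed to be posts that a segmentation tool has already split into tokens; the function name, the stop word list and the posts are hypothetical.

```python
from collections import Counter


def neologism_candidates(segmented_posts, stop_words, min_frequency):
    """Return joined adjacent token pairs seen more than min_frequency times."""
    counts = Counter()
    for tokens in segmented_posts:
        block = []
        # The trailing None acts as a sentinel that flushes the last block.
        for token in list(tokens) + [None]:
            if token is None or token in stop_words:
                counts.update(a + b for a, b in zip(block, block[1:]))
                block = []
            else:
                block.append(token)
    return {s for s, c in counts.items() if c > min_frequency}


# Hypothetical segmenter output in which the new word 给力 was split in two.
posts = [
    ["这", "个", "给", "力", "的", "视频"],
    ["太", "给", "力", "了"],
    ["真", "给", "力"],
]
candidates = neologism_candidates(posts, stop_words={"的", "了"}, min_frequency=2)
```

Pairs are never formed across a stop word, so 力 and 视频 in the first post are not joined. The subsequent filtering by word formation rules and the label propagation stage are outside this sketch.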

Chinese text verification system and method based on Chinese vague pronunciation and voice recognition

The invention discloses a Chinese text verification system and method based on Chinese vague pronunciation and voice recognition. The system comprises a voice collecting and processing module, a voice recognition module and a text verifying and sharing module, wherein the voice collecting and processing module collects audio and compresses and denoises it, the voice recognition module recognizes the speech as text, and the text verifying and sharing module performs text verification and also supports text editing and sharing. The method comprises the steps of: defining Chinese error judgment rules based on parts of speech; performing word segmentation on the Chinese text obtained from voice recognition; scanning the segmented words according to the Chinese error judgment rules to find wrong Chinese words; defining a vague pronunciation table based on Chinese vague pronunciation rules; finding all vague pinyin of the wrong words by means of a Cartesian product; querying a dictionary table to obtain a candidate word set for all the vague pinyin; and selecting a candidate error correction set from the words of that candidate set according to word frequency rank. The system and method eliminate Chinese errors in voice recognition that are caused by vague pronunciation, and effectively increase the error correction accuracy of the verification algorithm.
Owner:HOHAI UNIV
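
The Cartesian product step in the abstract above can be sketched with `itertools.product`. The fuzzy-sound table here is a small illustrative assumption (commonly confused initials and finals such as zh/z and ang/an), and the dictionary with its frequencies is hypothetical; the patent's actual tables are not given in the abstract.

```python
from itertools import product

# Assumed groups of easily confused initials and finals.
INITIAL_GROUPS = [{"zh", "z"}, {"ch", "c"}, {"sh", "s"}, {"n", "l"}]
FINAL_GROUPS = [{"ang", "an"}, {"eng", "en"}, {"ing", "in"}]

# Hypothetical dictionary table: pinyin -> [(word, frequency)].
DICTIONARY = {
    "zhang san": [("张三", 120)],
    "zhan shan": [("占山", 5)],
}


def split_syllable(syllable):
    """Split a toneless pinyin syllable into (initial, final)."""
    for initial in ("zh", "ch", "sh"):
        if syllable.startswith(initial):
            return initial, syllable[2:]
    if syllable and syllable[0] in "bpmfdtnlgkhjqxrzcsyw":
        return syllable[0], syllable[1:]
    return "", syllable


def alternatives(part, groups):
    for group in groups:
        if part in group:
            return sorted(group)
    return [part]


def syllable_variants(syllable):
    initial, final = split_syllable(syllable)
    return [i + f for i, f in product(alternatives(initial, INITIAL_GROUPS),
                                      alternatives(final, FINAL_GROUPS))]


def word_variants(syllables):
    """All fuzzy spellings of a word: the product of per-syllable variants."""
    return [" ".join(combo)
            for combo in product(*(syllable_variants(s) for s in syllables))]


def candidate_words(syllables):
    """Look up every variant and rank the hits by word frequency."""
    found = [entry for variant in word_variants(syllables)
             for entry in DICTIONARY.get(variant, [])]
    return [word for word, _ in sorted(found, key=lambda e: e[1], reverse=True)]


variants = word_variants(["zhang", "san"])
ranked = candidate_words(["zan", "shang"])
```

Some generated spellings are not valid pinyin syllables; the dictionary lookup discards them because they have no entry.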

Cryptogram-based safe full-text indexing and retrieval system

The invention discloses a cryptogram-based safe full-text indexing and retrieval system. In the system, a cryptogram index library comprises a cryptogram entry inverted index and an internal document object set; a cryptogram document library stores and manages encrypted XML documents; a word segmentation encryption server performs Chinese word segmentation on a plaintext document and encrypts it item by item; a cryptogram full-text indexing server standardizes an original plaintext document into an XML document, encrypts it and stores it in the cryptogram document library, creates a corresponding internal document object in the cryptogram index library using the document metadata, and creates a cryptogram inverted index for the XML document from the cryptogram entries; and a cryptogram full-text retrieval server searches the cryptogram index library using user authority information and the cryptogram entry to obtain the internal document object set, fetches the corresponding encrypted XML document result set from the cryptogram document library by pointer, decrypts it, and returns the decrypted result set to the user. The Chinese word segmentation method, the safe and efficient indexing structure and the retrieval mechanism of the invention, designed around the special requirements of cryptogram full-text indexing, realize cryptogram full-text indexing integrated with an access control strategy. The system has the advantages of a safe and efficient indexing process, no decrypted document terms in the indexing process, and high recall and precision in a cryptogram environment.
Owner:HUAZHONG UNIV OF SCI & TECH
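
The idea of an index keyed by encrypted entries, as described in the abstract above, can be illustrated with a simplified stand-in. The sketch replaces the patent's unspecified term encryption with a keyed HMAC-SHA256 token per term, so the index never stores plaintext terms; that substitution, the class name and the data are assumptions. Document encryption, XML handling and access control are not shown.

```python
import hashlib
import hmac
from collections import defaultdict


def term_token(key: bytes, term: str) -> str:
    """Derive a deterministic keyed token for a term (stand-in for encryption)."""
    return hmac.new(key, term.encode("utf-8"), hashlib.sha256).hexdigest()


class TokenIndex:
    """Inverted index from term tokens to document identifiers."""

    def __init__(self, key: bytes):
        self._key = key
        self._postings = defaultdict(set)

    def add(self, doc_id, terms):
        for term in terms:
            self._postings[term_token(self._key, term)].add(doc_id)

    def search(self, term):
        token = term_token(self._key, term)
        return sorted(self._postings.get(token, set()))

    def stored_keys(self):
        return set(self._postings)


# Hypothetical key and already segmented documents.
index = TokenIndex(key=b"demo-key-not-for-production")
index.add("doc1", ["全文", "检索"])
index.add("doc2", ["全文", "加密"])
hits = index.search("全文")
```

Because the token is deterministic, equal terms map to equal index keys, which is what makes lookup possible without decrypting anything; the trade-off is that term frequencies remain visible to anyone who can read the index.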