2248 results about "Text categorization" patented technology

Text categorization (a.k.a. text classification) is the task of assigning predefined categories to free-text documents. It can provide conceptual views of document collections and has important applications in the real world.

Methods and apparatus for classifying text and for building a text classifier

Inactive | US6192360B1 | Quick calculation | Data processing applications | Digital data information retrieval | Pattern recognition | Text categorization

A text classifier, and a method of building the text classifier by determining appropriate parameters for it.

Owner:MICROSOFT TECH LICENSING LLC

Robust information extraction from utterances

Active | US8583416B2 | Increase cross-entropy | High precision | Speech recognition | Special data processing applications | Feature extraction | Text categorization

The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain size and scarce training data, as well as under noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information for representation of information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical.

Owner:NANT HLDG IP LLC

Category based, extensible and interactive system for document retrieval

Inactive | US20050108200A1 | High precision | Reduction in relevant document recall rate | Wireless communication services | Data switching networks | Document analysis | Paper document

In information retrieval (IR) systems with high-speed access, especially search engines applied to the Internet and/or corporate intranet domains for retrieving accessible documents, automatic text categorization techniques are used to support the presentation of search query results within high-speed network environments. An integrated, automatic and open information retrieval system (100) comprises a hybrid method based on linguistic and mathematical approaches for automatic text categorization. It solves the problems of conventional systems by combining an automatic content recognition technique with a self-learning hierarchical scheme of indexed categories. In response to a word submitted by a requester, said system (100) retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a list of the document topics is presented to the requester, and the requester designates the relevant topics. The requester is then granted access only to documents assigned to relevant topics. A knowledge database (1408) linking search terms to documents and documents to topics is established and maintained to speed future searches. Additionally, new strategies are presented to deal with different update frequencies of changed Web sites.

Owner:COGISUM INTERMEDIA

System and method for sentiment-based text classification and relevancy ranking

Active | US8166032B2 | Digital data information retrieval | Digital data processing details | Sentiment score | Text categorization

The sentimental significance of a group of historical documents related to a topic is assessed with respect to change in an extrinsic metric for the topic. A unique sentiment binding label is added to the content of action documents that are determined to have sentimental significance, and the group of documents is inserted into a historical document sentiment vector space for the topic. Action areas in the vector space are defined from the locations of action documents, and a singular sentiment vector may be created that describes the cumulative action area. Newly published documents are sentiment-scored by semantically comparing them to documents in the space and/or to the singular sentiment vector. The sentiment scores for the newly published documents are supplemented by human sentiment assessment of the documents, and a sentiment time decay factor is applied to the supplemented sentiment score of each newly published document. User queries are received and a set of sentiment-ranked documents with the highest age-adjusted sentiment scores is returned.

Owner:MARKETCHORUS
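
The abstract above mentions a sentiment time decay factor and age-adjusted ranking but does not give a formula. The following is a minimal sketch only, assuming an exponential half-life decay; the function names, the `half_life_days` parameter, and the `(doc_id, score, age_days)` tuple layout are hypothetical and are not taken from the patent.

```python
def age_adjusted_score(score, age_days, half_life_days=7.0):
    """Scale a sentiment score by an exponential decay (assumed form):
    the score halves every `half_life_days` days."""
    return score * 0.5 ** (age_days / half_life_days)


def rank_documents(documents, top_k=10, half_life_days=7.0):
    """Rank (doc_id, sentiment_score, age_days) tuples by age-adjusted score,
    highest first, and return the top_k as (doc_id, adjusted_score) pairs."""
    scored = [
        (doc_id, age_adjusted_score(score, age, half_life_days))
        for doc_id, score, age in documents
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]
```

With this assumed decay, an older document needs a proportionally higher raw score to outrank a fresh one; other decay shapes (linear, stepwise) would slot into `age_adjusted_score` without changing the ranking code.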

Method and system for extracting and classifying geolocation information utilizing electronic social media

Inactive | US20130086072A1 | Distance | Reduce “noisy” data | Digital data information retrieval | Digital data processing details | Data ingestion | Geotargeting

Methods, systems and processor-readable media for extracting and classifying location information utilizing social media messages and/or data thereof. The social media messages can be sampled from a social media database and the messages filtered based on a heuristic rule. A geolocation entity can be extracted from the unstructured social media messages utilizing a geolocation entity extracting module. The messages with the geoentities can be uploaded onto a crowd sourcing platform to manually annotate the messages with a label. A text classification model can be built and learned from the label utilizing a machine learning algorithm and the messages can be classified by a location classifier in order to extract the user location. The user location can then be transformed into a geocode so that a spatial search can be enabled and the distance between the locations can be easily calculated.

Owner:XEROX CORP
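
The abstract above ends with geocodes being used to calculate distances between locations. It does not say how; one standard choice, shown here purely as an illustration, is the haversine great-circle distance. The function name and the mean Earth radius constant are assumptions for this sketch, not details from the patent.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given as
    (latitude, longitude) in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

Once each classified user location has been geocoded to a latitude/longitude pair, a function like this is enough for radius-style spatial filtering.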

Document categorisation system

Inactive | US7971150B2 | Digital data information retrieval | Indoor games | Electronic document | Documentation procedure

A document categorization system, including a clusterer for generating clusters of related electronic documents based on features extracted from the documents, and a filter module for generating a filter on the basis of the clusters to categorize further documents received by the system. The system may include an editor for manually browsing and modifying the clusters. The categorization of the documents is based on n-grams, which are used to determine significant features of the documents. The system includes a trend analyzer for determining trends of changing document categories over time, and for identifying novel clusters. The system may be implemented as a plug-in module for a spreadsheet application for permitting one-off or ongoing analysis of text entries in a worksheet.

Owner:TELSTRA CORPORATION LIMITD
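
The document categorisation abstract above relies on n-grams to find significant features. The abstract does not say whether these are word or character n-grams, or how significance is measured; the sketch below assumes word n-grams ranked by raw frequency, and both helper names are hypothetical.

```python
from collections import Counter


def word_ngrams(text, n=2):
    """Return the list of word n-grams (as tuples) in a whitespace-tokenized text."""
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(documents, n=2, k=3):
    """Count n-grams over a collection of documents and return the k most frequent."""
    counts = Counter()
    for doc in documents:
        counts.update(word_ngrams(doc, n))
    return counts.most_common(k)
```

Frequent n-grams from such a count could then serve as the features on which documents are clustered.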

Robust Information Extraction from Utterances

Active | US20090171662A1 | Increase cross-entropy | Improve precision | Speech recognition | Special data processing applications | Feature extraction | Text categorization

The performance of traditional speech recognition systems (as applied to information extraction or translation) decreases significantly with larger domain size and scarce training data, as well as under noisy environmental conditions. This invention mitigates these problems through the introduction of a novel predictive feature extraction method which combines linguistic and statistical information for representation of information embedded in a noisy source language. The predictive features are combined with text classifiers to map the noisy text to one of the semantically or functionally similar groups. The features used by the classifier can be syntactic, semantic, and statistical.

Owner:NANT HLDG IP LLC

System and method for sentiment-based text classification and relevancy ranking

Active | US20100262454A1 | Digital data information retrieval | Digital data processing details | Emotion assessment | Sentiment score

The sentimental significance of a group of historical documents related to a topic is assessed with respect to change in an extrinsic metric for the topic. A unique sentiment binding label is added to the content of action documents that are determined to have sentimental significance, and the group of documents is inserted into a historical document sentiment vector space for the topic. Action areas in the vector space are defined from the locations of action documents, and a singular sentiment vector may be created that describes the cumulative action area. Newly published documents are sentiment-scored by semantically comparing them to documents in the space and/or to the singular sentiment vector. The sentiment scores for the newly published documents are supplemented by human sentiment assessment of the documents, and a sentiment time decay factor is applied to the supplemented sentiment score of each newly published document. User queries are received and a set of sentiment-ranked documents with the highest age-adjusted sentiment scores is returned.

Owner:MARKETCHORUS

Method for improvement accuracy of decision tree based text categorization

Inactive | US6253169B1 | Improve accuracy | Digital data information retrieval | Special data processing applications | Electronic document | Algorithm

A text categorization method automatically classifies electronic documents by developing a single pooled dictionary of words for a sample set of documents, and then generating a decision tree model, based on the pooled dictionary, for classifying new documents. Adaptive resampling techniques are applied to improve the accuracy of the decision tree model.

Owner:NUANCE COMM INC

System and method for document categorization

Inactive | US7496567B1 | Data processing applications | Digital data information retrieval | Text categorization | Subject matter

The present invention provides methods and systems for automatic categorization of documents. More specifically, the present invention provides for the automatic assignment of a set of pre-defined topics to a set of documents.

Owner:STEICHEN TERRIL JOHN

Text categorization toolkit

Inactive | US6212532B1 | Data processing applications | Digital data information retrieval | Feature extraction | Text categorization

A modular information extraction system capable of extracting information from natural language documents. The system includes a plurality of interchangeable modules including a data preparation module for preparing a first set of raw data having class labels to be tested, the data preparation module being selected from a first type of the interchangeable modules. The system further includes a feature extraction module for extracting features from the raw data received from the data preparation module and storing the features in a vector format, the feature extraction module being selected from a second type of the interchangeable modules. A core classification module is also provided for applying a learning algorithm to the stored vector format and producing therefrom a resulting classifier, the core classification module being selected from a third type of the interchangeable modules. A testing module compares the resulting classifier to a set of preassigned classes, where the testing module is selected from a fourth type of the interchangeable modules, and where the testing module tests a second set of raw data having class labels received by the data preparation module to determine the degree to which the class labels of the second set of raw data approximately correspond to the resulting classifier.

Owner:IBM CORP

Automated topic discovery in documents and content categorization

Active | US9047283B1 | Easy to find | Efficient and accurate and scalable | Web data indexing | Semantic analysis | Semantic property | Part of speech

A computer-assisted method for discovering topics and categorizing contents in a document includes the steps of calculating an importance score for a term based on grammatical roles, parts of speech, and semantic attributes, selecting terms based on the importance score values of the respective terms, and outputting terms comprising the selected term to represent topics in the document, and building a category structure based on the selected terms.

Owner:LINFO IP LLC

Method and system for filtering sensitive web page based on multiple classifier amalgamation

Active | CN101281521A | Solve tight control problems | Short processing time | Character and pattern recognition | Data switching networks | Data stream | Internet content

The invention discloses a system and a method for filtering sensitive web pages based on multi-classifier fusion. The input is a web page, and the output is a determination of whether the page contains sensitive content, which may be pornographic, reactionary, violent, or other unhealthy Internet content harmful to society. The system comprises a data stream acquisition and preprocessing unit, an image and text stream filtering unit, and an information fusion unit that combines the image filter and the text filter. Through the cooperation of multiple classifiers, the system acquires the source code of a web page using its URL; text and images are separated at the preprocessing stage to obtain text information and effective image information; the input web page is assigned to one of three modes by a decision tree algorithm; the web page is recognized using a consecutive text classifier, a discrete sensitive text classifier and an image classifier; the classifier outputs are fused, a judgment factor is computed, and the final result is returned to the browser.

Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

System and method for automatically classifying text

Inactive | US7028250B2 | Digital data information retrieval | Natural language analysis | Text categorization | Degree of association

A method is provided for automatically classifying text into categories. In operation, a plurality of tokens or features are manually or automatically associated with each category. A weight is then coupled to each feature, wherein the weight indicates a degree of association between the feature and the category. Next, a document is parsed into a plurality of unique tokens with associated counts, wherein the counts are indicative of the number of times the feature appears in the document. A category score representative of a sum of products of each feature count in the document times the corresponding feature weight in the category for each document is then computed. Next, the category scores are sorted by perspective, and a document is classified into a particular category, provided the category score exceeds a predetermined threshold.

Owner:AVOLIN LLC
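
The scoring scheme in the abstract above (token counts multiplied by per-category feature weights, summed, then compared with a threshold) can be sketched as follows. This is an illustrative reading, not the patented implementation: whitespace tokenization, the dictionary layout of `category_weights`, and both function names are assumptions.

```python
from collections import Counter


def category_scores(document, category_weights):
    """For each category, sum count(token) * weight(token) over the
    category's features. Tokens are lowercased whitespace-separated words."""
    counts = Counter(document.lower().split())
    return {
        category: sum(counts[token] * weight for token, weight in weights.items())
        for category, weights in category_weights.items()
    }


def classify(document, category_weights, threshold):
    """Return the categories whose score exceeds the threshold, best first."""
    scores = category_scores(document, category_weights)
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [category for category, score in ranked if score > threshold]
```

For example, with weights `{"sports": {"goal": 2.0, "match": 1.0}, "finance": {"stock": 3.0}}`, the text "Goal goal match stock" scores 5.0 for sports and 3.0 for finance, so a threshold of 4 keeps only sports.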

Method and apparatus for adjusting the model threshold of a support vector machine for text classification and filtering

Inactive | US20050228783A1 | Improve abilities | Reduce margin | Digital data information retrieval | Digital data processing details | Greek letter beta | Algorithm

An information need can be modeled by a binary classifier such as a support vector machine (SVM). SVMs can exhibit very conservative, precision-oriented behavior when modeling information needs. This conservative behavior can be overcome by adjusting the position of the hyperplane, the geometric representation of an SVM. The present invention describes automatic techniques for adjusting the position of an SVM model based upon a beta-gamma thresholding procedure, cross-fold validation and retrofitting. This adjustment technique can also be applied to other types of learning strategies.

Owner:JUSTSYST EVANS RES

Academic resource recommendation service system and method

Active | CN106815297A | Efficient acquisition | Realize dynamic acquisition | Special data processing applications | Personalization | Resource quality

The invention provides an academic resource recommendation service system and method. The method comprises the following steps: crawling academic resources on the Internet by using an LDA (Latent Dirichlet Allocation)-based focused crawler, classifying the academic resources according to preset A types by using an LDA-based text classification model, and storing the academic resources in a local academic resource database, wherein the system further comprises an academic resource model, a resource quality value calculation module and a user interest module; implanting a tracking software module at a user terminal, combining interesting subjects and historical browsing behavior data of the user, respectively modeling the academic resource model and the user interest module by virtue of four dimensions, namely the academic resource type, subject theme distribution, key word distribution and LDA latent theme distribution, calculating the similarity between the academic resource model and the user interest preference module, combining the resource quality value to calculate the recommendation degree, and finally performing academic resource Top-N recommendation for the user according to the recommendation degree. According to the method disclosed by the invention, personalized accurate recommendation of the academic resources is performed according to the identity, interest and browsing behaviors of users, and the working efficiency of scientific research personnel is improved.

Owner:NINGBO UNIV

Feature selection for two-class classification systems

Active | US20040059697A1 | Improve performance | Digital data processing details | Digital computer details | Text categorization | Feature selection

A two-class analysis system for summarizing features and determining features appropriate to use in training a classifier related to a data mining operation. Exemplary embodiments describe how to select features which will be suited to training a classifier used for a two-class text classification problem. Bi-Normal Separation methods are defined in which a measure based on the inverse cumulative distribution function of a standard probability distribution represents the difference in occurrences of the feature between the two classes. In addition to training a classifier, the system provides a means of summarizing differences between classes.

Owner:MICRO FOCUS LLC
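
Bi-Normal Separation, named in the abstract above, is commonly formulated in the feature-selection literature as the absolute difference between the inverse standard-normal CDF of a feature's true-positive rate and that of its false-positive rate. The sketch below follows that common formulation rather than the patent's claims; the function name and the clipping bound `eps` (used to keep the inverse CDF finite at rates of 0 or 1) are assumptions.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def bns_score(tp, fp, pos, neg, eps=0.0005):
    """Bi-Normal Separation score for one feature.

    tp:  positive-class documents containing the feature
    fp:  negative-class documents containing the feature
    pos: total positive-class documents
    neg: total negative-class documents
    """
    # Clip the rates away from 0 and 1, where inv_cdf is undefined.
    tpr = min(max(tp / pos, eps), 1 - eps)
    fpr = min(max(fp / neg, eps), 1 - eps)
    return abs(_STD_NORMAL.inv_cdf(tpr) - _STD_NORMAL.inv_cdf(fpr))
```

A feature that appears equally often in both classes scores zero, while a feature concentrated in either class scores high, which is why the measure suits two-class problems.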

Programming guide content collection and recommendation system for viewing on a portable device

Inactive | US20060123448A1 | Television system details | Color television details | Text categorization | User profile

An EPG contents collection and recommendation system includes an EPG database of identifications of available programs. A program information acquisition module applies text classification to detailed descriptions of the available programs. An EPG recommendation module recommends an available program to a user based on the text classification. Preferably, EPG contents are collected from publicly available TV websites and parsed into a uniform format. For example, contents are vectorized, and a Maximum Entropy technique is applied. Also, user interaction with the EPG database is used to form a user profile database. Further, classifiers are trained based on contents of the user profile database, and these classifiers are used to recommend EPG contents to the user.

Owner:SOVEREIGN PEAK VENTURES LLC

Method for efficiently building compact models for large multi-class text classification

Inactive | US20090274376A1 | Semantic analysis | Character and pattern recognition | Text categorization | Data mining

A method of classifying documents includes: specifying multiple documents and classes, wherein each document includes a plurality of features and each document corresponds to one of the classes; determining reduced document vectors for the classes from the documents, wherein the reduced document vectors include features that satisfy threshold conditions corresponding to the classes; determining reduced weight vectors for relating the documents to the classes by comparing combinations of the reduced weight vectors and the reduced document vectors and separating the corresponding classes; and saving one or more values for the reduced weight vectors and the classes. Specific embodiments are directed to formulations for determining the reduced weight vectors including one-versus-rest classifiers, maximum entropy classifiers, and direct multiclass Support Vector Machines.

Owner:OATH INC

Domain-knowledge-based short text classification method and text classification system

Inactive | CN102194013A | Increase the amount of information | Special data processing applications | Data information | Text categorization

The invention discloses a domain-knowledge-based short text classification method and a domain-knowledge-based short text classification system used in the technical field of information. The method is used for overcoming the defect that the traditional text classification method cannot well classify short texts. Aiming at the characteristics that the short text description concept signals are relatively weak and the text features are seriously insufficient, the invention provides the short text data classification method and the text classification system suitable for commodity web page data. According to the embodiment, a commodity classifier with excellent classification effect is obtained by reforming the traditional classifier, introducing new elements and devoting to matching application of algorithm and data. The introduction of the new elements comprises the following steps of: introducing a concept of domain words and introducing the concept into the classifier so as to effectively increase the information quantity of the short texts; and performing different-lexical-item-set-based semantic analysis on the short text data, particularly the web page commodity data, and introducing the semantic analysis result into the classifier so as to introduce new information for the commodity data information and improve the accuracy of text classification.

Owner:SHANGHAI BIJIA DATA

Text categorization feature selection and weight computation method based on field knowledge

Inactive | CN101290626A | Improve classification effect | Accuracy Ratio Improvement | Computing models | Special data processing applications | Feature vector | Text categorization

The invention relates to the artificial intelligence technical field, in particular to a text classification feature selection and weight calculation method based on field knowledge. The method combines sample statistics and field glossaries to construct a field classification feature space, utilizes internal knowledge relations in the field, calculates the similarity between the glossaries, and then adjusts the corresponding feature weight of classification feature vectors. Moreover, the method adopts a learning algorithm of a support vector machine to construct a field text classification model and then realize field text classification. As shown by text classification laboratory results of the Yunnan tourist field and the non-tourist field, the classification accuracy of the method is improved by 4 percent compared with the text classification effect of the improved TFIDF feature weight method.

Owner:KUNMING UNIV OF SCI & TECH

Chinese text classification method based on super-deep convolution neural network structure model

Inactive | CN107301246A | Improve learning effect | Add depth | Natural language data processing | Special data processing applications | Text categorization | Classification methods

The invention provides a Chinese text classification method based on a super-deep convolution neural network structure model. The method comprises the steps of collecting a training corpus of a word vector from the internet, combining a Chinese word segmentation algorithm to conduct word segmentation on the training corpus, and obtaining a word vector model; collecting news of multiple Chinese news websites from the internet, and marking the category of the news as a corpus set for text classification, wherein the corpus set is divided into a training set corpus and a test set corpus; conducting word segmentation on the training set corpus and the test set corpus respectively, and then obtaining the word vectors corresponding to the training set corpus and the test set corpus respectively by utilizing the word vector model; establishing the super-deep convolution neural network structure model; inputting the word vector corresponding to the training set corpus into the super-deep convolution neural network structure model, and conducting training and obtaining a text classification model; inputting the Chinese text which needs to be classified into the word vector model, obtaining the word vector of the Chinese text which needs to be classified, and then inputting the word vector into the text classification model to complete the Chinese text classification.

Owner:HEBEI UNIV OF TECH

Abnormal information text classification method based on knowledge graph

Inactive | CN108595708A | Improve reliability | Special data processing applications | Entity linking | Graph spectra

The invention provides an abnormal information text classification method based on a knowledge graph. According to the method, first, a domain knowledge graph is constructed, and an entity identifier and an entity link based on the domain knowledge graph are constructed; second, text feature representation vectors v<text> and entity feature representation vectors v<ent> are constructed; and last, the text feature representation vectors and the entity feature representation vectors are merged to obtain new text representation vectors v<merge> fusing knowledge features, classified training is performed on the new text representation vectors, and a final classification result is obtained.

Owner:BEIHANG UNIV +1

Text label extracting method and device

Active | CN106156204A | Meet different granularity retrieval requirements | Special data processing applications | Text database clustering/classification | Granularity | Text categorization

The invention relates to a text label extracting method. The text label extracting method comprises the following steps: category prediction is performed on a to-be-extracted text through a text categorization model, and a target category of the text is obtained; topic prediction is performed on the to-be-extracted text through a topic clustering model, and a predicted topic is obtained; if the predicted topic is in a default topic set, a target topic corresponding to the predicted topic is acquired, keyword extraction is performed on the to-be-extracted text, target keywords of the text are obtained, and the target category, the target topic and the target keywords are taken as labels of the text. The text labels have different levels to meet multi-granularity retrieval requirements, and multi-granularity recommended articles can be provided according to different labels. Besides, the invention provides a text label extracting device.

Owner:SHENZHEN TENCENT COMP SYST CO LTD

Video classification method and device and server

Active | CN109359636A | Fully consider the characteristics of different dimensions | Improve accuracy | Semantic analysis | Video data clustering/classification | Text categorization | Classification methods

The invention discloses a video classification method and device and a server. The method comprises the following steps: obtaining a target video; classifying the image frames in the target video with a first classification model, which classifies based on the image features of the image frames, to obtain an image classification result; classifying the audio in the target video with a second classification model, which classifies based on audio features, to obtain an audio classification result; classifying the text description information corresponding to the target video with a third classification model, which classifies based on the text features of the text description information, to obtain a text classification result; and determining the target classification result of the target video according to the image, audio and text classification results. Because image, audio and text features are integrated for classification, features of different dimensions of the video are fully considered, thereby improving the accuracy of video classification.
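The final step, combining the three per-modality results, can be sketched as a late-fusion function. The abstract does not state the fusion rule, so the weighted average below is an assumption, and the function name, weights and scores are hypothetical.

```python
def fuse_results(image_probs, audio_probs, text_probs, weights=(1.0, 1.0, 1.0)):
    """Weighted average of per-class scores from the three models (assumed fusion rule)."""
    results = (image_probs, audio_probs, text_probs)
    total = sum(weights)
    labels = set().union(*results)  # every class seen by any of the three models
    fused = {
        label: sum(w * r.get(label, 0.0) for w, r in zip(weights, results)) / total
        for label in labels
    }
    return max(fused, key=fused.get), fused


# Invented per-class scores standing in for the outputs of the three classifiers.
image_result = {"sports": 0.6, "music": 0.4}
audio_result = {"sports": 0.3, "music": 0.7}
text_result = {"sports": 0.8, "music": 0.2}

label, scores = fuse_results(image_result, audio_result, text_result)
print(label, scores)
```

Unequal weights would let one modality dominate, for example when the text description is known to be unreliable.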

Owner:TENCENT TECH (SHENZHEN) CO LTD

Multi-feature-fusion Chinese-text classification method based on Attention neural network

Active | CN108460089A | Excavate fully and comprehensively | Improve classification accuracy | Character and pattern recognition | Neural architectures | Category recognition | Granularity

The invention discloses a multi-feature-fusion Chinese-text classification method based on an Attention neural network, belonging to the field of natural language processing. To further improve the accuracy of Chinese-text classification, the method fuses three CNN paths to exploit features of the text data at three convolution-kernel sizes (granularities); fuses an LSTM path to capture the interconnections among the text data; and merges in the proposed Attention algorithm model so that relatively important data features play a greater role in Chinese-text class recognition, which improves the model's ability to recognize Chinese text classes. Experimental results show that, under the same experimental conditions, the proposed model achieves significantly higher Chinese-text classification accuracy than a CNN model, an LSTM model, and a combined CNN-LSTM model, making it better suited to Chinese-text classification tasks with high accuracy requirements.
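How attention lets more important features contribute more can be illustrated with a small pooling function. This is generic dot-product attention, not the patent's own Attention algorithm (which the abstract does not describe); the feature vectors standing in for the CNN and LSTM path outputs, and the query vector, are invented.

```python
from math import exp


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(features, query):
    """Weight each feature vector by softmax(dot(feature, query)) and sum them."""
    scores = [sum(f_i * q_i for f_i, q_i in zip(f, query)) for f in features]
    weights = softmax(scores)
    pooled = [
        sum(w * f[d] for w, f in zip(weights, features))
        for d in range(len(query))
    ]
    return pooled, weights


# Hypothetical outputs of three CNN paths (different kernel sizes) and one LSTM path.
path_features = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
query = [1.0, 0.0]  # assumed learned attention query

pooled, weights = attention_pool(path_features, query)
print(weights, pooled)
```

The pooled vector would then feed a classification layer; in a trained network the query and the path features are learned rather than fixed.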

Owner:HAINAN NORMAL UNIV

Category based, extensible and interactive system for document retrieval

Inactive | CN1535433A | Fulfil requirements | Fast delivery | Wireless communication services | Data switching networks | Web site | Document analysis

An integrated, automatic and open information retrieval system comprises a hybrid method, based on linguistic and mathematical approaches, for automatic text categorization. It solves the problems of conventional systems by combining an automatic content recognition technique with a self-learning hierarchical scheme of indexed categories. In response to a word submitted by a requestor, said system retrieves documents containing that word, analyzes the documents to determine their word-pair patterns, matches the document patterns to database patterns that are related to topics, and thereby assigns topics to each document. If the retrieved documents are assigned to more than one topic, a list of the document topics is presented to the requestor, and the requestor designates the relevant topics. The requestor is then granted access only to documents assigned to relevant topics. A knowledge database linking search terms to documents and documents to topics is established and maintained to speed future searches. Additionally, new strategies are presented to deal with different update frequencies of changed Web sites.
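The retrieve, match and disambiguate flow can be sketched as follows. Treating a "word-pair pattern" as a pair of adjacent words is an assumption, and the pattern database, documents and function names are hypothetical.

```python
def word_pairs(text):
    """Adjacent word pairs of a document (an assumed reading of 'word-pair patterns')."""
    words = text.lower().split()
    return set(zip(words, words[1:]))


# Hypothetical database of word-pair patterns related to topics.
TOPIC_PATTERNS = {
    "animals": {("big", "cat")},
    "cars": {("sports", "car")},
}


def assign_topics(document):
    """Topics whose patterns share at least one word pair with the document."""
    pairs = word_pairs(document)
    return {topic for topic, patterns in TOPIC_PATTERNS.items() if pairs & patterns}


def search(word, documents):
    """Retrieve documents containing the word and group them by assigned topic."""
    hits = [d for d in documents if word.lower() in d.lower().split()]
    by_topic = {}
    for document in hits:
        for topic in assign_topics(document):
            by_topic.setdefault(topic, []).append(document)
    return by_topic


documents = [
    "the jaguar is a big cat",
    "the jaguar sports car is fast",
    "a big cat sleeps",
]
by_topic = search("jaguar", documents)
# More than one topic was found, so the requestor would be shown this list and pick one.
print(sorted(by_topic))
print(by_topic["cars"])
```

Caching `by_topic` per search term would correspond to the knowledge database the abstract mentions for speeding up future searches.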

Owner:COGISUM INTERMEDIA

Method and device for text classification

Inactive | CN101634983A | Improve the accuracy of judgment | Special data processing applications | Text categorization | Human–computer interaction

An embodiment of the invention discloses a method and a device for text classification. The method comprises: acquiring an affective feature word from an input text; acquiring the affective orientation degree of the affective feature word according to a synonym lexicon constructed in advance; and classifying the text according to that affective orientation degree. By using the pre-built synonym lexicon to obtain the affective orientation degree of the affective feature words in the text, the embodiment improves the accuracy with which the affective orientation of words is judged.
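The lexicon-based scoring described above can be sketched as follows. The seed polarities, the synonym table and the averaging rule are all assumptions made for illustration; the abstract does not specify how the degree is computed.

```python
# Hypothetical seed words with known polarity, and a pre-built synonym lexicon.
SEED_POLARITY = {"good": 1.0, "bad": -1.0}
SYNONYMS = {
    "excellent": ["good"],
    "fine": ["good"],
    "awful": ["bad"],
    "bittersweet": ["good", "bad"],
}


def orientation(word):
    """Affective degree of a word: its seed polarity, or the mean polarity of its synonyms."""
    if word in SEED_POLARITY:
        return SEED_POLARITY[word]
    values = [SEED_POLARITY[s] for s in SYNONYMS.get(word, []) if s in SEED_POLARITY]
    return sum(values) / len(values) if values else 0.0


def classify(text):
    """Classify a text by the summed affective degree of its words."""
    score = sum(orientation(word) for word in text.lower().split())
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify("the service was excellent"))
print(classify("awful and bad food"))
```

Words absent from both tables contribute zero, so a text with no affective feature words is labelled neutral.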

Owner:HUAWEI TECH CO LTD

Natural language processing-based multi-language analysis method and device

Active | CN108197109A | Quality improvement | Enable multilingual analysis | Character and pattern recognition | Natural language data processing | Algorithm | Model selection

The invention discloses a natural language processing-based multi-language analysis method and device. The method comprises the following steps: determining the language category of the input natural language text information through a trained language detection model; obtaining computer-recognizable word embedding representations of the corresponding words through a trained word vector model, and extracting keywords via TF-IDF; calculating an article vector and a category vector for each preset category according to the keywords and keyword weights, and calculating the similarity between the article and each preset category to determine the text classification result of the natural language text information; and inputting the word embedding representations into a trained text sentiment analysis model that runs a convolutional neural network and a bidirectional gated recurrent unit in parallel, and calculating a final sentiment tendency value. The method and device address the problem that traditional multi-language analysis methods require domain knowledge of the relevant linguistics and a great deal of manual effort.
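The classification step (TF-IDF keyword weights, weighted article and category vectors, similarity comparison) can be sketched as follows. The toy embeddings, the category keyword weights, the smoothed IDF formula and the use of cosine similarity are assumptions; the abstract names TF-IDF and a similarity measure but gives no formulas.

```python
from collections import Counter
from math import log, sqrt

# Hypothetical 2-d word embeddings standing in for a trained word vector model.
EMBEDDINGS = {"goal": [1.0, 0.0], "team": [0.9, 0.1], "stock": [0.0, 1.0], "market": [0.1, 0.9]}
# Hypothetical keyword weights for each preset category.
CATEGORY_KEYWORDS = {
    "sports": {"goal": 1.0, "team": 1.0},
    "finance": {"stock": 1.0, "market": 1.0},
}


def tfidf(doc_tokens, corpus):
    """TF-IDF weight per word, using a smoothed IDF (an assumed variant)."""
    counts = Counter(doc_tokens)
    n_docs = len(corpus)
    weights = {}
    for word, count in counts.items():
        doc_freq = sum(1 for doc in corpus if word in doc)
        idf = log((1 + n_docs) / (1 + doc_freq)) + 1
        weights[word] = (count / len(doc_tokens)) * idf
    return weights


def weighted_vector(weights, dim=2):
    """Sum of word embeddings scaled by keyword weight; unknown words are skipped."""
    vector = [0.0] * dim
    for word, weight in weights.items():
        for i, x in enumerate(EMBEDDINGS.get(word, [0.0] * dim)):
            vector[i] += weight * x
    return vector


def cosine(a, b):
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def classify(doc_tokens, corpus):
    """Pick the preset category whose vector is most similar to the article vector."""
    article = weighted_vector(tfidf(doc_tokens, corpus))
    categories = {c: weighted_vector(kw) for c, kw in CATEGORY_KEYWORDS.items()}
    return max(categories, key=lambda c: cosine(article, categories[c]))


corpus = [["team", "goal", "team"], ["stock", "market"], ["market", "stock", "team"]]
print(classify(corpus[0], corpus))
```

The sentiment branch (the parallel CNN and bidirectional gated recurrent unit model) is not covered by this sketch.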

Owner:Beijing Percent Technology Group Co., Ltd. (北京百分点科技集团股份有限公司)

Text information analysis apparatus and method

Inactive | US7099819B2 | Quick classification | Quickly arrange | Digital data information retrieval | Semantic analysis | Information analysis | Text categorization

A text information analysis apparatus arranges a plurality of texts according to the content of each text. In the apparatus, a category decision unit classifies each text into one of a plurality of predetermined categories. A cluster generation unit clusters texts having similar contents from the plurality of texts. A control unit controls the category decision unit and the cluster generation unit to simultaneously execute category decision and clustering for the plurality of texts.
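The control unit's role, running category decision and clustering over the same texts at the same time, can be sketched with two concurrent tasks. The keyword-based category rule, the greedy Jaccard clustering and the thread pool are assumptions chosen to keep the example small; the abstract does not specify any of them.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical keyword sets for the predetermined categories.
CATEGORY_KEYWORDS = {"complaint": {"broken", "refund"}, "praise": {"great", "love"}}


def decide_category(text):
    """Category decision unit: category with the largest keyword overlap, else 'other'."""
    tokens = set(text.lower().split())
    best = max(CATEGORY_KEYWORDS, key=lambda c: len(CATEGORY_KEYWORDS[c] & tokens))
    return best if CATEGORY_KEYWORDS[best] & tokens else "other"


def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0


def cluster(texts, threshold=0.3):
    """Cluster generation unit: greedily group texts similar to a cluster's first member."""
    clusters = []  # each cluster is a list of indices into texts
    for i, text in enumerate(texts):
        tokens = set(text.lower().split())
        for members in clusters:
            representative = set(texts[members[0]].lower().split())
            if jaccard(tokens, representative) >= threshold:
                members.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def analyze(texts):
    """Control unit: run category decision and clustering concurrently on the same texts."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        categories = pool.submit(lambda: [decide_category(t) for t in texts])
        clusters = pool.submit(cluster, texts)
        return categories.result(), clusters.result()


texts = [
    "screen arrived broken want refund",
    "screen arrived broken need refund",
    "love this great phone",
]
categories, clusters = analyze(texts)
print(categories, clusters)
```

Having both outputs for the same texts lets clusters be inspected within or across categories, for example to spot a recurring complaint.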

Owner:KK TOSHIBA
