430 results about "Text mining" patented technology

Text mining, also referred to as text data mining, roughly equivalent to text analytics, is the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. 'High quality' in text mining usually refers to some combination of relevance, novelty, and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).

Method for data and text mining and literature-based discovery

Inactive | US6886010B2 | Maximizing number | Data processing applications | Digital data processing details | Text mining | Co-occurrence

Text searching is achieved by techniques including phrase frequency analysis and phrase-co-occurrence analysis. In many cases, factor matrix analysis is also advantageously applied to select high technical content phrases to be analyzed for possible inclusion within a new query. The described techniques may be used to retrieve data, determine levels of emphasis within a collection of data, determine the desirability of conflating search terms, detect symmetry or asymmetry between two text elements within a collection of documents, generate a taxonomy of documents within a collection, and perform literature-based problem solving. (This abstract is intended only to aid those searching patents, and is not intended to limit the disclosure of claims in any manner.)

Owner:NAVY UNITED STATES OF AMERICA AS REPRESENTED BY THE SECY OF THE
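
The phrase frequency and phrase co-occurrence analysis named in the abstract above can be illustrated with a minimal sketch. It is not the patented method: it assumes that word bigrams stand in for "phrases" and that co-occurrence means two phrases appearing in the same document. The function names and sample documents are hypothetical.

```python
from collections import Counter
from itertools import combinations


def phrase_counts(docs, n=2):
    """Count word n-grams (a stand-in for 'phrases') across all documents."""
    counts = Counter()
    for doc in docs:
        words = doc.lower().split()
        counts.update(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    return counts


def cooccurrence_counts(docs, phrases):
    """Count how many documents contain each unordered pair of phrases."""
    pairs = Counter()
    for doc in docs:
        text = " ".join(doc.lower().split())
        present = sorted(p for p in phrases if p in text)
        pairs.update(combinations(present, 2))
    return pairs


# Hypothetical sample collection.
docs = [
    "text mining finds patterns in text data",
    "text mining supports literature based discovery",
    "literature based discovery links separate findings",
]
freq = phrase_counts(docs)
top = [p for p, c in freq.items() if c >= 2]   # phrases seen at least twice
pairs = cooccurrence_counts(docs, top)
print(top)
print(pairs.most_common(3))
```

Frequent pairs such as ("based discovery", "literature based") would be candidates for inclusion in a refined query; how candidates are actually selected in the patent (for example through factor matrix analysis) is not covered by this sketch.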

Text mining system for web-based business intelligence applied to web site server logs

Inactive | US7330850B1 | Data processing applications | Multiple digital computer combinations | Clustered data | Web site

A text mining system for collecting business intelligence about a client, as well as for identifying prospective customers of the client, for use in a lead generation system accessible by the client via the Internet. The text mining system has various components, including a data acquisition process that extracts textual data from Internet web sites, including their logs, content, processes, and transactions. The system compares log data to content and process data, and relates the results of the comparison to transaction data. This permits the system to provide aggregate cluster data representing statistics useful for customer lead generation.

Owner:CALLAHAN CELLULAR L L C +1

Text mining system for web-based business intelligence

Inactive | US20060004731A1 | Web data retrieval | Advertisements | Text mining | Data acquisition

A text mining system for collecting business intelligence about a client, as well as for identifying prospective customers of the client, for use in a lead generation system accessible by the client via the Internet. The text mining system has various components, including a data acquisition process that extracts textual data from various Internet sources, a database for storing the extracted data, a text mining server that executes query-based searches of the database, and an output repository. A web server provides client access to the repository, and to the mining server.

Owner:ALTO DYNAMICS LLC

Text mining method and apparatus for extracting features of documents

Inactive | US6882747B2 | Little memory space | Easy to implement | Digital data information retrieval | Data processing applications | Feature extraction | Text mining

Concerning feature extraction of documents in text mining, a method and an apparatus for extracting features having the same nature as those by LSA are provided that require smaller memory space and simpler program and apparatus than the apparatus for executing LSA. Features of each document are extracted by feature extracting acts on the basis of a term-document matrix updated by term-document updating acts and of a basis vector, spanning a space of effective features, calculated by basis vector calculations. Execution of respective acts is repeated until a predetermined requirement given by a user is satisfied.

Owner:SSR +1

Mining interactions to manage customer experience throughout a customer service lifecycle

Active | US20100138282A1 | Special data processing applications | Marketing | Text mining | Customer delight

A customer experience is improved through data mining and text mining technologies that derive insights about a customer by analyzing interactions between the customer and a customer service agent. One or more numerical measurements of customer satisfaction are derived, and recommended actions are provided to an agent to enhance the customer experience throughout a customer service lifecycle.

Owner:24 7 AI INC

Method and system for detecting frequent association patterns

Inactive | US6618725B1 | Easy to understand intuitively | Increase speed | Data processing applications | Digital data information retrieval | Data set | Text mining

A text-mining system and method automatically extracts useful information from a large set of tree-structured data by generating successive sets of candidate tree-structured association patterns for comparison with the tree-structured data. The system counts how many times each candidate association pattern matches a tree in the data set in order to determine which candidates match frequently. Each successive set of candidate association patterns is generated from the frequent association patterns determined from the previous set of candidate association patterns.

Owner:IBM CORP
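
The level-wise search described in the abstract above, where each new set of candidates is generated from the frequent patterns of the previous set, can be sketched as follows. This is a simplified illustration, not the patented method: it assumes flat item sets in place of tree-structured data, and the function name and sample data are hypothetical.

```python
from itertools import combinations


def frequent_patterns(transactions, min_support):
    """Level-wise search: size-k candidates are built only from frequent size-(k-1) patterns."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted({i for t in transactions for i in t})
    candidates = [frozenset([i]) for i in items]
    frequent = {}
    while candidates:
        # Count how many transactions each candidate matches.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join frequent patterns that differ by one item to form the next candidates.
        keys = list(level)
        next_size = len(keys[0]) + 1 if keys else 0
        candidates = list({a | b for a, b in combinations(keys, 2) if len(a | b) == next_size})
    return frequent


# Hypothetical data: each transaction is a set of items.
data = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
result = frequent_patterns(data, min_support=2)
for pattern, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(pattern), count)
```

Because infrequent patterns are dropped at every level, the candidate set stays small; the search stops as soon as a level produces no frequent pattern.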

System for synergistic data processing

Inactive | US20080077451A1 | Improve reliability | The result is accurate | Finance | Data mining | Third party | Text mining

A data analysis system that includes an information mining engine for extracting structured data from unstructured data, a data store for storing the extracted structured data, data received from third party data sources, and data received from sensors monitoring insured property is described. The system also includes a business logic processor that synergistically analyzes the structured data extracted by the text mining engine, the data received from the sensor, and the data received from the third party data source to make an insurance evaluation.

Owner:HARTFORD FIRE INSURANCE

Web-based system and method for archiving and searching participant-based internet text sources for customer lead data

Inactive | US7003517B1 | Advertisements | Web data retrieval | Text mining | Web service

A text mining system for collecting business intelligence about a client, as well as for identifying prospective customers of the client, for use in a lead generation system accessible by the client via the Internet. The text mining system has various components, including a data acquisition process that extracts textual data from various Internet sources, a database for storing the extracted data, a text mining server that executes query-based searches of the database, and an output repository. A web server provides client access to the repository, and to the mining server.

Owner:ALTO DYNAMICS LLC

System and method for facilitating skill gap analysis and remediation based on tag analytics

Inactive | US20080313000A1 | Improve match | Multiprogramming arrangements | Office automation | Text mining | User input

This invention includes a workforce management system having a system bus, at least one database in communication with the system bus that includes data representative of workforce employees, and social networking data associated with the employees. A matching functional unit includes a text mining function for mining contextual information from the at least one database to generate context labels for an employee, a clustering function for generating concept labels for an employee, and a matching function that sorts and matches employees by the labels in accordance with matching criteria. A user interface provides user input to support operation of the workforce management system.

Owner:IBM CORP

System and method for hybrid text mining for finding abbreviations and their definitions

Active | US7536297B2 | Natural language data processing | Speech recognition | Text mining | Pattern generation

The present invention matches one or more abbreviations to one or more definitions. The invention has an abbreviation pattern generation process that generates one or more abbreviation patterns corresponding to the candidate abbreviations, and a definition pattern generation process that generates one or more definition patterns corresponding to the candidate definitions.

Owner:IBM CORP
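
To make the idea of matching abbreviations to definitions concrete, here is a minimal sketch that uses a much simpler heuristic than the pattern generation processes named in the abstract above. It assumes the abbreviation appears in parentheses right after its definition and that its letters are the initials of the preceding words. The function name and sample sentence are hypothetical.

```python
import re


def find_abbreviations(text):
    """Pair each parenthesised all-caps token with the preceding words whose initials spell it."""
    found = {}
    for match in re.finditer(r"\(([A-Z]{2,})\)", text):
        abbr = match.group(1)
        words = text[:match.start()].split()
        candidate = words[-len(abbr):]            # as many words as the abbreviation has letters
        initials = "".join(w[0].upper() for w in candidate)
        if initials == abbr:
            found[abbr] = " ".join(candidate)
    return found


# Hypothetical sample sentence.
sample = "We apply latent semantic analysis (LSA) and a support vector machine (SVM)."
pairs = find_abbreviations(sample)
print(pairs)
```

This heuristic misses abbreviations that skip stop words or use letters from inside a word, which is the kind of case a richer set of abbreviation and definition patterns would be needed for.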

Microblog user interest recognizing method based on text mining

Inactive | CN103942340A | Improve recognition accuracy | Reduce raw data volume | Special data processing applications | Text database clustering/classification | Text mining | Feature extraction

The invention discloses a method for recognizing microblog user interests based on text mining, and belongs to the field of text mining and natural language processing. The method comprises: collecting the newest topical microblog text data of a microblog text set, together with the microblog text data of a designated user; standardizing the collected microblog text data; recognizing the newest microblog words in the standardized topical microblog text data with a microblog new-word recognition method and updating a new-word dictionary; segmenting the designated user's standardized microblog text into Chinese words using the new-word dictionary to obtain a text-vector representation; clustering the designated user's vectorized microblog text data and recombining the original microblog text data; extracting the features of the new text set with a topic model; and, using preset topic dictionaries, calculating the weight of each topic dictionary from the new text set features to obtain the final topic, which serves as the recognized microblog user interest. This improves the accuracy of feature extraction.

Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

Method and system for mining a document containing dirty text

Inactive | US6978275B2 | Limited range | Extract content accurately | Data processing applications | Digital data information retrieval | Text mining | Document preparation

A method and system for mining a document containing dirty text. Dirty text is removed or replaced and the document is processed using a variety of text mining techniques. In one embodiment, dirty text removal and replacement occurs in two stages. In the first stage, a general cleaning occurs on all documents without regard to what domain they belong to or the mining task to be performed. In the second stage, document cleaning is more specific to the anomalies of the domain and the mining task to be performed. In a third stage, the document is processed using a variety of data mining techniques according to the mining task. In one embodiment, the present invention scores and ranks sentences in a document according to their relevance, extracts the highest ranked sentences, and presents a summary. The present invention allows users to leverage existing domain knowledge and can be customized according to the domain and task requirements.

Owner:HEWLETT PACKARD DEV CO LP
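
The staged flow in the abstract above (general cleaning, domain-specific cleaning, then sentence scoring for a summary) can be sketched as below. The concrete cleaning rules, the replacement table, and the frequency-based sentence score are assumptions chosen for illustration; the abstract does not specify them, and all names here are hypothetical.

```python
import re
from collections import Counter


def general_clean(text):
    """Stage 1 (domain-independent): drop non-printable characters and collapse whitespace."""
    text = re.sub(r"[^\x20-\x7E]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def domain_clean(text, replacements):
    """Stage 2 (domain-specific): apply a caller-supplied table of regex replacements."""
    for pattern, repl in replacements.items():
        text = re.sub(pattern, repl, text)
    return text


def summarize(text, top_n=1):
    """Stage 3: score sentences by summed word frequency and keep the top ones in original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text) if s]
    freq = Counter(re.findall(r"[a-z]+", text.lower()))
    ranked = sorted(sentences, key=lambda s: -sum(freq[w] for w in re.findall(r"[a-z]+", s.lower())))
    keep = set(ranked[:top_n])
    return [s for s in sentences if s in keep]


# Hypothetical support-ticket note with control characters and shorthand.
raw = "Cust\x00 called re: printer.\n\nPrinter jams when printer tray is full.  Thx."
stage1 = general_clean(raw)
stage2 = domain_clean(stage1, {r"\bCust\b": "Customer", r"\bThx\b": "Thanks"})
summary = summarize(stage2)
print(stage2)
print(summary)
```

A summed word-frequency score favors longer sentences; it is used here only as a stand-in for whatever relevance measure a real mining task would define.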

Recommending personally interested contents by text mining, filtering, and interfaces

Active | US20130254217A1 | Digital data information retrieval | Digital data processing details | Personalization | Data stream

A personalized content recommendation system includes a client interface device configured to monitor a user's information data stream. A collaborative filter remote from the client interface device generates automated predictions about the interests of the user. A database server stores personal behavioral profiles and user preferences based on a plurality of monitored past behaviors and an output of the collaborative user personal interest inference engine. A programmed personal content recommendation server filters items in an incoming information stream with the personal behavioral profile and identifies only those items of the incoming information stream that substantially match the personal behavioral profile. The identified personally relevant content is then recommended to the user in a priority order that may take into account the similarity of the personal interest matches and the context of the user's information consumption behavior, as indicated by the user's content consumption mode.

Owner:UT BATTELLE LLC

Text mining apparatus and associated methods

Inactive | US20060206306A1 | Accurate count value | Data processing applications | Digital data information retrieval | Personalization | Text mining

A method for extracting key terms and associated key terms for use in text mining is provided. The method includes receiving unstructured text documents, such as emails over a customer service system. Term candidates are extracted based on identifying consecutive word strings satisfying a context independency threshold. Term candidates are weighted using mutual information to generate a list of weighted terms. The weighted terms are then recounted. Terms are associated based on Chi-square values. Associated terms can then be used for information retrieval. A user interface can be personalized with individual user profiles.

Owner:MICROSOFT TECH LICENSING LLC
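
The abstract above associates terms based on Chi-square values. A minimal sketch of that one step follows, assuming each document has already been reduced to a set of terms and using the standard chi-square statistic for a 2x2 contingency table. Term extraction and mutual-information weighting are not shown; the function name and sample data are hypothetical.

```python
def chi_square(docs, term_a, term_b):
    """Chi-square statistic for the 2x2 table of documents containing term_a and/or term_b."""
    n = len(docs)
    both = sum(1 for d in docs if term_a in d and term_b in d)
    only_a = sum(1 for d in docs if term_a in d and term_b not in d)
    only_b = sum(1 for d in docs if term_a not in d and term_b in d)
    neither = n - both - only_a - only_b
    denom = (both + only_a) * (only_b + neither) * (both + only_b) * (only_a + neither)
    if denom == 0:
        return 0.0  # a term that appears in every document or in none carries no signal
    return n * (both * neither - only_a * only_b) ** 2 / denom


# Hypothetical customer-service emails, each reduced to a set of terms.
emails = [
    {"refund", "order"},
    {"refund", "order"},
    {"login", "password"},
    {"login", "reset"},
]
strong = chi_square(emails, "refund", "order")
weaker = chi_square(emails, "login", "reset")
print(strong, weaker)
```

The statistic is large for both strong positive and strong negative association, so a practical system would also check the direction (for example, whether the terms co-occur more often than expected) before treating two terms as associated.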

System and method for searching, analyzing and displaying text transcripts of speech after imperfect speech recognition

Inactive | US6973428B2 | Natural language data processing | Speech recognition | Text mining | Speech identification

A speech conversation is changed to a text transcript, which is then pre-processed and subjected to text mining to determine salient terms. Salient terms are those terms that meet a predetermined level of selectivity in a collection. The text transcript of the speech conversation is displayed by emphasizing the salient terms and minimizing non-salient terms. An interface is provided that allows a user to select a salient term, whereupon the speech conversation is played beginning at the location, in the speech file, of the selected salient term.

Owner:NUANCE COMM INC

ResLCNN model-based short text classification method

Inactive | CN107562784A | Improve classification effect | Neural architectures | Special data processing applications | Text mining | Text categorization

The invention discloses a ResLCNN model-based short text classification method, relates to the technical field of text mining and deep learning, and in particular to a deep learning model for short text classification. The method combines characteristics of a long short-term memory network and a convolutional neural network to build a ResLCNN deep text classification model for short text classification. The model comprises three long short-term memory network layers and one convolutional neural network layer. Drawing on residual model theory, an identity mapping is added between the first long short-term memory network layer and the convolutional neural network layer to construct a residual layer, which alleviates the vanishing-gradient problem of deep models. The model combines the long short-term memory network's ability to capture long-distance dependencies in text sequence data with the convolutional neural network's ability to extract local features of sentences through convolution, so that the short text classification effect is improved.

Owner:TONGJI UNIV

Text mining system for web-based business intelligence

Inactive | US7315861B2 | Web data retrieval | Advertisements | Text mining | Web service

A text mining system for collecting business intelligence about a client, as well as for identifying prospective customers of the client, for use in a lead generation system accessible by the client via the Internet. The text mining system has various components, including a data acquisition process that extracts textual data from various Internet sources, a database for storing the extracted data, a text mining server that executes query-based searches of the database, and an output repository. A web server provides client access to the repository, and to the mining server.

Owner:ALTO DYNAMICS LLC

Individualized recommendation method and system based on distributed B2B platform

Active | CN103886487A | Easy to handle | Quick response | Marketing | Special data processing applications | Web site | Personalization

The invention discloses an individualized recommendation method and system for a B2B platform, built on the distributed Hadoop platform. First, data such as website log files, product information, and user information are centrally placed, stored, and queried using Hadoop distributed storage, so that the data can be processed quickly and efficiently. Second, the data are preprocessed through the Hive service on the Hadoop platform, and the recommendation algorithm is implemented quickly and efficiently with Map/Reduce. Then, information retrieval and file mining are performed on text information with Map/Reduce, the product information a user needs when inquiring and purchasing is matched, and individualized recommendation information is obtained. Finally, the HBase service on the Hadoop platform provides large-scale data storage and query, improving the responsiveness of website recommendations to users.

Owner:FOCUS TECH

Body automatic build system and method based on text mining

Inactive | CN101710343A | Avoid influence | Short build cycle | Special data processing applications | Text mining | The Internet

The invention relates to the field of ontology ("body") construction, in particular to an automatic ontology construction system and method based on text mining. The system mainly comprises a corpus preprocessing subsystem, a text mining subsystem, and an ontology construction subsystem. The corpus preprocessing subsystem receives and processes relevant data provided by a user; the text mining subsystem analyzes and mines the relevant knowledge in the corpus; and the ontology construction subsystem organizes and builds the final domain ontology knowledge base. The system and method can build the ontology automatically with little manual intervention, which shortens the construction period and saves much of the manpower, material, and financial resources that manual construction requires. They can also make full use of Internet information and the electronic resources owned by users for better blending, reasoning, and disambiguation, and avoid the influence of individual experts' and scholars' views on the ontology knowledge base.

Owner:BEIJING ZHONGJIKEHAI TECH & DEV

Method and system for displaying time-series data and correlated events derived from text mining

Inactive | US7570262B2 | Drawing from basic elements | Visual data mining | Text mining | Time segment

The present invention is directed to a method and system for correlating time-series data with events derived from text mining. The system is configured to receive a time period and a parameter concerning an entity, retrieve an event which is related to the entity and occurred within the time period from events which are previously extracted automatically from unstructured text, and display an indication of the event superimposed on a display representing the time series of the parameter for the time period.

Owner:REFINITIV US ORG LLC

Text mining method based on online medical question and answer information

Active | CN104965992A | Effectively obtain the relationship | Scalable | Special data processing applications | Extensibility | Conditional random field

The invention discloses a text mining method based on online medical question and answer information. The text mining method comprises the following steps of: extracting disease question and answer information from an obtained original webpage by adopting a network data extracting mode based on DOM and a webpage template; carrying out medical named entity identification in the extracted disease question and answer information by virtue of characteristics of a conditional random field model; and mining a medical entity relationship by virtue of the medical named entity identification. The method can be used for effectively obtaining a potential association relationship among various entities. The method is suitable for mining work of all disease classes, and has certain expandability.

Owner:NANKAI UNIV

Intelligent automated method and system for optimizing the value of the sale and/or purchase of certain advertising inventory

Inactive | US20070199017A1 | Eliminate false positive | High degree of correlation | Television system details | Color television details | Closed captioning | Text mining

The present invention creates an intelligent automated system that enables media outlets to optimize the value of their advertising inventory. It also enables media outlets, on a platform-agnostic basis, to market advertising inventory driven by content-based criteria rather than audience data alone. This is achieved preferably by text mining programming content in context and by interpreting the accompanying audio tracks, in text form, from a closed captioning system, from a real-time voice recognition system, or from any other source of video and/or program content. The present invention searches through opportunities for an advertiser, or advertising category, on any number of media outlets. Applying in-context text mining to advertisement unit placement allows the advertiser to reach more viewers who are engaged and predisposed to receiving the advertiser's message.

Owner:COZEN GARY S +3

Combined wrong question recommendation method based on knowledge graph

Active | CN107273490A | Understanding Semantic Associations | Improve accuracy | Special data processing applications | Text mining | Near neighbor

The invention discloses a combined wrong question recommendation method based on a knowledge graph, which can accurately recommend to a learner wrong questions relevant to the learner's weak knowledge points. Knowledge is extracted from large-scale unstructured test question data to establish the knowledge graph. Text mining and word segmentation are applied to the learner's wrong questions to extract wrong question keywords and thereby determine the knowledge points included in the wrong questions. Semantic near neighbors of the knowledge points are obtained by analyzing the semantic similarity of the test questions, and the wrong question knowledge points are mapped into the knowledge graph to obtain test question entities that conform to those knowledge points. In addition, similarity weights are calculated over a test question library to obtain similarity matrices of test papers, and collaborative filtering is used to obtain recommended test questions for the wrong questions. Finally, the two recommendation results are combined in weighted, mixed, superposed, and element-level modes, and a final recommendation result is given.

Owner:BEIJING UNIV OF TECH
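
The final step of the abstract above combines two recommendation results. One of the listed modes, weighted combination, can be sketched as follows. The score dictionaries, the weight, and the function name are hypothetical, and the abstract does not state how the weights are chosen.

```python
def combine_weighted(scores_a, scores_b, weight_a=0.5):
    """Blend two {item: score} recommendation results and rank items by the blended score."""
    items = set(scores_a) | set(scores_b)
    blended = {
        item: weight_a * scores_a.get(item, 0.0) + (1 - weight_a) * scores_b.get(item, 0.0)
        for item in items
    }
    # Highest blended score first; ties broken by item id for a stable order.
    return sorted(blended, key=lambda item: (-blended[item], item))


# Hypothetical scores from the knowledge-graph route and the collaborative-filtering route.
graph_scores = {"q1": 0.9, "q2": 0.4}
cf_scores = {"q2": 0.8, "q3": 0.5}
ranking = combine_weighted(graph_scores, cf_scores, weight_a=0.5)
print(ranking)
```

An item missing from one route is treated as scoring zero there, so questions recommended by both routes tend to rise to the top.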

System and method for evaluating text to support multiple insurance applications

Active | US20140379386A1 | Promote results | Efficiently and accurately | Finance | Semantic analysis | Text mining | Text entry

A system for evaluating text data to support multiple insurance applications is disclosed. In some embodiments, text input data is received from multiple sources. The text input data may then be aggregated and mapped to create composite text input data. A semantic event in the composite text input data may be automatically detected, such as by being triggered by a semantic rule and associated semantic tag. A text mining result database may be updated by adding an entry to the database identifying the detected semantic event and the triggering semantic rule. An indication associated with the text mining result database may then be transmitted to a plurality of insurance applications.

Owner:HARTFORD FIRE INSURANCE
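
The flow in the abstract above (aggregate text from several sources, detect semantic events via rules and tags, record each event with its triggering rule) can be sketched as below. Regular expressions stand in for semantic rules, and a plain list stands in for the text mining result database; the rule set, tags, and source names are all hypothetical.

```python
import re

# Hypothetical semantic rules: tag -> pattern that triggers it.
RULES = {
    "WATER_DAMAGE": re.compile(r"\b(flood|leak|water damage)\b", re.I),
    "INJURY": re.compile(r"\b(injur(y|ed)|fracture)\b", re.I),
}


def detect_events(sources, rules=RULES):
    """Aggregate text from all sources, then record every rule match as an event entry."""
    composite = " ".join(text for _, text in sorted(sources.items()))
    return [
        {"tag": tag, "rule": rule.pattern, "match": m.group(0)}
        for tag, rule in rules.items()
        for m in rule.finditer(composite)
    ]


# Hypothetical text inputs keyed by source name.
sources = {
    "adjuster_note": "Pipe leak in basement.",
    "call_log": "Claimant was injured on site.",
}
result_db = detect_events(sources)   # stands in for the text mining result database
for entry in result_db:
    print(entry["tag"], "triggered by", repr(entry["match"]))
```

Each entry keeps both the tag and the rule that fired, so downstream applications could be notified with enough context to decide whether the event matters to them.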

Mining interactions to manage customer experience throughout a customer service lifecycle

Active | US8396741B2 | Manual exchanges | Automatic exchanges | Text mining | Human–computer interaction

A customer experience is improved through data mining and text mining technologies that derive insights about a customer by analyzing interactions between the customer and a customer service agent. One or more numerical measurements of customer satisfaction are derived, and recommended actions are provided to an agent to enhance the customer experience throughout a customer service lifecycle.

Owner:24 7 AI INC

Systems and methods for facilitating dialogue mining

Active | US20170235740A1 | Convenient Mining | Improve service quality | Digital data information retrieval | Semantic analysis | Text mining | Prediction algorithms

The disclosure is related to mining of text to derive information from the text that is useful for a variety of purposes. The text mining process can be implemented in a service oriented industry such as a call center, where a customer and an agent engage in a dialog, e.g., to discuss product / service related issues. The messages in dialogues between the customers and the agents are tagged with features that describe an aspect of the conversation. The text mining process can mine various dialogues and identify a set of features and messages based on prediction algorithms. The identified set of features and messages can be used to infer an intent of a particular customer for contacting the agent, and to generate a recommendation based on the determined intent.

Owner:24 7 AI INC

Knowledge pattern search from networked agents

Inactive | US20080086436A1 | Digital data information retrieval | Digital computer details | Text mining | Application software

A method searches for new, unique and interesting information using knowledge patterns discovered through data mining and text mining, machine learning (supervised or unsupervised) and pattern recognition methods. The method is implemented as a computer program acting as an agent installed in a computer node or multiple nodes in a networked environment. The system is useful for improving the search experience and for knowledge discovery applications in which new, unique and interesting information is critical. The system is also useful for introducing new concepts and products for business applications.

Owner:ZHAO DR YING +1

System and method for facilitating skill gap analysis and remediation based on tag analytics

Inactive | US8060451B2 | Improve match | Office automation | Resources | Text mining | User input

This invention includes a workforce management system having a system bus, at least one database in communication with the system bus that includes data representative of workforce employees, and social networking data associated with the employees. A matching functional unit includes a text mining function for mining contextual information from the at least one database to generate context labels for an employee, a clustering function for generating concept labels for an employee, and a matching function that sorts and matches employees by the labels in accordance with matching criteria. A user interface provides user input to support operation of the workforce management system.
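The matching function described above could be sketched roughly as follows. This is only an illustration, not the patented implementation: the function name `match_employees`, the `min_overlap` criterion, and the sample labels are all assumptions, and the labels are taken as already produced by the text mining and clustering steps.

```python
def match_employees(employee_labels, required, min_overlap=1):
    """Rank employees by how many required labels they carry and list each one's gap.

    employee_labels: dict mapping employee name -> set of labels (hypothetical input).
    required: labels the role calls for; min_overlap is an assumed matching criterion.
    """
    required = set(required)
    ranked = []
    for name, labels in employee_labels.items():
        have = required & set(labels)
        if len(have) >= min_overlap:
            # Record the match count and the labels still missing (the skill gap).
            ranked.append((name, len(have), sorted(required - have)))
    # Highest overlap first; ties broken alphabetically for a stable order.
    ranked.sort(key=lambda r: (-r[1], r[0]))
    return ranked


employees = {
    "alice": {"python", "text mining", "sql"},
    "bob": {"java", "sql"},
    "carol": {"design"},
}
result = match_employees(employees, {"python", "sql", "text mining"})
```

Here `result` ranks the employees who share at least one required label, and the third field of each tuple lists the labels a remediation plan would need to cover.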

Owner:INT BUSINESS MASCH CORP

Method and system for automatic analysis of hotspot subject propagation process in the internet

Active | CN101231641A | Efficient | Robust | Special data processing applications | Information processing | Text mining

The invention relates to a method and a system for automatically analyzing the propagation process of an internet hot subject, and belongs to the field of intelligent information processing technology. As textual information on the internet grows, automatically detecting and analyzing hot or sensitive subjects in large text databases has become an important problem in text mining and information retrieval, and one of great practical value. The invention uses natural language processing to automatically analyze how the text documents in a given hot or sensitive subject propagate. The text documents in the subject are first arranged in time order. Starting from the first text document, a pattern matching method searches for the reference origin of the current text document; if no reference origin is found, it is determined by comparing text document similarity, and the corresponding source text document is obtained at the same time. Finally, the reference relations are presented to the user graphically. The method is widely applicable to internet intelligent information processing, public opinion analysis and monitoring, and similar tasks.
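The two-stage source search described in this abstract (pattern matching first, similarity comparison as a fallback) might look roughly like the sketch below. Everything specific here is an assumption for illustration: the `Doc` fields, the `Source:` citation pattern, Jaccard word overlap as the similarity measure, and the 0.5 threshold are not taken from the patent.

```python
import re
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    site: str
    timestamp: int
    text: str


# Hypothetical citation pattern; real pages would need site-specific patterns.
SOURCE_PATTERN = re.compile(r"Source:\s*(\w+)")


def jaccard(a, b):
    """Word-set overlap, used here as a stand-in similarity measure."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def trace_propagation(docs, threshold=0.5):
    """Map each doc_id to the doc_id of its inferred source (None for origins)."""
    ordered = sorted(docs, key=lambda d: d.timestamp)  # arrange in time order
    origin = {}
    for i, doc in enumerate(ordered):
        earlier = ordered[:i]
        source = None
        match = SOURCE_PATTERN.search(doc.text)
        if match:
            # Pattern matching: earliest earlier document from the cited site.
            source = next((e for e in earlier if e.site == match.group(1)), None)
        if source is None and earlier:
            # Fallback: most similar earlier document, if similar enough.
            best = max(earlier, key=lambda e: jaccard(doc.text, e.text))
            if jaccard(doc.text, best.text) >= threshold:
                source = best
        origin[doc.doc_id] = source.doc_id if source else None
    return origin


docs = [
    Doc("c", "SiteC", 3, "flood warning issued for the river valley today"),
    Doc("a", "SiteA", 1, "flood warning issued for the river valley"),
    Doc("b", "SiteB", 2, "Officials confirm evacuation. Source: SiteA"),
    Doc("d", "SiteD", 4, "local football team wins the cup"),
]
edges = trace_propagation(docs)
```

The resulting `edges` mapping is the reference relation that the abstract says is finally drawn as a graph; rendering it is left out of the sketch.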

Owner:PEKING UNIV

Chinese text parallel data mining method based on hierarchy

Active | CN102662952A | Improve word segmentation efficiency | Improve word segmentation accuracy | Special data processing applications | Data dredging | Text mining

The invention relates to a hierarchy-based parallel data mining method for Chinese text, comprising: step 1: establishing a vector space model of the Chinese texts: performing word segmentation on the entire Chinese text set to obtain the word-segmented form of each text and a feature term set containing all terms in the text set with duplicates removed, then using the feature term set to compute the term frequency-inverse document frequency (TFIDF) of each text, and establishing the text vector space model from the TFIDF; step 2: performing dimension reduction on the feature item vectors of the text vector space model; and step 3: clustering the texts using the hierarchy-based DCURE algorithm. The method segments Chinese text efficiently and accurately, requires no input parameters such as neighborhood radius for the clustering process, can mine irregular clusters, is insensitive to noise, and uses distributed computation, which makes mining massive text collections efficient and speeds up the calculation of feature weights.
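Step 1 of this abstract, building a TF-IDF vector space model from segmented texts, can be illustrated with a small sketch. It assumes the texts are already word-segmented (the token lists below are made up), and it uses one common TF-IDF variant, relative term frequency times log(N/df); the patent does not say which weighting it uses. Dimension reduction and DCURE clustering are not shown.

```python
import math
from collections import Counter


def build_tfidf(segmented_docs):
    """Return (sorted feature terms, TF-IDF vectors) for pre-segmented documents."""
    # Feature term set: every distinct term in the collection, duplicates removed.
    terms = sorted({t for doc in segmented_docs for t in doc})
    n_docs = len(segmented_docs)
    # Document frequency: number of documents containing each term.
    df = Counter(t for doc in segmented_docs for t in set(doc))
    vectors = []
    for doc in segmented_docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for an empty document
        vectors.append([
            (counts[t] / total) * math.log(n_docs / df[t])
            for t in terms
        ])
    return terms, vectors


# Hypothetical, already-segmented sample texts.
docs = [
    ["文本", "挖掘", "方法"],
    ["文本", "聚类", "方法"],
    ["分词", "方法"],
]
terms, vectors = build_tfidf(docs)
```

In this sample, a term that occurs in every document gets weight zero under the chosen variant, which is why such terms contribute nothing to distinguishing the texts during clustering.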

Owner:UESTC COMSYS INFORMATION
