50 results about "Latent semantic indexing" patented technology

Latent semantic indexing is an indexing and retrieval method that uses a mathematical technique called singular value decomposition to identify patterns in the relationships between the terms and concepts contained in an unstructured collection of text. LSI is based on the principle that words that are used in the same contexts tend to have similar meanings. A key feature of LSI is its ability to extract the conceptual content of a body of text by establishing associations between those terms that occur in similar contexts. LSI is also an application of correspondence analysis, a multivariate statistical technique developed by Jean-Paul Benzécri in the early 1970s, to a contingency table built from word counts in documents. Called Latent Semantic Indexing because of its ability to correlate semantically related terms that are latent in a collection of text, it was first applied to text at Bellcore in the late 1980s. The method, also called latent semantic analysis, uncovers the underlying latent semantic structure in the usage of words in a body of text and uses that structure to extract the meaning of the text in response to user queries, commonly referred to as concept searches.
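
The idea described above can be illustrated with a minimal sketch, assuming a toy five-document corpus and a rank-2 latent space. Everything named here (VOCAB, DOCS, top_left_singular_vectors, latent_similarity) is hypothetical and chosen for illustration. To stay within the Python standard library, the sketch approximates the leading left singular vectors by orthogonal (subspace) iteration rather than calling a full SVD routine, and it projects queries and documents onto those vectors without any singular-value rescaling, which is only one of several conventions in use.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def term_document_matrix(docs, vocab):
    # Rows are terms, columns are documents, entries are raw word counts.
    return [[float(doc.split().count(term)) for doc in docs] for term in vocab]

def top_left_singular_vectors(a, k, iters=300):
    # Orthogonal iteration on A * A^T (terms x terms). The result approximates
    # the k leading left singular vectors of A. Assumes k is small and the
    # leading singular values are well separated from the rest.
    gram = [[dot(ri, rj) for rj in a] for ri in a]
    n = len(a)
    # Deterministic, linearly independent start vectors.
    basis = [[1.0 / (1 + i + j) for i in range(n)] for j in range(k)]
    for _ in range(iters):
        basis = [[dot(row, v) for row in gram] for v in basis]
        ortho = []
        for v in basis:  # Gram-Schmidt re-orthonormalisation
            for u in ortho:
                c = dot(u, v)
                v = [x - c * y for x, y in zip(v, u)]
            norm = math.sqrt(dot(v, v))
            ortho.append([x / norm for x in v])
        basis = ortho
    return basis

def project(basis, vec):
    # Coordinates of a term-space vector in the latent space.
    return [dot(u, vec) for u in basis]

# Toy corpus (an assumption): "car" and "automobile" never share a document,
# but both co-occur with "engine".
VOCAB = ["car", "automobile", "engine", "apple", "fruit", "juice"]
DOCS = ["car engine", "automobile engine", "apple fruit", "fruit juice", "apple juice"]
A = term_document_matrix(DOCS, VOCAB)
BASIS = top_left_singular_vectors(A, k=2)

def doc_column(j):
    return [row[j] for row in A]

def query_vector(text):
    return [float(text.split().count(term)) for term in VOCAB]

def latent_similarity(query, j):
    return cosine(project(BASIS, query_vector(query)), project(BASIS, doc_column(j)))
```

In this toy corpus the query "car" shares no term with the document "automobile engine", so their raw cosine similarity is zero, yet latent_similarity("car", 1) is close to 1 because both load on the same latent dimension. This is the concept-search behaviour described above, shown on deliberately tiny data.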

Method for document comparison and selection

Inactive | US7113943B2 | Reduce the impact | Improve consistency | Digital data information retrieval | Data processing applications | Document preparation | Documentation

Extensions to latent semantic indexing (LSI), including: phrase processing, creation of generalized entities, elaboration of entities, replacement of idiomatic expressions, and use of data fusion methods to combine the aforementioned extensions in a synergistic fashion. Additionally, novel methods tailored to specific applications of LSI are disclosed.

Owner:RELATIVITY ODA LLC

Word sense disambiguation

Inactive | US20020026456A1 | Data processing applications | Semantic analysis | External reference | Word sense

Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.

Owner:RELATIVITY ODA LLC

Method and system for facilitating the refinement of data queries

Inactive | US6954750B2 | Data processing applications | Digital data information retrieval | Ranking | Document preparation

Refining a current query. Receiving information regarding the relevancy of documents retrieved from a document collection in response to a current query. Ranking the retrieved documents in accordance with the relevancy information. Forming a candidate query based on the rankings and analysis of locations of the retrieved documents in a latent semantic index vector space formed from the retrieved document. Applying the candidate query to the document collection. Ranking the documents retrieved in response to the candidate query in accordance with the received relevancy information. Comparing the ranking of documents retrieved in response to the candidate query and the ranking of documents retrieved in response to the current query with the received relevancy information. Choosing the query that produces the best ranking.
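
The final steps of this abstract (compare the rankings produced by the current and candidate queries against the relevance information, then keep the better query) can be sketched as follows. This is a minimal illustration, not the patented procedure: the abstract does not say how rankings are scored, so average precision is used here as an assumed stand-in, and the function names and document ids are hypothetical.

```python
def average_precision(ranking, relevant):
    # ranking: document ids in retrieval order; relevant: ids judged relevant.
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def choose_query(rankings, relevant):
    # rankings: mapping from a query label to the ranking that query produced.
    # Returns the label whose ranking agrees best with the relevance feedback.
    return max(rankings, key=lambda name: average_precision(rankings[name], relevant))

# Hypothetical feedback: the user marked d1 and d2 as relevant.
rankings = {
    "current": ["d3", "d1", "d2"],
    "candidate": ["d1", "d2", "d3"],
}
best = choose_query(rankings, {"d1", "d2"})
```

Here the candidate query places both relevant documents first, so it scores higher than the current query and is the one kept.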

Owner:RELATIVITY ODA LLC

Automatic recommendation of products using latent semantic indexing of content

Inactive | US20040039657A1 | Digital data information retrieval | Commerce | User input | Similarity measure

Techniques for using latent semantic structure of textual content ascribed to the items to provide automatic recommendations to the user. A user inputs a selected item and, in turn, a latent semantic algorithm is applied to the user selection and the textual content of the items in a database to generate a conceptual similarity between the selection and the items. A set of nearest items to the selected item is provided as a recommendation to the user of other items that may be of particular interest or relevance to the user's original selection based upon the conceptual similarity measure.

Owner:CONTENT ANALYST

System and method for hierarchical segmentation with latent semantic indexing in scale space

Inactive | US7137062B2 | Easy to analyze | Reduce dimensionality | Natural language data processing | Special data processing applications | Visual presentation | Relevant information

A system and method for automatically generating a hierarchical table of contents or outline for indexing a document and identifying clusters of related information in the document. The document may comprise text, audio, video, or a multimedia presentation. The invention employs a unique and novel combination of latent semantic indexing techniques to identify related blocks and major topic changes within the document with scale space segmentation techniques to respectively identify self-similar blocks within the document and to thus find topic changes of various sizes at block edges. The invention then produces a visual presentation of the semantic structure of the document.

Owner:IBM CORP

Differential LSI space-based probabilistic document classifier

Inactive | US7024400B2 | Retention characteristic | Improve adaptability | Digital computer details | Chaos models | Semantics | Combined use

A computerized method for automatic document classification based on a combined use of the projection and the distance of the differential document vectors to the differential latent semantics index (DLSI) spaces. The method includes the setting up of a DLSI space-based classifier to be stored in computer storage and the use of such classifier by a computer to evaluate the possibility of a document belonging to a given cluster using a posteriori probability function and to classify the document in the cluster. The classifier is effective in operating on very large numbers of documents such as with document retrieval systems over a distributed computer network.

Owner:SUNFLARE CO LTD

Semantic querying a peer-to-peer network

Inactive | US7039634B2 | Data processing applications | Digital data processing details | Semantic vector | Semantic query

In a method of semantic querying in a peer-to-peer network, an item of information is mapped into a semantic vector based on the latent semantic indexing algorithm or any IR algorithms that can derive a vector representation. The semantic vector is associated with an address index as a key pair. The key pair is stored in an overlay network formed from the peer-to-peer network such that the stored key pair is proximally located to at least one other key pair having a similar semantic vector.

Owner:HEWLETT PACKARD DEV CO LP

Method and apparatus for providing information in a peer-to-peer network

Inactive | US20040181607A1 | Web data indexing | Digital computer details | Peer-to-peer | Latent semantic indexing

In a method of providing information in a peer-to-peer network, a query is received and a profile is generated based on the query by applying a latent semantic indexing algorithm. The profile is routed to a selected node based on the profile falling within a zone owned by the selected node.
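
The routing rule in this abstract (send the profile to the node whose zone contains it) can be sketched under strong simplifying assumptions: the profile is taken to be a two-dimensional vector in the unit square, zones are axis-aligned rectangles, and ownership is resolved by a direct lookup rather than by hop-by-hop forwarding through an overlay. The Zone class, node names, and zone layout are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Zone:
    owner: str
    low: tuple   # inclusive lower corner of the zone
    high: tuple  # exclusive upper corner of the zone

    def contains(self, point):
        return all(lo <= x < hi for lo, x, hi in zip(self.low, point, self.high))

def route(profile, zones):
    # Return the owner of the zone that the profile vector falls into.
    for zone in zones:
        if zone.contains(profile):
            return zone.owner
    raise LookupError("no zone covers this profile")

# Hypothetical partition of the unit square among three nodes.
ZONES = [
    Zone("node-a", (0.0, 0.0), (0.5, 1.0)),
    Zone("node-b", (0.5, 0.0), (1.0, 0.5)),
    Zone("node-c", (0.5, 0.5), (1.0, 1.0)),
]
```

With this layout, route((0.2, 0.9), ZONES) resolves to "node-a". Because nearby profile vectors fall into the same or adjacent zones, semantically similar queries would tend to reach the same nodes, which appears to be the point of zone-based routing here.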

Owner:HEWLETT PACKARD DEV CO LP

Word sense disambiguation

Inactive | US7024407B2 | Data processing applications | Semantic analysis | External reference | Word sense

Systems and methods for word sense disambiguation, including discerning one or more senses or occurrences, distinguishing between senses or occurrences, and determining a meaning for a sense or occurrence of a subject term. In a collection of documents containing terms and a reference collection containing at least one meaning associated with a term, the method includes forming a vector space representation of terms and documents. In some embodiments, the vector space is a latent semantic index vector space. In some embodiments, occurrences are clustered to discern or distinguish a sense of a term. In preferred embodiments, meaning of a sense or occurrence is assigned based on either correlation with an external reference source, or proximity to a reference source that has been indexed into the space.

Owner:RELATIVITY ODA LLC

Semantic querying a peer-to-peer network

Inactive | US20040181511A1 | Digital data information retrieval | Data processing applications | Semantic vector | Proximal point

In a method of semantic querying in a peer-to-peer network, an item of information is mapped into a semantic vector based on the latent semantic indexing algorithm or any IR algorithms that can derive a vector representation. The semantic vector is associated with an address index as a key pair. The key pair is stored in an overlay network formed from the peer-to-peer network such that the stored key pair is proximally located to at least one other key pair having a similar semantic vector.

Owner:HEWLETT PACKARD DEV CO LP

Network-based method for analyzing opinion information in discrete text

Inactive | CN102110140A | Realize Tracking | Achieve recovery | Special data processing applications | Analysis data | Information analysis

The invention relates to a network-based system for analyzing opinion information in discrete text, belonging to the field of network information security. The system comprises the following modules: a discrete text information acquisition module, which acquires network information in a preset analysis cycle; a discrete text information tracking and restoring module, which restores ellipsis and remote anaphora in the original content to obtain text with a relatively complete structure and semantic information; a semantic information mining and feature extraction module, which mines semantic information and extracts features from the text using latent semantic indexing; an opinion information clustering module, which clusters information by combining a niche genetic algorithm with the K-Means method; a hot opinion event discovery module, which mines hot opinions from the obtained topics and events; and a background information processing and data support center, which analyzes data and provides a network-specific lexicon, new network words, existing category information, and existing hot topics. The invention addresses the problem that information analysis is degraded because network opinion text has an incomplete structure, frequent ellipsis and remote anaphora, and many new words, and it improves the accuracy of opinion and hot-event discovery by adopting a high-efficiency clustering method.

Owner:GUILIN UNIV OF ELECTRONIC TECH

Supervised semantic indexing and its extensions

Active | US20100185659A1 | Digital data processing details | Digital computer details | Gram | Document preparation

A system and method for determining a similarity between a document and a query includes providing a frequently used dictionary and an infrequently used dictionary in storage memory. For each word or gram in the infrequently used dictionary, n words or grams are correlated from the frequently used dictionary based on a first score. Features for a vector of the infrequently used words or grams are replaced with features from a vector of the correlated words or grams from the frequently used dictionary when the features from a vector of the correlated words or grams meet a threshold value. A similarity score is determined between weight vectors of a query and one or more documents in a corpus by employing the features from the vector of the correlated words or grams that met the threshold value.

Owner:NEC CORP

Anchor Text-Based Focused Web Crawler Search Method and System

Active | CN102298622A | Special data processing applications | The Internet | Web crawler

The invention discloses an anchor text-based focused web crawler search method and system. The method mainly comprises the following steps: obtaining a URL (uniform resource locator) from a URL priority queue and downloading the corresponding Web page from the Internet; parsing the downloaded Web page and extracting its URLs and their anchor text; filtering the extracted URLs and anchor text; and using an algorithm that combines TF-IDF (term frequency-inverse document frequency) and LSI (latent semantic indexing) to calculate the topic relevance of each URL, placing URLs that meet the condition into the priority queue. The system comprises a URL priority queue, a web crawler downloader, a Web page library, a URL parser, a URL filter and a topic relevance identifier. The method and system improve both the topic relevance of the focused crawler's results and its crawling efficiency.
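
The crawl loop in this abstract revolves around a priority structure that always yields the most topic-relevant unvisited URL. A minimal sketch of that structure follows, built on the standard-library heapq module. The UrlFrontier class and its threshold parameter are hypothetical, and anchor_relevance is a deliberately crude keyword-overlap score standing in for the combined TF-IDF and LSI relevance computation that the abstract names but does not detail.

```python
import heapq

class UrlFrontier:
    """Priority structure that pops the most relevant unseen URL first."""

    def __init__(self, threshold=0.0):
        self._heap = []
        self._seen = set()
        self._threshold = threshold
        self._counter = 0  # tie-breaker that preserves insertion order

    def push(self, url, relevance):
        # Reject duplicates and URLs whose relevance is below the threshold.
        if url in self._seen or relevance < self._threshold:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-relevance, self._counter, url))
        self._counter += 1
        return True

    def pop(self):
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)

def anchor_relevance(anchor_text, topic_terms):
    # Stand-in score: fraction of anchor-text words that are topic terms.
    words = anchor_text.lower().split()
    if not words:
        return 0.0
    return sum(word in topic_terms for word in words) / len(words)

topic = {"semantic", "indexing"}
frontier = UrlFrontier(threshold=0.2)
frontier.push("http://example.com/a", anchor_relevance("latent semantic indexing", topic))
frontier.push("http://example.com/b", anchor_relevance("contact us", topic))
frontier.push("http://example.com/c", anchor_relevance("semantic web tutorial", topic))
```

In this example the "contact us" link is filtered out by the threshold, and the remaining URLs are popped in order of decreasing relevance.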

Owner:INST OF AUTOMATION CHINESE ACAD OF SCI

An ES-based electronic medical record retrieval method

Active | CN109299239A | Improve accuracy | Improve recall | Digital data information retrieval | Semantic analysis | Medical record | Medical terminology

The invention discloses an ES-based electronic medical record retrieval method, relating to the technical field of medical data retrieval. The method introduces a semantic analysis model into electronic medical record analysis, including subject-word extraction and semantic similarity calculation, and exploits their strengths in text semantic mining. By establishing a general medical semantic database (negative words, synonyms, ambiguous words), it provides algorithmic support for latent semantic mining of text in electronic medical record retrieval. It achieves high accuracy and recall in information retrieval and is better suited to medical terminology, which compared with ordinary natural language is often more complex and constantly changing and contains more abbreviations, synonyms and polysemous words. It meets the scientific research need for multi-dimensional combined retrieval and the need for full-text retrieval of related literature based on latent semantic search, realizing intelligent full-text retrieval with semantic extension and semantic connotation extension.

Owner:弘扬软件股份有限公司

System and Method for Configuring Voice Readers Using Semantic Analysis

Inactive | US20070276667A1 | Speech synthesis | Natural language processing | Loudness

A system and method for using semantic analysis to configure a voice reader is presented. A text file includes a plurality of text blocks, such as paragraphs. Processing performs semantic analysis on each text block in order to match the text block's semantic content with a semantic identifier. Once processing matches a semantic identifier with the text block, processing retrieves voice attributes that correspond to the semantic identifier (i.e. pitch value, loudness value, and pace value) and provides the voice attributes to a voice reader. The voice reader uses the text block to produce a synthesized voice signal with properties that correspond to the voice attributes. The text block may include semantic tags whereby processing performs latent semantic indexing on the semantic tags in order to match semantic identifiers to the semantic tags.

Owner:ATKIN STEVEN +2

Region based multiple features Integration and multiple-stage feedback latent semantic image retrieval method

Inactive | CN1967536A | Improve retrieval accuracy | Implement Semantic Similarity Matching | Special data processing applications | Image query | Semantic feature

The invention discloses a latent semantic image retrieval method based on region-based multi-feature integration and multi-stage feedback. Starting from the result list returned by an initial keyword search, it extracts a variety of region-based image features, constructs an attribute-image matrix, and applies the latent semantic indexing algorithm to obtain the semantic space of the image set and the semantic features of each image. It then uses similar images identified through user feedback to construct or update an image query vector, searches the semantic space again, calculates the similarity between each image's semantic features and the query vector, and returns the result set in descending order of similarity; this retrieval can be repeated. The invention takes full advantage of image content information, making up for the deficiencies of keyword search. Region-based multi-feature integration raises image content information from the low-level physical layer to the object layer, and human-computer interaction (HCI) feedback raises it further to the semantic layer, thereby narrowing the gap between low-level image features and high-level semantics and allowing Web image retrieval to achieve higher accuracy.

Owner:HUAZHONG UNIV OF SCI & TECH

A latent semantic min-Hash-based image retrieval method

Active | CN106033426A | High precision | Improve efficiency | Character and pattern recognition | Special data processing applications | Matrix decomposition | Imaging processing

The invention relates to the technical field of image processing and in particular relates to a latent semantic min-Hash-based image retrieval method comprising the steps of (1) obtaining datasets through division; (2) establishing a latent semantic min-Hash model; (3) solving a transformation matrix T; (4) performing Hash encoding on testing datasets Xtest; (5) performing image query. Based on the facts that the convolution network has better expression features and latent semantics of primitive characteristics can be extracted by using matrix decomposition, minimizing constraint is performed on quantization errors in an encoding quantization process, so that after the primitive characteristics are encoded, the corresponding Hamming distances in a Hamming space of semantically-similar images are smaller and the corresponding Hamming distances of semantically-dissimilar images are larger. Thus, the image retrieval precision and the indexing efficiency are improved.
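
The retrieval step at the end of this abstract relies on Hamming distance between binary codes: semantically similar images should receive codes that differ in few bits. The sketch below covers only that lookup step and does not attempt the learned hashing model or the transformation matrix T. Codes are represented as Python integers, and the 8-bit example codes and image names are invented for illustration.

```python
def hamming_distance(code_a, code_b):
    # Number of bit positions in which the two binary codes differ.
    return bin(code_a ^ code_b).count("1")

def nearest_by_hamming(query_code, database, top_n=2):
    # database: mapping from image name to its binary hash code.
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(
        database,
        key=lambda name: (hamming_distance(query_code, database[name]), name),
    )
    return ranked[:top_n]

# Hypothetical 8-bit codes for three indexed images.
database = {
    "cat1": 0b10110010,
    "cat2": 0b10110011,
    "car1": 0b01001101,
}
query = 0b10110000
neighbours = nearest_by_hamming(query, database)
```

Comparing codes this way needs only an XOR and a bit count per candidate, which is why hashing-based indexes can scan large collections quickly.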

Owner:XI'AN INST OF OPTICS & FINE MECHANICS - CHINESE ACAD OF SCI

Computer-assisted memory translation scheme based on template automaton and latent semantic index principle

Active | US7124073B2 | Natural language translation | Special data processing applications | Translation algorithm | Natural language processing

A new, more efficient memory translation algorithm facilitating the acquisition of a most appropriate translation in a target language from among those of nearly narrowed-down candidates of translation by separately applying the so-called dimension reducing functions of a template automaton and the LSI (latent semantic index) technique. Both the template automaton and the LSI principle play an important role in implementing an efficient process of narrowing down an efficient solution space from among the many example sentences of the databases in a target language by exploiting their respective unique search space reduction function. Once developed into a fully operational system, an expert editor rather than an expert translator can tune up the translation memory system, markedly widening the range of available experts who can utilize the system.

Owner:SUNFLARE CO LTD

Unit selection module and method for Chinese text-to-speech synthesis

Inactive | US20060095264A1 | Prevent inappropriate unit generation | Avoid it happening again | Special data processing applications | Speech synthesis | Natural language processing | Structural distance

This invention relates to a unit selection module for Chinese Text-to-Speech (TTS) synthesis, mainly comprising a probabilistic context free grammar (PCFG) parser, a latent semantic indexing (LSI) module, and a modified variable-length unit selection scheme; any Chinese sentence is firstly input and then parsed into a context-free grammar (CFG) by the PCFG parser; wherein there are several possible CFGs for every Chinese sentence, and the CFG (or the syntactic structure) with the highest probability is then taken as the best CFG (or the syntactic structure) of the Chinese sentence; the LSI module is then used to calculate the structural distance between all the candidate synthesis units and the target unit in a corpus; through the modified variable-length unit selection scheme, tagged with the dynamic programming algorithm, the units are searched to find the best synthesis unit concatenation sequence.

Owner:NAT CHENG KUNG UNIV

Semantic gene organizer

Inactive | US20060047441A1 | Rapidly and accurately classify | Biological testing | Special data processing applications | Paper document | Document preparation

A semantic gene classification and annotation system, method and computer program can utilize Latent Semantic Indexing (LSI) to identify conceptually related genes based on textual information in biomedical literature, including MEDLINE citations. In addition, term weights calculated from the usage of the gene terms in and across gene documents can be used to automatically assign gene aliases and extend gene function annotation based upon primary biomedical literature.

Owner:UNIV OF TENNESSEE RES FOUND

Solution recommendation based on incomplete data sets

Active | US20070179924A1 | More data | Digital data information retrieval | Chaos models | Complete data | Data set

In accordance with one aspect of the present exemplary embodiment, a system determines a solution based on received data. An intake component receives an incomplete data set from one or more sources. A recommendation system transforms the incomplete data set into a semantic data set via latent semantic indexing, classifies the semantic data set into an existing cluster and provides one or more solutions of the existing cluster as one or more recommendations.

Owner:XEROX CORP

API (Application Programming Interface) tag recommendation method based on heterogeneous information

Inactive | CN106021366A | Improve reliability | Improve accuracy | Special data processing applications | Transfer matrix | Heterogeneous network

The invention discloses an API (Application Program Interface) tag recommendation method based on heterogeneous information, which mainly adopts a random walk algorithm over the heterogeneous information. The method comprises the following steps: first, establishing a heterogeneous network according to the relationships among APIs, mashups and mashup tags, where the network includes the inclusion relationship between APIs and mashups, the correspondence between mashups and tags, and the homogeneous relationships within each of the three element types; then, generating the corresponding transfer matrix from the heterogeneous network and carrying out a random walk with restart on it, iteratively moving from an API vertex to the mashup layer and the tag layer until a globally stable distribution is reached, which gives the probability of the API reaching each tag vertex; finally, introducing a text processing model (Latent Semantic Indexing) to calculate the semantic similarity between the API and each tag, and combining it with the obtained probability to generate a final ranked tag list, so as to recommend suitable tags for the API and improve tag recommendation accuracy.
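
The random walk with restart at the core of this abstract can be sketched on a tiny graph. The sketch assumes an unweighted, undirected graph stored as adjacency lists, a restart probability of 0.15, and a fixed number of iterations. None of these values come from the patent, and the node names are invented. The final step of combining the walk probabilities with an LSI-based semantic similarity is not shown.

```python
def random_walk_with_restart(graph, start, restart=0.15, iters=100):
    # graph: mapping from node to a non-empty list of neighbouring nodes.
    # Returns the approximate stationary visiting probability of every node
    # for a walker that jumps back to `start` with probability `restart`.
    prob = {node: 1.0 if node == start else 0.0 for node in graph}
    for _ in range(iters):
        nxt = {node: (restart if node == start else 0.0) for node in graph}
        for node, neighbours in graph.items():
            share = (1.0 - restart) * prob[node] / len(neighbours)
            for neighbour in neighbours:
                nxt[neighbour] += share
        prob = nxt
    return prob

# Hypothetical network: one API used by two mashups, which carry two tags.
graph = {
    "api": ["mashup1", "mashup2"],
    "mashup1": ["api", "tagA"],
    "mashup2": ["api", "tagA", "tagB"],
    "tagA": ["mashup1", "mashup2"],
    "tagB": ["mashup2"],
}
scores = random_walk_with_restart(graph, "api")
tag_ranking = sorted(["tagA", "tagB"], key=lambda tag: scores[tag], reverse=True)
```

Because tagA is reachable from the API through both mashups while tagB is reachable through only one, the walk assigns tagA the higher probability, so it ranks first.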

Owner:ZHEJIANG UNIV

System and method for configuring voice readers using semantic analysis

Inactive | CN1788305A | Speech synthesis | Natural language processing | Speech rate

A system and method for using semantic analysis to configure a voice reader is presented. A text file includes a plurality of text blocks, such as paragraphs. Processing performs semantic analysis on each text block in order to match the text block's semantic content with a semantic identifier. Once processing matches a semantic identifier with the text block, processing retrieves voice attributes that correspond to the semantic identifier (i.e. pitch value, loudness value, and pace value) and provides the voice attributes to a voice reader. The voice reader uses the text block to produce a synthesized voice signal with properties that correspond to the voice attributes. The text block may include semantic tags whereby processing performs latent semantic indexing on the semantic tags in order to match semantic identifiers to the semantic tags.

Owner:INT BUSINESS MASCH CORP

Perturbing latent semantic indexing spaces

Inactive | US7580910B2 | Exclude influence | Efficient merge | Semantic analysis | Computation using non-denominational number representation | Theoretical computer science | Document preparation

A text processing method is provided that includes the following steps. First, an abstract mathematical vector space is generated based on a collection of documents. Respective documents in the collection of documents have a representation in the abstract mathematical vector space and respective terms contained in the collection of documents have a representation in the abstract mathematical vector space. Then, the abstract mathematical vector space is perturbed to produce a perturbed abstract mathematical vector space that is stored in an electronic format accessible to a user. Perturbing the abstract mathematical vector space may include modifying the representation of a document with a newly computed representation for that document, or modifying the representation of a term with a newly computed representation for that term.

Owner:RELATIVITY ODA LLC

Selective latent semantic indexing method for information retrieval applications

Inactive | US20070124299A1 | Reduce and prevent loss | Give control | Text database indexing | Special data processing applications | Singular value decomposition | Gram

A term-by-document (or part-by-collection) matrix can be used to index documents (or collections) for information retrieval applications. Reducing the rank of the indexing matrix can further reduce the complexity of information retrieval. A method for index matrix rank reduction can involve computing a singular value decomposition and then retaining singular values based on the singular values corresponding to singular values of multiple topics. The expected singular values corresponding to a topic can be determined using the roots of a specially formed characteristic polynomial. The coefficients of the special characteristic polynomial can be based on computing the determinants of a Gram matrix of term (or part) probabilities, a method of recursion, or a method of recursion further weighted by the probability of document (or collection) lengths.

Owner:SELECTIVE

Literature retrieval method based on semantic small-world model

Inactive | CN101017504A | Improve index recall | Reduce delivery | Special data processing applications | Small worlds | Social network

This invention discloses a document indexing method based on a semantic small-world model, comprising the following steps: first, latent semantic indexing is used to extract document feature vectors, preserving the document features while lowering their dimensionality and reducing the amount of information stored; then, a support vector machine is used to classify all shared documents, producing classification information that marks the proportion of interest in each category; finally, following the small-world property of social networks, nodes with a high proportion of interest in a given document category are joined by a small number of links to form a network topology with small-world properties.

Owner:HUAZHONG UNIV OF SCI & TECH

Method, machine learning engines and file management platform systems for content and context aware data classification and security anomaly detection

Pending | US20210319179A1 | Reduce dimensionality | Semantic analysis | Character and pattern recognition | Feature extraction | Confidentiality

Systems, methods and computer readable media are provided for performing a method for content and context aware data classification or a method for content and context aware data security anomaly detection. The method for content and context aware data confidentiality classification includes scanning one or more documents in one or more network data repositories of a computer network and extracting content features and context features of the one or more documents into one or more term frequency-inverse document frequency (TF-IDF) vectors and one or more latent semantic indexing (LSI) vectors. The method further includes classifying the one or more documents into a number of category classifications by machine learning the extracted content features and context features of the one or more documents at a file management platform of the computer network, each of the category classifications being associated with one or more confidentiality classifications.
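The TF-IDF feature extraction mentioned in this abstract can be sketched in a few lines of standard-library Python. This is only an illustration under stated assumptions: the sample documents are invented, the weighting is one common variant (relative term frequency times the log of inverse document frequency), and the function name `tfidf_vectors` is hypothetical. The abstract does not say which variant is used, and the LSI and classification steps are not shown.

```python
import math
from collections import Counter

# Invented sample documents standing in for scanned repository files.
docs = [
    "quarterly salary report confidential",
    "team lunch menu",
    "confidential salary review",
]

def tfidf_vectors(texts):
    """Return (vocabulary, one TF-IDF vector per text) using whitespace tokens."""
    tokenized = [t.split() for t in texts]
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(tokenized)
    # Document frequency: number of texts in which each word appears.
    df = Counter(w for toks in tokenized for w in set(toks))
    idf = {w: math.log(n / df[w]) for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append([tf[w] / len(toks) * idf[w] for w in vocab])
    return vocab, vectors

vocab, vectors = tfidf_vectors(docs)
```

Words that occur in few documents, such as "lunch", receive a higher weight than words shared across documents, such as "salary".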

Owner:DATHENA SCI PTE LTD

Method for establishing high-efficient semantic indexing for large-amount RDF (resource description framework) data

Inactive | CN104216975A | Guaranteed query efficiency | Rich Offline Reasoning | Semantic analysis | Special data processing applications | Open source | RDF

The invention discloses a method for establishing highly efficient semantic indexing for large amounts of RDF (resource description framework) data. The method comprises the following steps: step 1, configuring an open-source distributed RDF database to be used as a persistent database for storing the RDF data; step 2, distinguishing TBox data and ABox data in the RDF database; step 3, generating child-parent semantic relation indexes among the classes in the TBox data; step 4, generating child-parent semantic relation indexes among the properties in the TBox data; step 5, merging the generated semantic relations into the RDF data, including the original TBox data and ABox data, to form new RDF data; step 6, persisting the newly generated RDF data into the configured RDF database. The resulting scheme for indexing semantic relations in RDF data supports querying and reasoning over large amounts of RDF data, so that query efficiency is guaranteed while rich offline reasoning is also supported.
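Steps 3 to 5 above can be sketched as follows. This is a rough illustration under several assumptions: in-memory Python sets stand in for the distributed RDF database, the child-parent relations are taken to be the standard RDFS predicates `rdfs:subClassOf` and `rdfs:subPropertyOf`, and the index is assumed to be a transitive closure, which the abstract does not state. The example triples and the function names `closure_index` and `materialize` are hypothetical.

```python
from collections import defaultdict

# Hypothetical TBox (schema) and ABox (instance) triples: (subject, predicate, object).
TBOX = [
    ("ex:Student", "rdfs:subClassOf", "ex:Person"),
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:advisor", "rdfs:subPropertyOf", "ex:knows"),
]
ABOX = [("ex:alice", "rdf:type", "ex:Student")]

def closure_index(triples, predicate):
    """Map each child to all of its direct and indirect parents under `predicate`."""
    parents = defaultdict(set)
    for s, p, o in triples:
        if p == predicate:
            parents[s].add(o)
    index = {}
    for child in list(parents):
        seen, stack = set(), list(parents[child])
        while stack:
            node = stack.pop()
            if node not in seen:  # the seen set also guards against cycles
                seen.add(node)
                stack.extend(parents.get(node, ()))
        index[child] = seen
    return index

def materialize(tbox, abox):
    """Build the class and property indexes and merge the derived triples with the original data."""
    derived = set()
    for pred in ("rdfs:subClassOf", "rdfs:subPropertyOf"):
        for child, ancestors in closure_index(tbox, pred).items():
            derived.update((child, pred, a) for a in ancestors)
    return set(tbox) | set(abox) | derived

new_data = materialize(TBOX, ABOX)
```

Because the indirect relation between `ex:Student` and `ex:Agent` is stored explicitly, a query for all agents does not need to walk the class hierarchy at query time. Step 6, persisting `new_data` to the database, is omitted here.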

Owner:TIANJIN UNIV

Perturbing latent semantic indexing spaces

Inactive | US20060235661A1 | Exclude influence | Efficient merge | Semantic analysis | Computation using non-denominational number representation | Theoretical computer science | Paper document

A text processing method is provided that includes the following steps. First, an abstract mathematical vector space is generated based on a collection of documents. Respective documents in the collection of documents have a representation in the abstract mathematical vector space and respective terms contained in the collection of documents have a representation in the abstract mathematical vector space. Then, the abstract mathematical vector space is perturbed to produce a perturbed abstract mathematical vector space that is stored in an electronic format accessible to a user. Perturbing the abstract mathematical vector space may include modifying the representation of a document with a newly computed representation for that document, or modifying the representation of a term with a newly computed representation for that term.
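One way to picture replacing a document's representation with a newly computed one is the standard LSI "folding-in" projection. The sketch below assumes NumPy, an invented toy matrix, and folding-in as the way the new representation is computed. The abstract does not say how the new representation is obtained, so this is a swapped-in technique for illustration, and the name `fold_in` is hypothetical.

```python
import numpy as np

# Toy term-by-document count matrix (rows: terms, columns: documents); values are invented.
A = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 3],
    [1, 0, 0, 1],
], dtype=float)

k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, s_k = U[:, :k], s[:k]
doc_vectors = Vt[:k, :].T.copy()  # one k-dimensional row per document

def fold_in(term_counts):
    """Project a term-count vector into the existing k-dimensional space."""
    return (term_counts @ U_k) / s_k

# Document 0 gains one occurrence of the third term, so only its
# representation is recomputed; the rest of the space is left as it was.
updated_doc0 = A[:, 0] + np.array([0, 0, 1, 0, 0], dtype=float)
perturbed = doc_vectors.copy()
perturbed[0] = fold_in(updated_doc0)
```

Folding in an unchanged column of `A` reproduces that document's original row of `doc_vectors`, so the perturbation touches only the representation that was deliberately replaced.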

Owner:RELATIVITY ODA LLC

Query generation and time difference features for supervised semantic indexing

Active | US20140122388A1 | Web data indexing | Digital computer details | Automatic indexing | Documentation

Semantic indexing methods and systems are disclosed. One such method is directed to training a semantic indexing model by employing an expanded query. The query can be expanded by merging the query with documents that are relevant to the query for purposes of compensating for a lack of training data. In accordance with another exemplary aspect, time difference features can be incorporated into a semantic indexing model to account for changes in query distributions over time.

Owner:NEC CORP
