226 results about "N-gram" patented technology

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. When the items are words, n-grams may also be called shingles.
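
As a minimal sketch of the definition above, the following Python helper slides a window of length n over any sequence, so the same function yields character n-grams from a string and word n-grams from a token list. The function name `ngrams` and the sample inputs are illustrative assumptions, not taken from any of the listed patents.

```python
from typing import List, Sequence, Tuple


def ngrams(items: Sequence, n: int) -> List[Tuple]:
    """Return every contiguous run of n items, in order of appearance."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence shorter than n has no n-grams, so the range is empty.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Character trigrams of a single word.
char_trigrams = ngrams("station", 3)

# Word bigrams ("shingles") of a whitespace-tokenized sentence.
word_bigrams = ngrams("to be or not to be".split(), 2)
```

Whitespace tokenization is the simplest possible choice here; real corpora usually need a proper tokenizer.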

Identification and rejection of meaningless input during natural language classification

Status: Active | Patent: US7707027B2 | Tags: Digital data information retrieval; Natural language data processing; Algorithm; Computer science

A method for identifying data that is meaningless and generating a natural language statistical model which can reject meaningless input. The method can include identifying unigrams that are individually meaningless from a set of training data. At least a portion of the unigrams identified as being meaningless can be assigned to a first n-gram class. The method also can include identifying bigrams that are entirely composed of meaningless unigrams and determining whether the identified bigrams are individually meaningless. At least a portion of the bigrams identified as being individually meaningless can be assigned to the first n-gram class.

Owner:MICROSOFT TECH LICENSING LLC

Apparatus method and medium for tracing the origin of network transmissions using n-gram distribution of data

Status: Active | Patent: US20050265331A1 | Tags: Unauthorized memory use protection; Time-division multiplex; Web transport; Real-time computing

A method, apparatus, and medium are provided for tracing the origin of network transmissions. Connection records are maintained at a computer system for storing source and destination addresses. The connection records also maintain a statistical distribution of data corresponding to the data payload being transmitted. The statistical distribution can be compared to that of the connection records in order to identify the sender. The location of the sender can subsequently be determined from the source address stored in the connection record. The process can be repeated multiple times until the location of the original sender has been traced.

Owner:THE TRUSTEES OF COLUMBIA UNIV IN THE CITY OF NEW YORK

Explicit character filtering of ambiguous text entry

Status: Inactive | Patent: US7712053B2 | Tags: Input/output for user-computer interaction; Electronic switching; Programming language; Algorithm

The present invention relates to a method and apparatus for explicit filtering in ambiguous text entry. The invention provides embodiments including various explicit text entry methodologies, such as 2-key and long pressing. The invention also provides means for matching words in a database using build around methodology, stem locking methodology, word completion methodology, and n-gram searches.

Owner:TEGIC COMM

Method, apparatus and computer program product for providing flexible text based language identification

Status: Active | Patent: US7552045B2 | Tags: Accurate analysis; Highly configurable multilingual; Speech analysis; Natural language data processing; Processing element; Human language

An apparatus for providing flexible text based language identification includes an alphabet scoring element, an n-gram frequency element and a processing element. The alphabet scoring element may be configured to receive an entry in a computer readable text format and to calculate an alphabet score of the entry for each of a plurality of languages. The n-gram frequency element may be configured to calculate an n-gram frequency score of the entry for each of the plurality of languages. The processing element may be in communication with the n-gram frequency element and the alphabet scoring element. The processing element may also be configured to determine a language associated with the entry based on a combination of the alphabet score and the n-gram frequency score.

Owner:NOKIA TECH OY

Method of identifying the language of a textual passage using short word and/or n-gram comparisons

Status: Inactive | Patent: US7359851B2 | Tags: Raise the possibility; Reduce weight; Natural language analysis; Special data processing applications; Computer science; Paragraph

A method and system for identifying the language of a textual passage are disclosed. The method and system include parsing the textual passage into n-grams, assigning an initial weight to each n-gram, and adjusting the weight initially assigned to a word or n-gram parsed from the textual passage. The initially assigned weight is adjusted in proportion to the inverse of the number of languages within which such words or n-grams appear. Reducing the weight assigned to such words or n-grams diminishes, without completely eliminating, their importance in comparison to other words or n-grams parsed from the same textual passage when determining the language of a passage. The method and system appropriately weight the short words or n-grams common to multiple languages without affecting the short words or n-grams that are not common to several languages.

Owner:JUSTSYST EVANS RES
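
The weighting idea in the abstract above can be sketched in a few lines of Python: each n-gram found in the passage contributes a weight equal to the inverse of the number of languages whose profile contains it. This is only an illustration under stated assumptions. The character-trigram choice, the set-based profiles, the toy training phrases, and the names `char_ngrams` and `identify_language` are all hypothetical and are not the patented implementation.

```python
from typing import Dict, List, Set


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Character n-grams of a string (trigrams by default)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def identify_language(text: str, profiles: Dict[str, Set[str]], n: int = 3) -> str:
    """Pick the language whose profile collects the most n-gram weight."""
    scores = {lang: 0.0 for lang in profiles}
    for gram in char_ngrams(text.lower(), n):
        owners = [lang for lang, grams in profiles.items() if gram in grams]
        if not owners:
            continue  # n-gram unseen in every profile: no evidence either way
        # An n-gram shared by k languages counts 1/k toward each of them,
        # so widely shared n-grams matter less but are never ignored.
        weight = 1.0 / len(owners)
        for lang in owners:
            scores[lang] += weight
    return max(scores, key=scores.get)


# Toy profiles built from one phrase per language (hypothetical data).
profiles = {
    "en": set(char_ngrams("the cat and the hat")),
    "de": set(char_ngrams("der hund und die katze")),
}
guess = identify_language("und die", profiles)
```

In this toy data the trigram "nd " occurs in both profiles, so it adds only 0.5 to each score, while the trigrams unique to one profile add a full 1.0.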

Word boundary probability estimating, probabilistic language model building, kana-kanji converting, and unknown word model building

Status: Inactive | Patent: US20080228463A1 | Tags: Accuracy of recognition; Improve abilities; Natural language translation; Speech recognition; Corpus restiforme; Word model

A word n-gram probability is calculated with high accuracy in a situation where a first corpus, which is a relatively small corpus containing manually segmented word information, and a second corpus, which is a relatively large corpus, are given as a training corpus, that is, storage containing vast quantities of sample sentences. Vocabulary including contextual information is expanded from words occurring in the relatively small first corpus to words occurring in the relatively large second corpus by using a word n-gram probability estimated from an unknown word model and the raw corpus. The first corpus (word-segmented) is used for calculating n-grams and the probability that the boundary between two adjacent characters will be the boundary of two words (segmentation probability). The second corpus (word-unsegmented), in which probabilistic word boundaries are assigned based on information in the first corpus (word-segmented), is used for calculating word n-grams.

Owner:INT BUSINESS MASCH CORP

Method of identifying the language of a textual passage using short word and/or n-gram comparisons

Status: Inactive | Patent: US20050154578A1 | Tags: Raise the possibility; Reduce weight; Natural language analysis; Special data processing applications; Computer science; Paragraph

A method and system for identifying the language of a textual passage are disclosed. The method and system include parsing the textual passage into n-grams, assigning an initial weight to each n-gram, and adjusting the weight initially assigned to a word or n-gram parsed from the textual passage. The initially assigned weight is adjusted in proportion to the inverse of the number of languages within which such words or n-grams appear. Reducing the weight assigned to such words or n-grams diminishes, without completely eliminating, their importance in comparison to other words or n-grams parsed from the same textual passage when determining the language of a passage. The method and system appropriately weight the short words or n-grams common to multiple languages without affecting the short words or n-grams that are not common to several languages.

Owner:JUSTSYST EVANS RES

Method for dynamic context scope selection in hybrid N-gram+LSA language modeling

Status: Inactive | Patent: US7191118B2 | Tags: Semantic analysis; Speech recognition; Documentation; Human language

A method and system for dynamic language modeling of a document are described. In one embodiment, a number of local probabilities of a current document are computed and a vector representation of the current document in a latent semantic analysis (LSA) space is determined. In addition, a number of global probabilities based upon the vector representation of the current document in an LSA space is computed. Further, the local probabilities and the global probabilities are combined to produce the language modeling.

Owner:APPLE INC

Systems and methods for an autonomous avatar driver

Status: Inactive | Patent: US20080221892A1 | Tags: High reliability; Natural language data processing; Animation; Semantic lexicon; Common word

The autonomous avatar driver is useful in association with language sources. A sourcer may receive dialog from the language source. It may also, in some embodiments, receive external data from data sources. A segmentor may convert characters, represent particles and split dialog. A parser may then apply a link grammar, analyze grammatical mood, tag the dialog and prune dialog variants. A semantic engine may look up token frames, generate semantic lexicons and semantic networks, and resolve ambiguous co-references. An analytics engine may filter common words from dialog, analyze N-grams, count lemmatized words, and analyze nodes. A pragmatics analyzer may resolve slang, generate knowledge templates, group proper nouns and estimate the affect of dialog. A recommender may generate tag clouds, cluster the language sources into neighborhoods, recommend social networking to individuals and businesses, and generate contextual advertising. Lastly, a response generator may generate responses for the autonomous avatar using the analyzed dialog. The response generator may also incorporate the generated recommendations.

Owner:BOTANIC TECH INC

Sentiment Classification Based on Supervised Latent N-Gram Analysis

Status: Inactive | Patent: US20120253792A1 | Tags: Digital data information retrieval; Special data processing applications; Semantic space; Emotion classification

A method for sentiment classification of a text document using high-order n-grams utilizes a multilevel embedding strategy to project n-grams into a low-dimensional latent semantic space where the projection parameters are trained in a supervised fashion together with the sentiment classification task. Using, for example, a deep convolutional neural network, the semantic embedding of n-grams, the bag-of-occurrence representation of text from n-grams, and the classification function from each review to the sentiment class are learned jointly in one unified discriminative framework.

Owner:NEC LAB AMERICA

Presenting search results according to query domains

Status: Active | Patent: US20100312782A1 | Tags: Convenient review; Organize effectively; Digital data processing details; Speech recognition; Data set; Display device

A query may be applied against search engines that respectively return a set of search results relating to various items discovered in the searched data sets. However, presenting numerous and varied search results may be difficult on mobile devices with small displays and limited computational resources. Instead, search results may be associated with search domains representing various information types (e.g., contacts, public figures, places, projects, movies, music, and books) and presented by grouping search results with associated query domains, e.g., in a tabbed user interface. The query may be received through an input device associated with a particular input domain, and may be transitioned to the query domain of a particular search engine (e.g., by recognizing phonemes of a voice query using an acoustic model; matching phonemes with query terms according to a pronunciation model; and generating a recognition result according to a vocabulary of an n-gram language model.)

Owner:MICROSOFT TECH LICENSING LLC

Compression method, method for compressing entry word index data for a dictionary, and machine translation system

Status: Inactive | Patent: US6502064B1 | Tags: Natural language translation; Code conversion; Algorithm; Statistical analysis

An n-gram statistical analysis is employed to acquire frequently appearing character strings of n characters or more, and individual character strings having n characters or more are replaced by character translation codes of 1 byte each. The correlation between the original character strings having n characters and the character translation codes is registered in a character translation code table. Assume that a character string of three characters, i.e., a character string of three bytes, "sta," is registered as 1-byte code "e5" and that a character string of four characters, i.e., a character string of four bytes, "tion," is registered as 1-byte code "f1." Then, the word "station," which consists of a character string of seven characters, i.e., seven bytes, is represented by the 2-byte code "e5 f1," so that this contributes to a compression of five bytes.

Owner:IBM CORP
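
The arithmetic in the abstract above ("station" shrinking from seven bytes to two) can be checked with a short Python sketch. Only the two table entries come from the abstract; the greedy longest-match strategy, the pass-through of unmatched single-byte characters, and the name `compress` are assumptions made for illustration, and no attempt is made to keep code bytes distinct from literal bytes.

```python
# Character translation code table with the two entries from the abstract.
CODE_TABLE = {"sta": 0xE5, "tion": 0xF1}


def compress(word: str, table: dict) -> bytes:
    """Replace registered substrings with their 1-byte codes (greedy, longest first)."""
    out = bytearray()
    keys = sorted(table, key=len, reverse=True)  # try longer strings first
    i = 0
    while i < len(word):
        for key in keys:
            if word.startswith(key, i):
                out.append(table[key])
                i += len(key)
                break
        else:
            # No registered string starts here: copy the character as is.
            # Assumes single-byte (ASCII) characters.
            out.append(ord(word[i]))
            i += 1
    return bytes(out)


encoded = compress("station", CODE_TABLE)
saved = len("station".encode("ascii")) - len(encoded)
```

Here `encoded` holds the two bytes e5 f1 and `saved` is 5, matching the figures given in the abstract.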

Unknown malcode detection using classifiers with optimal training sets

Status: Inactive | Patent: US20090300765A1 | Tags: Improve detection accuracy; Reduce in quantity; Memory loss protection; Error detection/correction; Data set; Algorithm

The present invention is directed to a method for detecting unknown malicious code, such as a virus, a worm, a Trojan horse or any combination thereof. A data set is created as a collection of files that includes a first subset of malicious files and a second subset of benign files, with malicious and benign files identified by an antivirus program. All files are parsed using n-gram moving windows of several lengths, and the term frequency (TF) representation is computed for each n-gram in each file. An initial set of top features (e.g., up to 5500) of all n-grams is selected based on the document frequency (DF) measure, and the number of top features is then reduced by feature selection methods to fit the computational resources required for classifier training. The optimal number of features is determined by evaluating the detection accuracy of several sets of reduced top features. Based on that optimal number, different data sets with different distributions of benign and malicious files are prepared for use as training and test sets. For each classifier, the detection accuracy is iteratively evaluated for all combinations of training and test set distributions; in each iteration, a classifier is trained using a specific distribution and the trained classifier is tested on all distributions. The distribution that yields the highest detection accuracy is selected for that classifier.

Owner:DEUTSCHE TELEKOM AG

System and method for detecting malicious executable code

Status: Active | Patent: US20060037080A1 | Tags: Boosted SVMs; Facilitate decision-making; Memory loss protection; Unauthorized memory use protection; Support vector machine; Inductive method

A system and method for detecting malicious executable software code. Benign and malicious executables are gathered, and each is encoded as a training example using n-grams of byte codes as features. After selecting the most relevant n-grams for prediction, a plurality of inductive methods, including naive Bayes, decision trees, support vector machines, and boosting, are evaluated.

Owner:GEORGETOWN UNIV
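
Encoding an executable as byte n-gram features, as the abstract above describes, might look like the following Python sketch. The window length of 4, the presence/absence (0 or 1) encoding, the tiny hand-picked vocabulary, and both function names are assumptions for illustration; the abstract does not state these details, and the selection of the most relevant n-grams and the classifiers themselves are left out.

```python
from collections import Counter
from typing import List


def byte_ngram_counts(data: bytes, n: int = 4) -> Counter:
    """Count every contiguous run of n bytes in the input."""
    return Counter(data[i:i + n] for i in range(len(data) - n + 1))


def to_feature_vector(data: bytes, vocabulary: List[bytes], n: int = 4) -> List[int]:
    """One 0/1 feature per vocabulary n-gram: does it occur in the file?"""
    present = byte_ngram_counts(data, n)
    return [1 if gram in present else 0 for gram in vocabulary]


# Hypothetical file contents and a two-entry vocabulary of byte bigrams.
sample = b"\x00\x01\x00\x01"
vocab = [b"\x00\x01", b"\xff\xff"]
features = to_feature_vector(sample, vocab, n=2)
```

Each vector produced this way would be one training example for whichever inductive method is being evaluated.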

Method of identifying script of line of text

Status: Inactive | Patent: US7020338B1 | Tags: Character and pattern recognition; Natural language data processing; Pattern recognition; Document preparation

A method of identifying the script of a line of text by first assigning a weight to each n-gram in a group of documents of known scripts, where each n-gram is a sequence of numbers representing k-means cluster centroids of a known script to which character segments in the documents of known scripts most closely match. A line of text is identified, where the line of text is made up of pixels. The identified line of text is cropped so that only a percentage of the pixels remain. The cropped line is vertically and horizontally rescaled into gray-scale pixels. The vertical gray-scale pixels are replaced with the sequence number of the k-means cluster centroid of a known script to which they most closely match. The n-grams of the number sequence that represents the line of text are scored against the n-gram weights of the documents of known scripts. The highest score of the line of text is identified and compared to the scores of the documents of known scripts. The script of the line of text is determined to be the script of the document against which the line of text scores the highest.

Owner:NATIONAL SECURITY AGENCY

Apparatus method and medium for detecting payload anomaly using n-gram distribution of normal data

Status: Active | Patent: US20050281291A1 | Tags: Time-division multiplex; Data switching by path configuration; Data mining; Web transport

A method, apparatus and medium are provided for detecting anomalous payloads transmitted through a network. The system receives payloads within the network and determines a length for data contained in each payload. A statistical distribution is generated for data contained in each payload received within the network, and compared to a selected model distribution representative of normal payloads transmitted through the network. The model payload can be selected such that it has a predetermined length range that encompasses the length for data contained in the received payload. Anomalous payloads are then identified based on differences detected between the statistical distribution of received payloads and the model distribution. The system can also provide for automatic training and incremental updating of models.

Owner:THE TRUSTEES OF COLUMBIA UNIV IN THE CITY OF NEW YORK
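
To make the comparison step in the payload-anomaly abstract above concrete, here is a minimal Python sketch that builds a byte-frequency (1-gram) distribution for a payload and measures how far it lies from a model distribution. The abstract does not name a distance measure, so the L1 (sum of absolute differences) distance used here is an assumption, as are the function names, the single-payload "model", and the sample payloads. Model selection by payload length and incremental updating are omitted.

```python
from collections import Counter
from typing import List


def byte_distribution(payload: bytes) -> List[float]:
    """Relative frequency of each of the 256 byte values in a payload."""
    counts = Counter(payload)
    total = len(payload) or 1  # avoid dividing by zero on an empty payload
    return [counts.get(b, 0) / total for b in range(256)]


def anomaly_score(payload: bytes, model: List[float]) -> float:
    """L1 distance between the payload's distribution and the model's."""
    observed = byte_distribution(payload)
    return sum(abs(p - m) for p, m in zip(observed, model))


# Hypothetical "normal" model learned from a single benign-looking payload.
model = byte_distribution(b"GET /index.html")
normal_score = anomaly_score(b"GET /index.html", model)
odd_score = anomaly_score(b"\x90" * 15, model)
```

A payload whose score exceeds some chosen threshold would be flagged; the threshold itself would have to be tuned on real traffic.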

Methods and systems for implementing approximate string matching within a database

Status: Active | Patent: US20090171955A1 | Tags: Digital data information retrieval; Digital data processing details; Theoretical computer science; Database

A computer-based method for character string matching of a candidate character string with a plurality of character string records stored in a database is described. The method includes a) identifying a set of reference character strings in the database, the reference character strings identified utilizing an optimization search for a set of dissimilar character strings, b) generating an n-gram representation for one of the reference character strings in the set of reference character strings, c) generating an n-gram representation for the candidate character string, d) determining a similarity between the n-gram representations, e) repeating steps b) and d) for the remaining reference character strings in the set of identified reference character strings, and f) indexing the candidate character string within the database based on the determined similarities between the n-gram representation of the candidate character string and the reference character strings in the identified set.

Owner:MASTERCARD INT INC
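
Steps b) through d) of the abstract above, generating n-gram representations and determining their similarity, can be illustrated with a small Python sketch. The abstract does not say which n or which similarity measure is used, so the character-bigram sets and the Dice coefficient below are assumptions, and the names `bigram_set` and `dice_similarity` are hypothetical. The selection of reference strings and the indexing step are not shown.

```python
def bigram_set(s: str) -> set:
    """Set of character bigrams of a string, ignoring case."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_similarity(a: str, b: str) -> float:
    """Dice coefficient of the two bigram sets, between 0.0 and 1.0."""
    ga, gb = bigram_set(a), bigram_set(b)
    if not ga or not gb:
        # Strings shorter than two characters have no bigrams;
        # fall back to an exact, case-insensitive comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


# "night" and "nacht" share only the bigram "ht" out of four bigrams each.
score = dice_similarity("night", "nacht")
```

A candidate string could then be described by its vector of similarities to each reference string, which is one plausible reading of how the similarities support indexing.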

Systems and methods for interactive topic-based text summarization

Status: Inactive | Patent: US7451395B2 | Tags: Natural language data processing; Special data processing applications; Noun phrase; Interactive displays

Techniques for determining interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human-sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human-sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text provides contextualized access to an interactive topic-based text summary and to an original text.

Owner:PALO ALTO RES CENT INC

Systems and methods for interactive topic-based text summarization

Status: Inactive | Patent: US20040122657A1 | Tags: Natural language data processing; Special data processing applications; Noun phrase; Interactive displays

Techniques for determining interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human-sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human-sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text provides contextualized access to an interactive topic-based text summary and to an original text.

Owner:PALO ALTO RES CENT INC

Efficient language identification

Status: Active | Patent: US20060184357A1 | Tags: Improve performance; Reduce in quantity; Anti-theft devices; Character and pattern recognition; Present method; More language

A system and methods of language identification of natural language text are presented. The system includes stored expected character counts and variances for a list of characters found in a natural language. Expected character counts and variances are stored for multiple languages to be considered during language identification. At run-time, one or more languages are identified for a text sample based on comparing actual and expected character counts. The present methods can be combined with upstream analyzing of Unicode ranges for characters in the text sample to limit the number of languages considered. Further, n-gram methods can be used in downstream processing to select the most probable language from among the languages identified by the present system and methods.

Owner:MICROSOFT TECH LICENSING LLC

Systems and methods for alphanumeric navigation and input

Status: Inactive | Patent: US20100293497A1 | Tags: Add feature; Easy access; Television system details; Cathode-ray tube indicators; Text entry; Visual perception

Systems and methods for simplifying text entry are provided. A visual keypad may include a plurality of user-selectable buttons corresponding to at least some of the letters of the alphabet. The layout of the visual keypad may be determined based on an n-gram table. The layout of the visual keypad may be rearranged based at least in part on the most likely next character in response to receiving a user selection of a button on the visual keypad.

Owner:UNITED VIDEO PROPERTIES

Systems and methods for displaying interactive topic-based text summaries

Status: Active | Patent: US7117437B2 | Tags: Biological models; Speech recognition; Noun phrase; Interactive displays

Techniques for displaying interactive topic-based summarization are provided. A text to be summarized is segmented. Discrete keyword, key-phrase, n-gram, sentence and other sentence constituent based summaries are generated based on statistical measures for each text segment. Interactive topic-based summaries are displayed with human-sensible omitted text indicators such as alternate colors, fonts, sounds, tactile elements or other human-sensible display characteristics useful in indicating omitted text. Individual and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun phrase and sentence constituent based summaries are dynamically displayed to provide an overview of topic and subtopic development within a text. A hierarchical and interactive display of texts based on the use of discrete sentence constituent based summaries which associates expansible and contractible displayed text provides contextualized access to an interactive topic-based text summary and to an original text.

Owner:XEROX CORP

Language identification from short strings

Status: Active | Patent: US10127220B2 | Tags: Natural language translation; Semantic analysis; Short string; User input

Systems and processes for language identification from short strings are provided. In accordance with one example, a method includes, at a first electronic device with one or more processors and memory, receiving user input including an n-gram and determining a similarity between a representation of the n-gram and a representation of a first language. The representation of the first language is based on an occurrence of each of a plurality of n-grams in the first language and an occurrence of each of the plurality of n-grams in a second language. The method further includes determining whether the similarity between the representation of the n-gram and the representation of the first language satisfies a threshold.

Owner:APPLE INC

Method and system for organizing information

Status: Inactive | Patent: US20090248669A1 | Tags: Digital data information retrieval; Digital data processing details; Data set; Data source

A system and method to process data having a module stored on the server computer system for receiving a query over a network from a client computer system. A search engine utilizes the query to extract a search result from a data source. A query decomposition module decomposes the query into at least one n-gram which is a subset of the query. A processing module processes the at least one n-gram to determine at least one related search suggestion. A merging module merges the at least one related search suggestion into a ranked output data set. A transmission module transmits the search result and the at least one related search suggestion from the server computer system to the client computer system.

Owner:IAC SEARCH & MEDIA

Chinese text automatic correction method

Status: Inactive | Patent: CN105279149A | Tags: Fast troubleshooting; Improve error correction efficiency; Special data processing applications; Error check; Autocorrection

The invention discloses a method for automatically correcting Chinese text. The method comprises the following steps: a) inputting a Chinese text to be corrected and performing word-segmentation preprocessing on it sentence by sentence; b) searching the segmented text, sentence by sentence, for one-character words, two-character words, or scattered strings of three or more characters; c) checking the continuity of the scattered strings in the segmented text with an N-gram model, and checking each sentence for word-level errors in combination with the word-forming probability of individual characters; and d) constructing an error correction knowledge base to generate candidate corrected text. Because the N-gram continuity check identifies the errors and the knowledge base supplies the candidate corrections, the error checking and error correcting processes are well combined, and the method offers high error checking speed and high error correcting efficiency.

Owner:SHANGHAI INST OF TECH

Automatic Evaluation of Spoken Fluency

Active | US20110040554A1 | Speech recognition | Special data processing applications | Spoken language | Formant

A procedure for automatically evaluating the spoken fluency of a speaker by prompting the speaker to talk on a given topic, recording the speaker's speech to obtain a sample, and then analyzing the patterns of disfluencies in the speech to compute a numerical score that quantifies the speaker's spoken fluency. The numerical fluency score accounts for various prosodic and lexical features, including formant-based filled-pause detection, closely occurring exact and inexact repeated N-grams, and the normalized average distance between consecutive occurrences of N-grams. The lexical and prosodic features are combined to classify the speaker with a C-class classification and to develop a rating for the speaker.
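
Two of the lexical features named above, closely occurring repeated N-grams and the normalized average distance between consecutive occurrences, might be computed along these lines. The sketch covers exact repeats only; the `window` size and the choice to normalize by transcript length are assumptions, since the abstract does not define them.

```python
from collections import defaultdict


def ngram_positions(tokens, n):
    """Map each n-gram (as a tuple) to the list of positions where it starts."""
    positions = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        positions[tuple(tokens[i:i + n])].append(i)
    return positions


def repeat_features(tokens, n=2, window=5):
    """Return (close_repeats, normalized_avg_distance) for exact n-gram repeats.

    close_repeats counts consecutive occurrences at most `window` tokens apart.
    The average gap is divided by the transcript length (an assumed normalization).
    """
    gaps = []
    for pos in ngram_positions(tokens, n).values():
        gaps.extend(b - a for a, b in zip(pos, pos[1:]))
    close = sum(1 for g in gaps if g <= window)
    avg = (sum(gaps) / len(gaps) / len(tokens)) if gaps else 0.0
    return close, avg
```

For the transcript "i went i went to the the store", the bigram "i went" repeats once with a gap of two tokens, so `repeat_features(tokens, n=2)` gives one close repeat and a normalized distance of 2 / 8.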

Owner:NUANCE COMM INC

Method and apparatus for programmatically generating audio file playlists

Active | US7678984B1 | Addressing Insufficient Coverage | Excessive diversity | Electrophonic musical instruments | Electronic editing digitised analogue information signals | Probabilistic method | Audio frequency

Method and apparatus for programmatically generating interesting audio file playlists. A playlist generation mechanism may use an N-gram model of audio file ordering patterns found in a collection of human-generated playlists to automatically generate new playlists. Given play histories indicating one or more played audio files as input, statistical methods may be used to look for sequences of audio files that occur a statistically significant number of times in the N-gram model for inclusion in new, interesting playlists that incorporate the human element found in the collection of playlists. In some embodiments, one or more backoff probability methods may be used to provide additional candidate audio files for playlists if there is insufficient coverage for an audio file in the N-gram model. In one embodiment, a class-based statistical model incorporating higher-level statistics for the audio files may be used to weight selection of audio file transitions from the N-gram model.
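
As a rough illustration of the N-gram model with backoff, the sketch below counts track-to-track transitions (bigrams) in a set of playlists and falls back to overall track popularity (unigrams) when the last played track has no recorded followers. It is a deliberately simple stand-in: the significance testing and class-based weighting mentioned in the abstract are not modeled, and all names are hypothetical.

```python
from collections import Counter, defaultdict


def train(playlists):
    """Count single tracks and track-to-track transitions in human playlists."""
    unigrams = Counter()
    bigrams = defaultdict(Counter)
    for playlist in playlists:
        unigrams.update(playlist)
        for prev, nxt in zip(playlist, playlist[1:]):
            bigrams[prev][nxt] += 1
    return unigrams, bigrams


def next_candidates(last_played, unigrams, bigrams, k=3):
    """Rank follow-up tracks, backing off to popularity when coverage is missing."""
    followers = bigrams.get(last_played)
    source = followers if followers else unigrams
    # Most frequent first; ties broken by track name for a stable order.
    ranked = sorted(source.items(), key=lambda kv: (-kv[1], kv[0]))
    return [track for track, _ in ranked if track != last_played][:k]
```

Extending this to longer histories would mean keying the transition table on tuples of the last N-1 tracks and backing off one track at a time.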

Owner:ORACLE INT CORP

Domain-specific sentiment classification

Active | US7987188B2 | Digital data information retrieval | Digital data processing details | Sentiment score | A domain

A domain-specific sentiment classifier that can be used to score the polarity and magnitude of sentiment expressed by domain-specific documents is created. A domain-independent sentiment lexicon is established and a classifier uses the lexicon to score sentiment of domain-specific documents. Sets of high-sentiment documents having positive and negative polarities are identified. The n-grams within the high-sentiment documents are filtered to remove extremely common n-grams. The filtered n-grams are saved as a domain-specific sentiment lexicon and are used as features in a model. The model is trained using a set of training documents which may be manually or automatically labeled as to their overall sentiment to produce sentiment scores for the n-grams in the domain-specific sentiment lexicon. This lexicon is used by the domain-specific sentiment classifier.
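
The filtering step, in which extremely common n-grams are removed before the remainder is saved as a lexicon, could look something like the following. The document-frequency criterion, the `max_doc_freq` threshold, and the use of a separate background collection are assumptions for the sake of a runnable example; the abstract does not say how "extremely common" is determined.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous word n-grams of a token list as strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def candidate_lexicon(high_sentiment_docs, background_docs, n=2, max_doc_freq=0.5):
    """Keep n-grams from high-sentiment documents unless they occur in more than
    max_doc_freq of the background documents (an assumed notion of 'common')."""
    doc_freq = Counter()
    for doc in background_docs:
        doc_freq.update(set(ngrams(doc.lower().split(), n)))
    limit = max_doc_freq * len(background_docs)
    kept = set()
    for doc in high_sentiment_docs:
        for gram in ngrams(doc.lower().split(), n):
            if doc_freq[gram] <= limit:
                kept.add(gram)
    return kept
```

The surviving n-grams would then serve as features for the trained model that assigns each one a sentiment score.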

Owner:GOOGLE LLC

Character string updated degree evaluation program

Inactive | US20090226098A1 | Digital data information retrieval | Character and pattern recognition | Theoretical computer science | Edit distance

A character string updated degree evaluation program is provided that makes it possible to quantify the amount of intellectual work involved in editing and updating character strings. The text under comparison is divided into common part character strings, each with a length greater than or equal to a threshold value, and non-common part character strings. The number of edited points relative to the original text and a context edit distance are calculated from the rate of the common part character strings and their occurrence pattern. The number of edited points is acquired from the number of elements contained in the common part character string set, and the context edit distance is acquired from the change in the order of occurrence of the common part character strings. A new creation percentage is calculated, and N-gram analysis is performed, on the non-common part character strings. The new creation percentage is acquired from the total length of the elements contained in the non-common part character string set, and a new creation novelty degree is acquired from a non-partial matching rate between the non-common part character string set and an element contained in that set. The calculations for the common part character string set and for the non-common part character string set are combined to calculate a text updated degree.
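
The split into common and non-common parts, and a new creation percentage derived from it, can be approximated with Python's standard `difflib`. This is only a sketch under stated assumptions: `SequenceMatcher` stands in for whatever matching procedure the patent uses, the percentage is taken as non-common length over revised-text length, and the context edit distance and N-gram novelty analysis are not covered.

```python
from difflib import SequenceMatcher


def split_common(original, revised, min_len=3):
    """Split `revised` into common parts (matches of at least min_len characters
    with `original`) and the non-common parts lying between them."""
    matcher = SequenceMatcher(None, original, revised, autojunk=False)
    common, non_common, cursor = [], [], 0
    for block in matcher.get_matching_blocks():
        if block.size < min_len:
            continue  # skips short matches and the zero-size terminator block
        if block.b > cursor:
            non_common.append(revised[cursor:block.b])
        common.append(revised[block.b:block.b + block.size])
        cursor = block.b + block.size
    if cursor < len(revised):
        non_common.append(revised[cursor:])
    return common, non_common


def new_creation_percentage(original, revised, min_len=3):
    """Share of the revised text, in percent, that falls in non-common parts."""
    if not revised:
        return 0.0
    _, non_common = split_common(original, revised, min_len)
    return 100.0 * sum(len(s) for s in non_common) / len(revised)
```

Raising `min_len` treats short coincidental matches as newly written text, which is the role the threshold plays in the abstract.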

Owner:NAT UNIV CORP NAGAOKA UNIV TECH

System and method for context-based spontaneous speech recognition

Inactive | US20030023437A1 | Speech recognition | Collocation | Speech identification

A system and method for processing human language input uses collocation information for the language that is not limited to N-gram information for N no greater than a predetermined value. The input is preferably speech input. The system and method preferably recognize at least a portion of the input based on the collocation information.

Owner:NUSUARA TECH
