69 results about "Content word" patented technology

In linguistics, content words are words that name objects of reality and their qualities. They denote living things (dog, cat, etc.), family members (mother, father, sister, etc.), natural phenomena (snow, Sun, etc.), common actions (do, make, come, eat, etc.), characteristics (young, cold, dark, etc.), and so on. They are mostly nouns, lexical verbs, and adjectives, although certain adverbs can also be content words. They contrast with function words, which carry very little substantive meaning and primarily express grammatical relationships between content words; examples include prepositions (in, out, under, etc.), pronouns (I, you, he, who, etc.), and conjunctions (and, but, till, as, etc.).
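As a rough illustration of the content/function split, the sketch below filters part-of-speech-tagged tokens. It assumes Penn Treebank tags and treats every noun, verb, adjective, and adverb tag as a content word; this is a simplification, since the definition above counts only certain adverbs (and lexical, not auxiliary, verbs) as content words.

```python
# Assumption: tokens carry Penn Treebank tags. Nouns (NN*), verbs (VB*),
# adjectives (JJ*) and adverbs (RB*) are treated as content words;
# everything else (DT, IN, PRP, CC, ...) is treated as a function word.
CONTENT_TAG_PREFIXES = ("NN", "VB", "JJ", "RB")


def is_content_word(tag: str) -> bool:
    """Return True if the POS tag belongs to a content-word class."""
    return tag.startswith(CONTENT_TAG_PREFIXES)


def content_words(tagged: list[tuple[str, str]]) -> list[str]:
    """Keep only the words whose tag marks them as content words."""
    return [word for word, tag in tagged if is_content_word(tag)]


sentence = [("The", "DT"), ("young", "JJ"), ("dog", "NN"), ("ran", "VBD"),
            ("under", "IN"), ("the", "DT"), ("cold", "JJ"), ("table", "NN")]
print(content_words(sentence))  # ['young', 'dog', 'ran', 'cold', 'table']
```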

Document summarizer for word processors

An author-oriented document summarizer for a word processor is described. The summarizer performs a statistical analysis to generate a ranked list of candidate sentences for the summary. It counts how frequently each content word appears in the document and builds a table mapping content words to their frequency counts. Phrase compression techniques are used to count repeated phrases more accurately. Each sentence's score is the sum of the frequency counts of its content words divided by the number of content words in the sentence, and the sentences are ranked by this score. In the same pass through the document, the summarizer also performs a cue-phrase analysis to weed out sentences containing words or phrases pre-identified as potential problems: sentence phrases are compared against a pre-compiled list of words and phrases, and conditions are set on whether sentences containing them may be used in the summary. After the cue-phrase analysis, the summarizer creates a summary from the higher-ranked sentences; a conditioned sentence may also be included if its conditions for inclusion are satisfied. Finally, the summarizer inserts the summary at the beginning of the document, before the start of the text.
Owner:MICROSOFT TECH LICENSING LLC
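The scoring rule in this abstract (sum of content-word frequencies divided by the number of content words) can be sketched as follows. This is a minimal illustration, not the patented implementation: the stop list is a hypothetical stand-in for a real function-word list, and phrase compression, cue-phrase conditions, and insertion into the document are omitted.

```python
import re
from collections import Counter

# Hypothetical stop list standing in for a real function-word list.
FUNCTION_WORDS = {"a", "an", "the", "of", "in", "on", "to", "and", "but",
                  "is", "are", "was", "it", "he", "she", "they", "for", "with"}


def content_tokens(sentence: str) -> list[str]:
    """Lower-case word tokens with function words removed."""
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in FUNCTION_WORDS]


def rank_sentences(sentences: list[str]) -> list[tuple[float, int, str]]:
    # Document-wide table mapping each content word to its frequency count.
    freq = Counter(w for s in sentences for w in content_tokens(s))
    scored = []
    for idx, s in enumerate(sentences):
        words = content_tokens(s)
        # Score = sum of the words' frequency counts / number of content words.
        score = sum(freq[w] for w in words) / len(words) if words else 0.0
        scored.append((score, idx, s))
    # Highest score first; ties keep document order.
    return sorted(scored, key=lambda t: (-t[0], t[1]))


def summarize(sentences: list[str], n: int = 2) -> list[str]:
    top = rank_sentences(sentences)[:n]
    # Emit the selected sentences in their original document order.
    return [s for _, _, s in sorted(top, key=lambda t: t[1])]


doc = ["The summarizer counts content words.",
       "Content words carry meaning.",
       "The weather was cold."]
print(summarize(doc, n=2))
# ['The summarizer counts content words.', 'Content words carry meaning.']
```

Dividing by the number of content words keeps long sentences from winning simply because they contain more words.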

Automated system and method for generating reasons that a court case is cited

A computer-automated system and method identify text in a first “citing” court case, near a “citing instance” (in which a second “cited” court case is cited), that indicates the reason(s) for citing (RFC). The automated method of designating text, taken from a set of citing documents, as reasons for citing (RFC) that are associated with respective citing instances of a cited document, has steps including: obtaining contexts of the citing instances in the respective citing documents (each context including text that includes the citing instance and text that is near the citing instance), analyzing the content of the contexts, and selecting (from the citing instances' context) text that constitutes the RFC, based on the analyzed content of the contexts. A related computer-automated system and method selects content words that are highly related to the reasons a particular document is cited, and gives them weights that indicate their relative relevance. Another related computer-automated system and method forms lists of morphological forms of words. Still another related computer-automated system and method scores sentences to show their relevance to the reasons a document is cited. Also, another related computer-automated system and method generates lists of content words. In a preferred embodiment, the systems and methods are applied to legal (especially case law) documents and legal (especially case law) citations.
Owner:RELX INC
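A minimal sketch of the "context plus weighted content words" idea: take the sentences around a citing instance and pick the one whose words carry the most relevance weight as the candidate reason for citing. The sentence-window size, the weights, and the function names are all hypothetical; the abstract does not specify how contexts are bounded or how weights are computed.

```python
import re


def citing_context(text: str, citation: str, window: int = 1) -> list[str]:
    """Sentence containing `citation` plus `window` sentences on each side.

    Naive splitter: abbreviations such as "v." would break it, so the
    example uses a citation string without periods.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for i, s in enumerate(sentences):
        if citation in s:
            return sentences[max(0, i - window): i + window + 1]
    return []


def score_sentence(sentence: str, weights: dict[str, float]) -> float:
    """Sum of the (hypothetical) relevance weights of the sentence's words."""
    return sum(weights.get(w, 0.0) for w in re.findall(r"[a-z]+", sentence.lower()))


def select_rfc(text: str, citation: str, weights: dict[str, float], window: int = 1):
    """Return the highest-scoring context sentence, or None if the citation is absent."""
    context = citing_context(text, citation, window)
    if not context:
        return None
    return max(context, key=lambda s: score_sentence(s, weights))


opinion = ("The facts are undisputed. Under Doe 2001, a warrantless search "
           "requires probable cause. We therefore reverse. Costs are awarded.")
weights = {"warrantless": 2.0, "search": 1.5, "probable": 1.0, "cause": 1.0}
print(select_rfc(opinion, "Doe 2001", weights))
# Under Doe 2001, a warrantless search requires probable cause.
```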

Automatic generation of statistical language models for interactive voice response applications

A Statistical Language Model (SLM) that can be used in an automatic speech recognition (ASR) system for Interactive Voice Response (IVR) systems in general, and Natural Language Speech Applications (NLSAs) in particular, can be created by first manually writing a brief text description of each task that can be performed in an NLSA. In one embodiment, these brief descriptions are then analyzed to generate pre-filler patterns based on spontaneous speech utterances, together with a skeletal set of content words. The pre-filler patterns are in turn applied to Part-of-Speech (POS) tagged conversations from a spontaneous speech corpus to generate a set of pre-filler phrases. The skeletal set of content words is expanded into a more extensive list using an electronic lexico-semantic database and a thesaurus-based content word extraction process. The resulting pre-filler phrases and content words are combined into utterances by a process based on a lexico-semantic resource. In one embodiment, a lexico-semantic statistical validation process is used to correct the automatically generated utterances and/or add them to the database of expected utterances, and the WWW is used to validate the word models. The system requires minimal human intervention and no prior knowledge of the expected user utterances in response to a particular prompt.
Owner:LYMBA CORP
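The expand-then-combine pipeline can be sketched as below, assuming a plain dictionary as a stand-in for the thesaurus and lexico-semantic database, and a caller-supplied `is_valid` predicate as a hypothetical placeholder for the statistical validation step.

```python
from itertools import product


def expand_content_words(skeletal: list[str], thesaurus: dict[str, list[str]]) -> list[str]:
    """Skeletal content words plus their thesaurus alternatives, duplicates removed."""
    expanded: list[str] = []
    for word in skeletal:
        for candidate in [word, *thesaurus.get(word, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def generate_utterances(pre_fillers, content_phrases, is_valid=lambda u: True):
    """Combine every pre-filler phrase with every content phrase, keeping valid ones."""
    utterances = []
    for pre, content in product(pre_fillers, content_phrases):
        utterance = f"{pre} {content}".strip()
        if is_valid(utterance):
            utterances.append(utterance)
    return utterances


pre_fillers = ["I want to", "I'd like to"]
content = expand_content_words(["check balance"], {"check balance": ["see balance"]})
print(generate_utterances(pre_fillers, content))
# ['I want to check balance', 'I want to see balance',
#  "I'd like to check balance", "I'd like to see balance"]
```

The Cartesian product grows quickly, which is why a validation filter matters before the utterances are added to the expected-utterance database.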

Illegal online commodity detection method

The invention relates to a method for detecting illegal online commodities, comprising the following steps: (1) collecting, with a web crawler, the pages on which the commodities to be checked are listed; (2) analyzing the DOM (Document Object Model) tree of the e-commerce website hosting those commodities to find, as the critical node, the shallowest node that contains multiple structurally similar information blocks, forming the associated information points to be extracted, building a template, and extracting commodity attribute data from the crawled pages; (3) building a semantic dictionary and segmenting the extracted commodity attribute text into words by a character-matching method; (4) manually building an illegal semantic library, using a function Illegal List to recognize and match words in the illegal semantic library against the content-word fields of the segmented commodity attributes, and determining the illegal category of the commodity from the function's return value. The method is computationally simple, timely, and suitable for frequently changing page layouts.
Owner:ZHEJIANG PANSHI INFORMATION TECH
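Steps (3) and (4) can be sketched with forward maximum matching, one common character-matching segmentation method (the abstract does not say which variant is used), followed by a lookup in a hypothetical illegal-term lexicon that maps terms to categories. The dictionary, lexicon, and category labels are illustrative only.

```python
def fmm_segment(text: str, dictionary: set[str], max_len: int) -> list[str]:
    """Forward maximum matching: take the longest dictionary word at each position."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # Single characters are always accepted so segmentation never stalls.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def illegal_classes(tokens: list[str], illegal_lexicon: dict[str, str]) -> list[str]:
    """Categories of all tokens found in the (hypothetical) illegal-term lexicon."""
    return sorted({illegal_lexicon[t] for t in tokens if t in illegal_lexicon})


# 仿冒 = counterfeit, 名牌 = brand-name, 手表 = watch
dictionary = {"仿冒", "名牌", "手表"}
tokens = fmm_segment("仿冒名牌手表", dictionary, max_len=2)
print(tokens)                                                  # ['仿冒', '名牌', '手表']
print(illegal_classes(tokens, {"仿冒": "counterfeit goods"}))  # ['counterfeit goods']
```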

Theme word extraction method, and method and device for obtaining related digital resource by using same

The invention provides a theme word extraction method, and a method and device for obtaining related digital resources using it. The theme word extraction method comprises: first performing word segmentation on the text of a digital resource and obtaining content words from the segmentation result; for each theme, obtaining a probability distribution over the content words, that is, the content words and their corresponding weights; obtaining each meaning of the content words, merging content words that share the same meaning and combining their weights; and determining the theme words from the merged content words and their weights. Because the scheme works at the level of word meanings and merges words with the same meaning, it reduces the interference that polysemous words and synonyms cause in prior-art theme word extraction and improves extraction accuracy. The method also removes the prior art's dependence on feature-word selection and named-entity recognition, and enables user-oriented, customized organization and generation of special-topic collections.
Owner:NEW FOUNDER HLDG DEV LLC +2
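The "merge words with the same meaning, then rank" step can be sketched as follows. The sense identifiers (WordNet-style strings here) and the weights are hypothetical, and the sketch assigns one sense per word, so it handles synonyms but not the full polysemy case described above.

```python
from collections import defaultdict


def merge_by_sense(weights: dict[str, float], sense_of: dict[str, str]) -> dict[str, float]:
    """Sum the weights of content words that map to the same sense id."""
    merged: dict[str, float] = defaultdict(float)
    for word, weight in weights.items():
        merged[sense_of.get(word, word)] += weight  # unmapped words keep their own key
    return dict(merged)


def theme_words(weights: dict[str, float], sense_of: dict[str, str], k: int = 2) -> list[str]:
    """Top-k senses by merged weight (ties broken alphabetically)."""
    merged = merge_by_sense(weights, sense_of)
    return sorted(merged, key=lambda s: (-merged[s], s))[:k]


weights = {"car": 0.3, "automobile": 0.25, "road": 0.35, "tree": 0.1}
sense_of = {"car": "car.n.01", "automobile": "car.n.01"}
print(theme_words(weights, sense_of))  # ['car.n.01', 'road']
```

Without merging, "road" (0.35) would outrank both "car" and "automobile"; pooling the synonyms lets the shared sense surface as the top theme.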

Method and system for converting PowerPoint file into word file

The invention relates to computer file processing, and in particular to a method and system for converting a PowerPoint file into a Word file. The method comprises the following steps: using the file name of the PowerPoint file as the primary title of the Word file; for the first shape of each page, omitting its text if it is the same as the text of the first shape on the previous page, and otherwise using the text in that first shape as a secondary heading of the Word file; and, for each page of the PPT (PowerPoint) file, reading the shape of each area of the current page, determining the attribute type of the area to distinguish whether it contains text, a table, a picture, an embedded object, and so on, and converting the area according to its content type. The invention also provides a system for converting a PowerPoint file into a Word file, comprising a content reading module, a content recognition module, a content classification and processing module, and a module that writes the classified and processed content into the Word file.
Owner:TIANJIN CHENGJIAN UNIV +2
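The heading logic can be sketched on plain data, assuming each page has already been read into a list of `(area_type, content)` pairs whose first pair is the title shape. The area labels and function name are hypothetical, and no real PowerPoint or Word library is used; a real converter would replace the outline list with calls to a document-writing API.

```python
import os

# Hypothetical mapping from area type to a Word paragraph label.
AREA_LABELS = {"text": "Body", "table": "Table", "picture": "Picture",
               "object": "Embedded object"}


def ppt_to_outline(path: str, slides: list[list[tuple[str, str]]]) -> list[tuple[str, str]]:
    """slides: one list of (area_type, content) pairs per page; the first pair is the title shape."""
    # The file name (without extension) becomes the primary title.
    outline = [("Heading 1", os.path.splitext(os.path.basename(path))[0])]
    previous_title = None
    for shapes in slides:
        if not shapes:
            continue
        title = shapes[0][1]
        # A title identical to the previous page's first shape is omitted.
        if title != previous_title:
            outline.append(("Heading 2", title))
        previous_title = title
        # Remaining areas are converted according to their type.
        for area_type, content in shapes[1:]:
            outline.append((AREA_LABELS.get(area_type, "Other"), content))
    return outline


slides = [[("text", "Intro"), ("text", "Hello")],
          [("text", "Intro"), ("picture", "chart.png")],
          [("text", "Results"), ("table", "t1")]]
for entry in ppt_to_outline("talks/deck.pptx", slides):
    print(entry)
```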

A word sense disambiguation method and system based on graph model

The invention discloses a graph-model-based word sense disambiguation method and system in the field of natural language processing. The technical problem it addresses is how to combine multiple Chinese and English resources so that they complement one another, fully exploit the disambiguation knowledge they contain, and improve word sense disambiguation performance. The technical solution is as follows. 1. A graph-model-based word sense disambiguation method, comprising the following steps: S1, extracting contextual knowledge: performing part-of-speech tagging on the ambiguous sentence and extracting its substantive words as contextual knowledge, where substantive words are nouns, verbs, adjectives, and adverbs; S2, computing similarity: performing English-based similarity calculation, word-vector-based similarity calculation, and HowNet-based similarity calculation; S3, constructing a disambiguation graph; S4, selecting the correct word sense. 2. A graph-model-based word sense disambiguation system, comprising a contextual knowledge extraction unit, a similarity calculation unit, a disambiguation graph construction unit, and a correct word sense selection unit.
Owner:ZAOZHUANG UNIV
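Steps S3 and S4 can be sketched as a graph whose nodes are candidate senses, with edges between senses of different context words weighted by similarity. The abstract does not say which graph measure selects the sense; this sketch assumes weighted degree, and the similarity table is a hypothetical stand-in for the combined English, word-vector, and HowNet scores of step S2.

```python
from itertools import combinations


def disambiguate(candidates: dict[str, list[str]], similarity) -> dict[str, str]:
    """candidates: word -> candidate sense ids; similarity: (sense, sense) -> float."""
    # Weighted degree of every sense node in the disambiguation graph.
    degree = {(w, s): 0.0 for w, senses in candidates.items() for s in senses}
    # Edges connect senses of *different* words only.
    for (w1, senses1), (w2, senses2) in combinations(candidates.items(), 2):
        for s1 in senses1:
            for s2 in senses2:
                weight = similarity(s1, s2)
                degree[(w1, s1)] += weight
                degree[(w2, s2)] += weight
    # Pick, for each word, the sense node with the highest weighted degree.
    return {w: max(senses, key=lambda s: degree[(w, s)]) for w, senses in candidates.items()}


# Hypothetical symmetric similarity scores between sense ids.
SIM = {("bank/finance", "deposit/money"): 0.8, ("bank/finance", "loan/money"): 0.9,
       ("bank/river", "deposit/money"): 0.1, ("bank/river", "loan/money"): 0.0,
       ("deposit/money", "loan/money"): 0.7}


def sim(a: str, b: str) -> float:
    return SIM.get((a, b), SIM.get((b, a), 0.0))


candidates = {"bank": ["bank/finance", "bank/river"],
              "deposit": ["deposit/money"], "loan": ["loan/money"]}
print(disambiguate(candidates, sim)["bank"])  # bank/finance
```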

Context similarity calculation-based word sense disambiguation method

The invention relates to a word sense disambiguation method based on context similarity calculation. The method comprises: processing the training corpus, using a part-of-speech-tagged version of ukWaC to train the model; filtering by part of speech so that only notional words (nouns, adjectives, adverbs, and verbs) are kept; training a bidirectional LSTM model on the filtered corpus; feeding example sentences of each word to be disambiguated into the bidirectional LSTM model to obtain their context vectors; feeding the context of the word to be disambiguated into the bidirectional LSTM model to obtain its context vector; and computing the cosine similarity between the context vector of the word to be disambiguated and the context vectors of the example sentences, then selecting the word's sense with a k-nearest-neighbor method based on the similarity results. The method models word senses better: each word is joined directly to its part-of-speech tag with an underscore, so the resulting word vectors clearly distinguish different parts of speech of the same word, and in experiments disambiguation accuracy improved by 0.5% over the baselines.
Owner:SHENYANG AEROSPACE UNIVERSITY
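The preprocessing and the final selection step can be sketched without the LSTM itself: notional words are kept and joined to their tags with an underscore, and a sense is chosen by k-nearest-neighbor voting over cosine similarities. The tag prefixes assume a Penn Treebank-style tag set, and the vectors below are toy values standing in for BiLSTM context vectors.

```python
import math
from collections import Counter

# Assumption: Penn Treebank-style tags; the patent's exact tag set is not given.
NOTIONAL_PREFIXES = ("NN", "JJ", "RB", "VB")


def notional_tokens(tagged: list[tuple[str, str]]) -> list[str]:
    """Keep nouns, adjectives, adverbs and verbs, joining each word to its tag with '_'."""
    return [f"{w}_{t}" for w, t in tagged if t.startswith(NOTIONAL_PREFIXES)]


def cosine(u: list[float], v: list[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_sense(context_vec: list[float], examples: list[tuple[str, list[float]]], k: int = 3) -> str:
    """examples: (sense_label, context_vector) pairs from the example sentences."""
    ranked = sorted(examples, key=lambda ex: cosine(context_vec, ex[1]), reverse=True)
    votes = Counter(sense for sense, _ in ranked[:k])
    # On ties, most_common keeps first-encountered order, i.e. the nearer neighbor wins.
    return votes.most_common(1)[0][0]


print(notional_tokens([("banks", "NNS"), ("the", "DT"), ("flooded", "VBD")]))
# ['banks_NNS', 'flooded_VBD']
examples = [("finance", [1.0, 0.1]), ("finance", [0.9, 0.2]),
            ("river", [0.1, 1.0]), ("river", [0.2, 0.9])]
print(knn_sense([1.0, 0.0], examples, k=3))  # finance
```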

Intelligent contract classification method based on keyword feature extraction and attention

The invention provides a smart contract classification method based on keyword feature extraction and attention. The code of each smart contract is processed by a long short-term memory (LSTM) network to extract features of the corresponding keywords, and these features are combined with an attention mechanism to classify the contracts. Specifically, each smart contract is trained into content word vectors with the Word2Vec word-embedding model, and the keywords are converted into serialized vectors with the Tokenizer text-vectorization tool; the serialized vectors are fed into an LSTM network, and its final hidden state vector is concatenated with each word vector of the smart contract; the concatenated vectors pass through one convolution layer and one pooling layer and are then fed into an LSTM network, whose final hidden state vector is multiplied by a vector generated through attention; the resulting sentence representation is fed into another LSTM network, and the smart contracts are finally classified with a softmax classifier. The model was evaluated on a dataset from the Ethereum website combined with decentralized applications (DApps), and the experimental results demonstrate its effectiveness, with a training accuracy of 89.1%.
Owner:SHANDONG UNIV OF SCI & TECH
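The attention step is not specified in detail in the abstract. As a generic illustration only, the sketch below shows standard dot-product attention pooling: each hidden state is scored against a query vector, the scores are normalized with softmax, and the states are averaged with those weights. The query vector and toy hidden states are assumptions, not values from the patent.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(hidden_states: list[list[float]], query: list[float]):
    """Dot-product attention: score, normalize, and take the weighted sum of states."""
    scores = [sum(h_i * q_i for h_i, q_i in zip(h, query)) for h in hidden_states]
    weights = softmax(scores)
    dim = len(hidden_states[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, hidden_states)) for d in range(dim)]
    return pooled, weights


states = [[1.0, 0.0], [0.0, 1.0]]
pooled, weights = attention_pool(states, query=[10.0, 0.0])
print([round(w, 4) for w in weights])  # [1.0, 0.0]
```

A query aligned with the first hidden state puts almost all of the weight on it; a zero query yields a plain average.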
