41 results for "Multi-document summarization" patented technology

Multi-document summarization is an automatic procedure aimed at extracting information from multiple texts written about the same topic. The resulting summary report allows individual users, such as professional information consumers, to quickly familiarize themselves with the information contained in a large cluster of documents. In this way, multi-document summarization systems complement news aggregators, taking the next step toward coping with information overload.

System and method for document collection, grouping and summarization

A system is provided for generating a summary of a plurality of documents and presenting the summary information to a user. It includes a computer-readable document collection containing a plurality of related documents stored in electronic form. Documents can be pre-processed to group them into document clusters, and the clusters can be assigned to predetermined document categories for presentation to the user. Several multiple-document summarization engines are provided, each generating summaries for a specific class of document cluster. A summarizer router determines the relationship among the documents in a cluster and selects one of the summarization engines to generate the summary of that cluster. A single-event engine generates summaries of documents that are closely related in time and concern a specific event. A dissimilarity engine generates summaries of document clusters whose documents have varying degrees of relatedness. A user interface displays categories, cluster titles, summaries, and related images.
Owner:THE TRUSTEES OF COLUMBIA UNIV IN THE CITY OF NEW YORK
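
The summarizer router can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the router compares the mean pairwise cosine similarity of a cluster's documents and their time span in days against fixed thresholds; the function name `route_cluster`, the engine labels, and both thresholds are illustrative choices, not details taken from the patent.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def route_cluster(docs, sim_threshold=0.5, max_span_days=3):
    """Pick a summarization engine for a non-empty cluster.

    docs: list of (day, text) pairs. Thresholds are hypothetical.
    """
    bags = [Counter(text.lower().split()) for _, text in docs]
    pairs = list(combinations(bags, 2))
    # Mean pairwise similarity; a single-document cluster counts as fully related.
    mean_sim = sum(cosine(a, b) for a, b in pairs) / len(pairs) if pairs else 1.0
    days = [day for day, _ in docs]
    span = max(days) - min(days)
    # Tightly related, temporally close documents go to the single-event engine.
    if mean_sim >= sim_threshold and span <= max_span_days:
        return "single_event"
    return "dissimilarity"
```

For example, two near-identical earthquake reports one day apart are routed to `"single_event"`, while an earthquake report and a market report a month apart go to `"dissimilarity"`.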

Method and system for simultaneously abstracting document summarization and key words

The invention relates to a method that extracts the summary and the keywords of a document at the same time, and belongs to the field of natural language processing. Existing methods treat summary extraction and keyword extraction as two unrelated tasks and process them separately, even though the two tasks are of the same nature; this method exploits that shared nature and completes both extractions simultaneously. The method uses a graph-based learning model that jointly exploits the relationships between sentences, between sentences and words, and between words in the document to accurately evaluate the importance of sentences and words, and finally takes the most important sentences and words as the summary and keywords of the document. On the one hand, the method extracts the summary and keywords simultaneously; on the other hand, it achieves better extraction quality for both. The method can be widely applied in fields such as text information processing and text mining.
Owner:PEKING UNIV +2
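
A minimal sketch of the joint ranking idea follows. It assumes a simple mutual-reinforcement scheme in which sentence scores and word scores are propagated through the three relations named above, and the combined score vector is renormalized each round (a power iteration). The link weights (shared-word counts, co-occurrence counts, term frequencies) and the `alpha` mixing parameter are hypothetical stand-ins for the patent's graph learning model.

```python
from collections import Counter


def rank_sentences_and_words(sentences, alpha=0.5, iters=50):
    """Jointly score sentences and words by iterative mutual reinforcement.

    Returns (sentence_scores, word_scores). alpha balances same-type links
    (sentence-sentence, word-word) against sentence-word links; it is a
    hypothetical parameter.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    vocab = sorted(set().union(*bags))
    n, m = len(bags), len(vocab)
    # Sentence-word links: frequency of word j in sentence i.
    sw = [[bags[i][w] for w in vocab] for i in range(n)]
    # Sentence-sentence links: number of shared word types (no self-loops).
    ss = [[0 if i == k else len(bags[i].keys() & bags[k].keys()) for k in range(n)]
          for i in range(n)]
    # Word-word links: number of sentences in which both words occur.
    ww = [[0 if a == b else sum(1 for bag in bags if vocab[a] in bag and vocab[b] in bag)
           for b in range(m)] for a in range(m)]
    s_score, w_score = [1.0] * n, [1.0] * m
    for _ in range(iters):
        new_s = [alpha * sum(ss[i][k] * s_score[k] for k in range(n))
                 + (1 - alpha) * sum(sw[i][j] * w_score[j] for j in range(m))
                 for i in range(n)]
        new_w = [alpha * sum(ww[a][b] * w_score[b] for b in range(m))
                 + (1 - alpha) * sum(sw[i][a] * s_score[i] for i in range(n))
                 for a in range(m)]
        # Normalize sentences and words jointly (one power-iteration step).
        z = sum(new_s) + sum(new_w) or 1.0
        s_score = [v / z for v in new_s]
        w_score = [v / z for v in new_w]
    return s_score, dict(zip(vocab, w_score))
```

The top-scoring sentences would form the summary and the top-scoring words the keywords.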

Document summarization

Systems, methods, and other embodiments associated with automatically summarizing a document are described. One method embodiment includes computing term scores for members of a set of terms in a document to be summarized and computing sentence scores for sentences in a set of sentences in the document. The method embodiment also includes computing a set of entries for a term-sentence matrix that relates terms to sentences. The method embodiment also includes computing a dominant topic for the document and simultaneously ranking the set of terms and the set of sentences based on the dominant topic. The method embodiment provides a summarization item(s) selected from the set of terms and / or the set of sentences.
Owner:ORACLE INT CORP
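
One way to read "a dominant topic that ranks terms and sentences simultaneously" is as the leading singular vector pair of the term-sentence matrix: the left vector weights terms and the right vector weights sentences. The sketch below assumes that reading, uses raw term frequency as the matrix entries, and computes the vectors by power iteration with the standard library only; it is an illustration, not Oracle's method.

```python
from collections import Counter
from math import sqrt


def dominant_topic_rank(sentences, iters=100):
    """Rank terms and sentences by the leading singular vectors of a
    term-sentence matrix. Term-frequency weighting is an assumption."""
    bags = [Counter(s.lower().split()) for s in sentences]
    terms = sorted(set().union(*bags))
    # A[i][j] = frequency of term i in sentence j.
    A = [[bag[t] for bag in bags] for t in terms]
    v = [1.0] * len(sentences)  # sentence weights (right singular vector)
    u = [0.0] * len(terms)      # term weights (left singular vector)
    for _ in range(iters):
        # u <- A v, normalized
        u = [sum(row[j] * v[j] for j in range(len(v))) for row in A]
        norm = sqrt(sum(x * x for x in u)) or 1.0
        u = [x / norm for x in u]
        # v <- A^T u, normalized
        v = [sum(A[i][j] * u[i] for i in range(len(u))) for j in range(len(v))]
        norm = sqrt(sum(x * x for x in v)) or 1.0
        v = [x / norm for x in v]
    return dict(zip(terms, u)), v
```

Because both vectors come from the same decomposition, the term ranking and the sentence ranking share one dominant topic.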

Autoabstract method for multi-document

The invention discloses a method that uses graph partitioning to automatically produce a multi-document summary. The method comprises the following steps: sentence boundaries are detected, and each document is represented by its segmented sentences; the sentences are represented as vectors, and the similarity between each pair of sentences is calculated to form a sentence association matrix, which is pruned according to a specified threshold and normalized; mining of latent sub-topics is introduced into multi-document summarization, and the document set is divided into different latent sub-topics, so that the summarization task becomes a process of selecting and extracting from the sub-topics; by applying graph partitioning, the importance of each sentence's sub-topic is ensured from the global characteristics, and low redundancy of content among different sub-topics is ensured from the local characteristics, which effectively improves summary quality.
Owner:INST OF COMPUTING TECH CHINESE ACAD OF SCI
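
The pipeline above (sentence vectors, pairwise similarity, threshold pruning, normalization, sub-topic partition, one sentence per sub-topic) can be sketched as follows. As a simplifying assumption, connected components of the pruned graph stand in for the patent's graph partition method, and the sentence with the largest incoming normalized weight represents each sub-topic; the threshold of 0.3 is arbitrary.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(x * x for x in a.values()))
    nb = sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def subtopic_summary(sentences, threshold=0.3):
    """Return sorted indices of one representative sentence per sub-topic."""
    bags = [Counter(s.lower().split()) for s in sentences]
    n = len(bags)
    # Sentence association matrix, pruned below the threshold.
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    sim = [[x if x >= threshold else 0.0 for x in row] for row in sim]
    # Row-normalize so each sentence's outgoing weights sum to 1.
    norm = [[x / (sum(row) or 1.0) for x in row] for row in sim]
    # Stand-in partition: connected components of the pruned graph.
    label = [-1] * n
    comps = []
    for start in range(n):
        if label[start] != -1:
            continue
        stack, comp = [start], []
        label[start] = len(comps)
        while stack:
            i = stack.pop()
            comp.append(i)
            for j in range(n):
                if norm[i][j] > 0 and label[j] == -1:
                    label[j] = len(comps)
                    stack.append(j)
        comps.append(comp)
    # From each sub-topic, take the sentence with the most incoming weight.
    picks = [max(comp, key=lambda i: sum(norm[k][i] for k in comp)) for comp in comps]
    return sorted(picks)
```

Taking a single sentence per component is what keeps redundancy between sub-topics low in this sketch.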

Multi-document summarization method based on text segmentation

The invention belongs to the technical field of multi-document summarization and provides a multi-document summarization method based on text segmentation, comprising the following steps: using HowNet to obtain concepts and building a concept vector space model; performing text segmentation with an improved DotPlotting model over the sentence concept vector space; calculating sentence weights with the concept vector space model; generating a summary from the sentence weights, the text segmentation, and sentence similarity; and evaluating the generated summary with the ROUGE-N method, using the F-score as the evaluation metric. The results show that multi-document summarization using text segmentation is effective: relevant documents supplied by users can be condensed into a summary that is displayed to them in a suitable way, which greatly improves the efficiency of information acquisition. The method is highly practical and has considerable potential for wider application.
Owner:Guangxi Chaohong Technology Co., Ltd. (广西超宏科技有限公司)
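
For reference, ROUGE-N compares the n-grams of a candidate summary with those of a reference summary. The sketch below computes the F1 variant against a single reference, with whitespace tokenization and clipped n-gram counts; real evaluations typically add stemming, multiple references, and other options that are omitted here.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f(candidate, reference, n=2):
    """ROUGE-N F1 between a candidate summary and one reference."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    overlap = sum((cand & ref).values())  # clipped n-gram matches
    if not cand or not ref or overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

For instance, "the cat sat on the mat" against "the cat lay on the mat" shares 3 of 5 bigrams on each side, so ROUGE-2 F1 is 0.6.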

Method and device for generating document summarization

Status: Active | Publication: CN104503958A | Benefits: Reduce build time; Improve the efficiency of generating summaries | Tags: Special data processing applications; Document summarization; Generation process
The invention provides a method and a device for generating a document summary. The method comprises the following steps: obtaining a document and processing it with preset features to obtain summary candidate sentences, where the preset features include keywords, numbers, and one or more sentences and subtitles whose distance from the document title falls within a preset range; compressing the summary candidate sentences; and post-processing the compressed candidate sentences to generate the document summary. According to the embodiment, the generated summary is concise and accurate and contains no redundant information; the generation process is simple and needs no manual intervention, so the time needed to generate a document summary is greatly shortened and generation efficiency is improved.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Automatic multi-document abstract extraction method and automatic multi-document abstract extraction system based on sentence vectors

The invention discloses an automatic multi-document abstract extraction method and system based on sentence vectors. The method includes: S1, preprocessing the document collection; S2, generating sentence vectors by training a doc2vec model; S3, clustering the sentence vectors into sub-theme documents; S4, building a sentence relation graph model within each sub-theme document; S5, calculating sentence weights; S6, extracting and ordering sentences to form the abstract. The advantages are as follows: all sentences in the target document collection are represented as vectors by a doc2vec model trained on a large corpus; sub-themes are obtained by spectral clustering, and one sentence is extracted from each sub-theme, which avoids sentence redundancy; and the sentences are ordered by their positions in the original documents to form the abstract, which improves the coherence of the abstract sentences.
Owner:SHANDONG INST OF BUSINESS & TECH
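
Steps S5 and S6, together with the one-sentence-per-sub-theme rule, can be sketched as below. The sketch assumes that sentence vectors, spectral-clustering labels, and per-cluster sentence weights have already been computed (doc2vec training and clustering are not shown), and it breaks position ties by document id, which is a hypothetical choice.

```python
from dataclasses import dataclass


@dataclass
class Sentence:
    text: str
    doc_id: int     # which source document the sentence comes from
    position: int   # index of the sentence within its document
    cluster: int    # sub-theme label (assumed to come from spectral clustering)
    weight: float   # sentence weight from the per-cluster graph model


def assemble_abstract(sentences):
    """Take the highest-weight sentence from each sub-theme, then order the
    picks by their position in the source documents."""
    best = {}
    for s in sentences:
        if s.cluster not in best or s.weight > best[s.cluster].weight:
            best[s.cluster] = s
    picks = sorted(best.values(), key=lambda s: (s.position, s.doc_id))
    return [s.text for s in picks]
```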

System, method, and user interface for a search engine based on multi-document summarization

A method for searching multiple documents on a computer system includes steps for sending a query to a system core where the query is passed to a search component for searching the documents. The system core in turn receives results from the search component indicating related documents to the query and passes to a summarization component a specified number of the results. The summarization component processes related documents corresponding to the specified number of results to produce a multi-document summary. The system core receives the summary from the summarization component. The multi-document summary is received from the system core.
Owner:SOUBBOTIN DMITRI
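
The query flow between the system core, the search component, and the summarization component can be sketched in plain Python with injected callables. The class name `SystemCore`, the `top_k` parameter (standing for "a specified number of the results"), and any concrete search or summarization functions passed in are hypothetical.

```python
from typing import Callable, List

SearchFn = Callable[[str], List[str]]       # query -> ranked documents
SummarizeFn = Callable[[List[str]], str]    # documents -> summary


class SystemCore:
    """Mediates between a search component and a summarization component."""

    def __init__(self, search: SearchFn, summarize: SummarizeFn, top_k: int = 10):
        self.search = search
        self.summarize = summarize
        self.top_k = top_k  # the "specified number of results"

    def handle_query(self, query: str) -> str:
        results = self.search(query)       # 1. search component finds related documents
        selected = results[: self.top_k]   # 2. keep the specified number of results
        return self.summarize(selected)    # 3. summarization component builds the summary
```

Any search backend or summarizer with these signatures could be plugged in; the core only fixes the order of the calls.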

Method and device generating multi-file summary

The embodiment of the invention discloses a method and device for generating a multi-document summary, so that the generated summary achieves high coverage of the important information in the documents while reducing redundancy. The method comprises the following steps: decomposing the sentence set of the multiple documents into a phrase pool and obtaining the features and relations of each phrase in the pool; selecting, according to these features and relations, a phrase set that satisfies a preset constraint from the phrase pool as the summary phrase set; and combining the selected summary phrases into summary sentences in a preset combination mode, thus forming the multi-document summary.
Owner:HUAWEI TECH CO LTD

Document summarization based on topicality and specificity

Topicality scores are determined for a number of phrasal expressions in documents. Phrasal expressions may be noun phrases (with or without corresponding prepositional phrases), subject-verb pairs, and verb-object pairs. The documents describe one or more topics, and techniques can be used to determine how each phrasal expression compares with the topic or topics described in the documents. Specificities are also determined for the phrasal expressions; techniques may be used to determine whether a phrasal expression is more or less specific than other phrasal expressions. An order for the phrasal expressions is then determined from the topicality scores and the specificities. The order may be represented, for example, as a phrasal expression tree, which can be displayed to a user who navigates through the tree and therefore through the one or more documents.
Owner:MICROSOFT TECH LICENSING LLC

Method and device for generating multi-document summarization

Status: Active | Publication: CN108733682A | Benefits: Improve performance; Improve measurement capabilities | Tags: Special data processing applications; Semantic vector; Verb phrase
The embodiment of the invention discloses a method and a device for generating a multi-document summary in the field of data processing, addressing the poor performance of summaries generated by existing automatic multi-document summarization techniques. The method comprises: dividing multiple documents into n sentences; generating an input bag-of-words vector; performing unsupervised training on each sentence represented by its input bag-of-words vector to obtain an encoding hidden-layer vector and a latent semantic vector for each sentence; collecting m latent semantic vectors; obtaining m decoding hidden-layer vectors and m output bag-of-words vectors from the m latent semantic vectors; updating the m decoding hidden-layer vectors and the m output bag-of-words vectors; estimating the importance of each sentence; obtaining the importance and redundancy of the verb phrases and of the noun phrases in each sentence; and generating the summary of the multiple documents from the importance and redundancy of all noun phrases and all verb phrases. The embodiment applies to the process of generating a multi-document summary.
Owner:HUAWEI TECH CO LTD

Multiple file summarization method facing subject or inquiry based on cluster arrangement

To overcome the shortcomings of the prior art, the method fully considers both the relations between sentences and the relations between sentences and the user query, generating an abstract that contains the main information of the documents and also explains the topic or answers the query; it applies a difference penalty algorithm to ensure the novelty of the abstract. The invention can meet the requests of individual users.
Owner:PEKING UNIV
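
The "difference penalty" idea can be illustrated with a greedy selection loop in which, after each pick, the scores of the remaining sentences are reduced in proportion to their similarity to the chosen sentence. This is a hedged sketch: the base score (query relevance blended with centrality), the penalty form, and the parameters `lam` and `omega` are assumptions, not the patented algorithm.

```python
from collections import Counter
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = sqrt(sum(x * x for x in a.values()))
    nb = sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def query_focused_summary(sentences, query, k=2, lam=0.7, omega=1.0):
    """Greedy query-focused selection with a redundancy penalty.

    lam weights query relevance against centrality; omega scales the penalty.
    Both are hypothetical parameters.
    """
    bags = [Counter(s.lower().split()) for s in sentences]
    q = Counter(query.lower().split())
    n = len(bags)
    sim = [[cosine(bags[i], bags[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    # Base score: query relevance blended with average similarity to other sentences.
    score = [lam * cosine(bags[i], q) + (1 - lam) * sum(sim[i]) / max(n - 1, 1)
             for i in range(n)]
    chosen, remaining = [], set(range(n))
    while remaining and len(chosen) < k:
        best = max(sorted(remaining), key=lambda i: score[i])
        chosen.append(best)
        remaining.discard(best)
        # Penalty: demote remaining sentences in proportion to their
        # similarity to the sentence just chosen.
        for j in remaining:
            score[j] -= omega * sim[j][best] * score[best]
    return [sentences[i] for i in chosen]
```

With `omega=0.0`, two near-duplicate sentences about solar prices can both be selected; raising `omega` lets a dissimilar sentence displace the duplicate.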

Multi-document abstract sentence generating method

Status: Inactive | Publication: CN104778157A | Benefits: Takes into account the amount of information; Takes into account the length | Tags: Special data processing applications; Feature vector; Natural language processing
The invention discloses a multi-document abstract sentence generation method comprising the following steps: S1, taking a sentence feature vector space as input, the sentences are clustered according to the similarity of their feature vectors, and each resulting cluster is recorded as a sub-theme; S2, the importance of each sub-theme is determined from its coverage of the document set and the number of sentences it contains, and the sub-themes are ranked by importance; S3, the sentences within each sub-theme are scored and ranked; S4, the highest-scoring sentence in each sub-theme is extracted as an abstract sentence, demonstrative pronouns serving as subjects in these sentences are replaced, and the abstract sentences are ordered by the importance of their sub-themes; finally, the abstract is generated and output.
Owner:SOUTH CHINA UNIV OF TECH +2
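
Steps S2 and S4 can be sketched as follows. The abstract says sub-theme importance depends on document-set coverage and sentence count but does not give the formula, so the product of the covered-document share and the cluster size is a hypothetical choice; clustering (S1), per-sentence scoring (S3), and pronoun replacement are not shown.

```python
from collections import defaultdict


def subtheme_abstract(sentences, n_docs):
    """sentences: (text, doc_id, cluster, score) tuples.

    Returns the best sentence of each sub-theme, ordered by sub-theme importance.
    """
    groups = defaultdict(list)
    for sent in sentences:
        groups[sent[2]].append(sent)

    def importance(members):
        coverage = len({m[1] for m in members}) / n_docs  # share of documents covered
        return coverage * len(members)                     # hypothetical combination

    ranked = sorted(groups.values(), key=importance, reverse=True)
    return [max(members, key=lambda m: m[3])[0] for members in ranked]
```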

Multi-document abstract generation method and device, and terminal

Embodiments of the invention provide a multi-document abstract generation method, device, and terminal in the field of data processing, aimed at the problem that abstracts generated by prior-art methods contain a relatively large amount of redundant information. The method comprises: acquiring a candidate sentence set consisting of the candidate sentences contained in each of a plurality of candidate documents about the same event; training on each candidate sentence in the set using a cascade attention mechanism in a preset network model together with an unsupervised learning model to obtain the importance of each candidate sentence, where the importance of a candidate sentence corresponds to the norm (modulus) of a row vector in the cascade attention matrix output by the preset network model; selecting, according to the importance of each candidate sentence, phrases meeting preset conditions from the candidate sentence set as the abstract phrase set; and combining the abstract phrase set into abstract sentences in a preset combination mode to obtain the abstract of the candidate documents.
Owner:XFUSION DIGITAL TECH CO LTD

Method for modeling dynamic multi-document abstracts

The invention relates to a method for modeling dynamic multi-document abstracts. It addresses a problem of traditional multi-document abstract methods: because the content, distribution, and associations of the various information facets under the current topic are hard to grasp globally, many abstract fragments come from the same sub-topic, which seriously harms the comprehensiveness of the abstract. The method comprises the following steps: preprocessing the document collection; building a feature extraction module; building an information filtering module; building a sentence weighting module; building an abstract generation module to generate the best abstract; and outputting the best abstract through an output module, which completes the modeling of the dynamic multi-document abstract. The dynamically evolving abstract produced by the method offers relatively high information novelty and reflects the evolution of historical information, which improves the performance of the dynamic abstract and makes it more comprehensive. The method applies to the field of abstract extraction.
Owner:HARBIN INST OF TECH

Method and device for generating multi-document summary

The invention discloses a method and a device for generating a multi-document summary, aimed at the poor readability of multi-document summaries generated by the prior art. The method comprises: selecting a plurality of summary sentences from a plurality of documents; and ordering the summary sentences according to at least one preset ordering rule to generate the multi-document summary, where each ordering rule is based on date information in the summary sentences, the positions of the summary sentences within their documents, or the dependency between the summary sentences and the summary's subject content. The disclosed scheme fully considers the continuity among summary sentences and their dependency on the subject content, which effectively improves the readability of the generated summary.
Owner:PEKING UNIV +2
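
Two of the ordering rules (date information and position in the source document) can be combined into a single sort key, as in the sketch below. Placing undated sentences last and using a relative position in [0, 1] are assumptions; the dependency-based rule is not modeled.

```python
from datetime import date
from typing import NamedTuple, Optional


class SummarySentence(NamedTuple):
    text: str
    published: Optional[date]  # date information attached to the sentence, if any
    position: float            # relative position in its source document (0.0 = start)


def order_summary(sentences):
    """Order by date first (undated sentences last), then by relative position."""
    return sorted(
        sentences,
        key=lambda s: (s.published is None, s.published or date.min, s.position),
    )
```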

Multiple file summarization method based on sentence relation graph

Status: Active | Publication: CN1828608A | Benefits: Extended digest method; Big contribution weight | Tags: Special data processing applications; Diffusion; Documentation
To overcome the shortcomings of the prior art, the invention computes the underlying semantic relations between sentences by taking into account the diffusion property of sentence relations, and distinguishes between relations among sentences within the same document and relations among sentences from different documents. The invention performs well in practical evaluations.
Owner:PEKING UNIV

Automatic writing method based on extraction type multi-document abstract method

The invention relates to an automatic writing method based on extractive multi-document summarization, comprising the following steps: A1, user input and data preprocessing: receiving a keyword entered by the user, retrieving related data from a data retrieval platform, and performing preliminary processing on the retrieved data; A2, graph-based ranking: taking multiple documents as input, the system first identifies all sentences and scores the importance of each; A3, redundancy removal: if two or more sentences have a similarity exceeding a preset threshold, only one of them is kept, and an ordered sentence list with redundant sentences removed is output; and A4, composition and output: the most important sentences are selected from the front of the ordered list produced by the previous stage, subject to the text length limit, then reordered, and a manuscript formed from the reordered sentences is output.
Owner:SHENZHEN GRADUATE SCHOOL TSINGHUA UNIV
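
Steps A3 and A4 can be sketched as a similarity filter followed by a length-budgeted selection. Jaccard word overlap as the similarity measure, a word-count length limit, and the default values are assumptions made for illustration; the graph ranking of step A2 is taken as given.

```python
def jaccard(a, b):
    """Word-set overlap between two sentences."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def remove_redundancy(ranked, threshold=0.6):
    """ranked: sentences sorted by descending importance. Keep a sentence only
    if its similarity to every sentence already kept does not exceed the threshold."""
    kept = []
    for s in ranked:
        if all(jaccard(s, k) <= threshold for k in kept):
            kept.append(s)
    return kept


def compose(ranked, source_order, max_words=20, threshold=0.6):
    """Select from front to back within a word budget, then restore source order."""
    kept = remove_redundancy(ranked, threshold)
    chosen, used = [], 0
    for s in kept:  # front to back = most to least important
        length = len(s.split())
        if used + length <= max_words:
            chosen.append(s)
            used += length
    # Reorder the chosen sentences by their position in the source text.
    return sorted(chosen, key=source_order.index)
```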

Evolutionary summarization generation method for internet news events

The invention relates to an evolutionary summary generation method for internet news events, comprising the steps of: inputting a set of related news documents; representing the documents as topic feature vectors with an LDA (latent Dirichlet allocation) topic model; clustering the documents represented as topic feature vectors; calculating local scores of the documents in each topic; calculating global scores of the documents in each topic; calculating final scores of the documents in each topic; extracting the titles of high-scoring documents from each topic, in chronological order, to serve as the summary; and outputting the summary. The dimension of the topic feature vectors is a first preset value, and each cluster represents one topic. The extracted summary evolves dynamically and is coherent and highly readable, and experimental results indicate that the system substantially improves redundancy, coherence, and dynamic evolvability compared with a traditional multi-document summarization system.
Owner:SUZHOU UNIV

Multimedia document summarization

Multimedia document summarization techniques are described. That is, given a document that includes text and a set of images, various implementations generate a summary by extracting relevant text segments in the document and relevant segments of images with constraints on the amount of text and number / size of images in the summary.
Owner:ADOBE INC

Multiple-document summarization using document clustering

The invention relates to multiple-document summarization using document clustering. Systems and methods are disclosed for summarizing multiple documents by generating a model of the documents as a mixture of document clusters, each document in turn having a mixture of sentences, wherein the model simultaneously represents summarization information and document cluster structure; and determining a loss function for evaluating the model, and optimizing the model.
Owner:NEC LAB AMERICA

System, method, and user interface for a search engine based on multi-document summarization

A method for searching multiple documents on a computer system includes steps for sending a query to a system core where the query is passed to a search component for searching the documents. The system core in turn receives results from the search component indicating related documents to the query and passes to a summarization component a specified number of the results. The summarization component processes related documents corresponding to the specified number of results and removes duplicate results to produce a multi-document summary. The system core receives the summary from the summarization component. The multi-document summary is received from the system core.
Owner:SOUBBOTIN DMITRI

Multi-document abstract generation method and system

The invention provides a multi-document abstract generation method comprising the following steps: S1, determining a topic, acquiring a plurality of documents related to it, and constructing a first corpus; S2, constructing an HLDA topic model for the topic and obtaining sub-topics; S3, calculating importance scores of the clauses; S4, calculating the importance of the sub-topics; and S5, extracting abstract sentences. The method adds news features and improves the HLDA topic importance calculation to obtain reasonable sentence scores; in addition, building on the traditional abstract ordering step, it adds inter-sentence information features as one basis for ordering sentences, so that the final abstract sentences are more accurate and read more smoothly.
Owner:COMMUNICATION UNIVERSITY OF CHINA

Method for automatically generating unsupervised science and technology intelligence abstract based on multi-sentence compression

The invention relates to an unsupervised method for automatically generating science and technology intelligence abstracts based on multi-sentence compression, in the technical field of natural language generation. Targeting multi-document text generation for science and technology intelligence, source data are first acquired with a topic crawler that extends an LDA topic-similarity word library. All text paragraphs are then ranked with a text information value evaluation model based on three indicators: authority, timeliness, and content relevance. Higher-scoring paragraphs are selected as the source text for generating the final intelligence report. Finally, an unsupervised multi-document abstract method based on spectral clustering and multi-sentence compression automatically generates the science and technology intelligence abstract. The method meets the high demands that science and technology intelligence generation places on data timeliness and authority during data screening, and it overcomes the problem that traditional neural multi-document generation methods cannot be applied for lack of datasets in this field.
Owner:BEIJING INSTITUTE OF TECHNOLOGY

Method and device for automatic generation of multi-document abstract of industrial safety topics

The invention discloses a method and device for automatically generating a multi-document abstract on industrial safety topics. The method includes the following steps: input keywords are acquired, and multiple documents matching the keywords are retrieved to form a document set; the sentences in each document are decomposed into grammatical components to obtain multiple phrases, attribute information is assigned to each phrase, and the phrases form a phrase set; the phrase set is input into a submodular function, which is then optimized; a target phrase subset of the phrase set is determined from the optimization result; sentences are formed from the phrases of the target subset, and the priority of each sentence is determined from the attribute information of its phrases; finally, the sentences are concatenated in priority order to form the abstract.
Owner:BEIHANG UNIV
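
Phrase selection by submodular optimization is commonly done greedily. The sketch below assumes a word-coverage objective (the number of distinct words covered, which is monotone submodular) and a word budget, and at each step picks the phrase with the best marginal gain per word; this simple cost-scaled greedy is a heuristic, and the patent's actual function and phrase attributes are not specified in the abstract.

```python
def greedy_submodular(phrases, budget):
    """Select phrases maximizing distinct-word coverage under a word budget."""
    covered, chosen, cost = set(), [], 0
    candidates = list(phrases)
    while True:
        best, best_ratio = None, 0.0
        for p in candidates:
            words = set(p.lower().split())
            c = len(p.split())
            if cost + c > budget:
                continue  # phrase would exceed the budget
            gain = len(words - covered)  # marginal coverage gain
            if c and gain / c > best_ratio:
                best, best_ratio = p, gain / c
        if best is None:
            return chosen  # nothing affordable adds new coverage
        chosen.append(best)
        covered |= set(best.lower().split())
        cost += len(best.split())
        candidates.remove(best)
```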

Multiple file summarization method based on sentence relation graph

Status: Active | Publication: CN100435145C | Benefits: Extended digest method; Big contribution weight | Tags: Special data processing applications; Diffusion; Documentation
To overcome the shortcomings of the prior art, the invention computes the underlying semantic relations between sentences by taking into account the diffusion property of sentence relations, and distinguishes between relations among sentences within the same document and relations among sentences from different documents. The invention performs well in practical evaluations.
Owner:PEKING UNIV