
30 results about "Sentence clustering" patented technology

Sentence clustering is usually used to group sentences drawn from different documents, and it can be regarded as a transverse segmentation of the documents' content. As a result, the number of clusters can exceed the number of documents.
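The following toy sketch shows why the cluster count is not bounded by the document count. For illustration only, it assumes that sentences are compared by the Jaccard overlap of their lowercase word sets and grouped by single-link clustering with a hypothetical threshold of 0.3. None of the patents below prescribe this measure.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity between two word sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_sentences(docs: list[list[str]], threshold: float = 0.3) -> list[list[str]]:
    """Pool the sentences of all documents and group them by single-link clustering."""
    sentences = [s for doc in docs for s in doc]
    words = [set(s.lower().split()) for s in sentences]
    parent = list(range(len(sentences)))  # union-find forest over sentence indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Link every pair of sentences whose word overlap reaches the threshold.
    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if jaccard(words[i], words[j]) >= threshold:
                parent[find(i)] = find(j)

    groups: dict[int, list[str]] = {}
    for i, s in enumerate(sentences):
        groups.setdefault(find(i), []).append(s)
    return list(groups.values())


doc_a = ["the cat sat on the mat", "stock prices rose sharply today", "rain is expected tomorrow"]
doc_b = ["a cat sat on a mat", "heavy rain is expected tomorrow"]
clusters = cluster_sentences([doc_a, doc_b])
print(len(clusters))  # 3 clusters from 2 documents
```

The "cat" sentences and the "rain" sentences each form a cross-document cluster. The stock-price sentence stays on its own, which gives three clusters from two documents.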

News sentence clustering method based on semantic similarity, device and storage medium

The invention provides a news sentence clustering method based on semantic similarity. The method includes the following steps:
- Preprocess the news sentences of a corpus and extract the available words.
- Train a continuous bag-of-words (CBOW) model on the available words to obtain an initial word vector for each available word.
- Train the CBOW model iteratively. Each training step uses the initial sentence vector of a news sentence together with the initial word vectors of the available words immediately to the left and right of a given available word. This yields a current word vector for each available word in the sentence and a final sentence vector for the sentence.
- Merge three components into a semantic vector for each news sentence: the average of the word vectors of all its available words, the one-hot vectors of high-frequency words, and its final sentence vector.
- Calculate the distances between semantic vectors to obtain the semantic similarity between news sentences, and cluster the news sentences of the corpus accordingly.

The invention also provides an electronic device and a computer-readable storage medium.
Owner:PING AN TECH (SHENZHEN) CO LTD
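The final merging and similarity steps can be sketched as follows. The sketch makes two assumptions: "merging" means concatenating the vectors, and cosine similarity serves as the distance-based similarity. The word and sentence vectors are hypothetical toy values standing in for the output of the CBOW training described above, and the clustering step itself is omitted.

```python
import math


def semantic_vector(tokens, word_vecs, sent_vec, high_freq_words):
    """Concatenate [mean word vector | high-frequency word indicators | sentence vector]."""
    dim = len(next(iter(word_vecs.values())))
    known = [word_vecs[t] for t in tokens if t in word_vecs]
    # Average of the word vectors of the available (in-vocabulary) words.
    mean = [sum(v[k] for v in known) / len(known) if known else 0.0 for k in range(dim)]
    # One slot per high-frequency word: 1.0 if the sentence contains it.
    one_hot = [1.0 if w in tokens else 0.0 for w in high_freq_words]
    return mean + one_hot + list(sent_vec)


def cosine(u, v):
    """Cosine similarity, used here as the semantic similarity between sentences."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


# Hypothetical toy vectors standing in for CBOW / sentence-vector training output.
word_vecs = {"stocks": [1.0, 0.0], "rise": [0.8, 0.2], "rain": [0.0, 1.0]}
high_freq = ["stocks", "rain"]
s1 = semantic_vector(["stocks", "rise"], word_vecs, [0.5, 0.5], high_freq)
s2 = semantic_vector(["stocks", "rise", "today"], word_vecs, [0.5, 0.4], high_freq)
s3 = semantic_vector(["rain"], word_vecs, [0.1, 0.9], high_freq)
print(cosine(s1, s2) > cosine(s1, s3))  # True
```

Concatenation keeps the three signals in separate coordinates. Their relative influence therefore depends on each part's dimensionality and scale, which a real system would likely need to tune.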

Text data viewpoint summary mining method merging topic attributes and emotion information

The invention provides a text data viewpoint summary mining method that merges topic attributes and emotion information. The method comprises the following steps:
- Preprocess the text corpus of a topic, and input a topic corpus and a background corpus.
- Extract the topic attributes of the topic corpus.
- Add emotional polarities to the extracted topic attributes and vectorize the sentences. Taking the topic attributes as evaluation objects, obtain the emotional attribute features contained in each sentence. Then vectorize each sentence's features using a topic-attribute and sentiment analysis method.
- Construct a three-layer graph structure from the topic attribute set and the text sentence feature vector set S, and cluster all the text sentences.
- Select the high-scoring sentences from the clusters to form a viewpoint summary.

Because the method uses a dedicated topic attribute extraction method, the extracted topic attributes are more accurate. The method applies not only to Chinese microblogs but also to website news and product reviews.
Owner:FUZHOU UNIV

Method for structured processing of Chinese pathological text

The present invention relates to a method for structured processing of Chinese pathological text. The method comprises the following steps:
- Extract the template information for each sample from the hierarchical structure of the pathology report text samples and their indicators. Template extraction consists of short-sentence segmentation and indicator name extraction.
- Classify the short sentences.
- For each sample, combine the classification result clusters with the short-sentence clusters. Calculate the TF value, IDF value and C-value of each indicator name in the indicator name list over the short-sentence corpus.
- Keep the indicator names whose TF value, IDF value and C-value satisfy their thresholds, and use them as components of the final template.

With the present invention, unstructured Chinese pathological text can be structured.
Owner:DONGHUA UNIV +1
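The threshold screening step can be sketched as below. The definitions used are assumptions, not taken from the patent:
- TF is an indicator name's share of all indicator-name occurrences.
- IDF is log(N/df) over the short sentences.
- C-value follows the standard definition by Frantzi et al.: log2|a| * (f(a) - mean frequency of the longer candidates that contain a), with |a| counted in characters, which is a natural choice for Chinese text.

Treating all three as minimum thresholds, and the threshold values themselves, are also assumptions. The example uses English strings only for readability.

```python
import math


def screen_indicator_names(names, short_sentences, min_tf, min_idf, min_cvalue):
    """Keep indicator names whose TF, IDF and C-value all reach their thresholds."""
    n = len(short_sentences)
    freq = {a: sum(s.count(a) for s in short_sentences) for a in names}
    df = {a: sum(1 for s in short_sentences if a in s) for a in names}
    total = sum(freq.values()) or 1
    kept = []
    for a in names:
        if df[a] == 0:
            continue
        tf = freq[a] / total
        idf = math.log(n / df[a])
        # C-value: discount occurrences of `a` that are nested in longer candidates.
        longer = [b for b in names if b != a and a in b]
        nested = sum(freq[b] for b in longer) / len(longer) if longer else 0.0
        cvalue = math.log2(len(a)) * (freq[a] - nested)
        if tf >= min_tf and idf >= min_idf and cvalue >= min_cvalue:
            kept.append(a)
    return kept


names = ["tumor size", "size", "margin status"]
short_sentences = ["tumor size 2 cm", "margin status negative", "tumor size 3 cm", "lymph node 0/12"]
print(screen_indicator_names(names, short_sentences, 0.1, 0.5, 1.0))
# ['tumor size', 'margin status']
```

"size" passes the TF and IDF thresholds but is dropped. It only ever occurs inside "tumor size", so its C-value is 0. This is the kind of nested fragment the C-value is designed to filter out.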

Multilingual automatic abstract method

The invention relates to the technical field of text generation in natural language processing, and in particular to a multilingual automatic abstract (summarization) method. It includes a complete automatic abstract system divided into three modules:
- The model training module, divided into a text preprocessing module and a training module. Its model is a seq2seq neural network model, and the training text is obtained through summary-summary generation.
- The single-document abstract module, divided into a text preprocessing module and a summary generation module.
- The multi-document abstract module, divided into a text preprocessing module, a multilingual sentence clustering module and a summary generation module.

The invention designs and implements a multilingual generative automatic abstract system. Using bilingual word embeddings and deep learning, it generates a brief abstract for a text or set of texts specified by the user. This helps the user grasp the intent of the original text and quickly find the information they need most.
Owner:YANBIAN UNIV

Information processing method and system for knowledge services

The invention discloses an information processing method and system for knowledge services. The method comprises the following steps:
- Obtain all or some of the knowledge points as a knowledge point set.
- Determine the semantic information of each knowledge point in the set.
- Determine the sentence cluster set corresponding to the knowledge points according to the semantic information.
- Determine the corresponding chapter information according to the sentence cluster set.
- Determine the corresponding digital resources according to the chapter information.

The method considers the semantic information of the knowledge points comprehensively, rather than associating knowledge with keywords entered by users. It therefore better fits users' real needs and can associate the knowledge that most closely matches them. As a result, domain knowledge is genuinely organized by knowledge point, and the user experience is improved.
Owner:NEW FOUNDER HLDG DEV LLC +2

Structural processing method for a thyroid ultrasound report based on a tree structure

The invention relates to a tree-shaped structured template built from a part-of-speech dictionary and a dependency relationship tree, and to a method for structuring thyroid ultrasound reports using that template. The overall process consists of three parts:
- A part-of-speech dictionary construction module. It segments each report into short sentences and clusters them. It then uses named entity recognition to build a complete part-of-speech dictionary from the organ words (ORG), position words (LOC), attribute words (ATT) and attribute names.
- A tree template construction module. It uses dependency parsing to obtain the semantic relationships within each short sentence, and the part-of-speech dictionary to obtain the part of speech of each word. Combining these two results yields the tree template.
- A tree template invocation (structuring) stage, which uses the tree template to structure the report text.
Owner:DONGHUA UNIV +1

Information extraction method based on deep semantic comprehension

The invention provides an information extraction method based on deep semantic comprehension, which comprises the following steps:
- Construct the domain ontology and its basic relationships, and manually annotate part of the corpus.
- Process the manually annotated corpus, identify the entity types corresponding to a specific relationship, and mine new words and synonyms in the domain at the same time.
- Merge the synonyms recognized in the sentences, abstract the original sentences, and perform syntactic analysis.
- Cluster the abstracted sentences into sentence templates, and perform template learning.
- Evaluate the sentence templates.
- Use the sentence templates to extract new relationships from the unannotated corpus, then evaluate and filter the new relationships.

The method makes better use of the syntactic analysis results, so the automatically mined templates have a higher level of abstraction and better generalization.
Owner:鼎复数据科技(北京)有限公司 (Dingfu Data Technology (Beijing) Co., Ltd.)
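As a rough illustration of the abstraction and clustering steps, the sketch below swaps in the simplest possible choices:
- Synonyms are merged by string replacement.
- Entity mentions are replaced by their entity type.
- Sentences whose abstracted forms are identical are grouped into one template.

The entity and synonym tables are hypothetical. The syntactic analysis, template learning and evaluation of the original method are not modeled.

```python
from collections import defaultdict


def abstract_sentence(sentence, entity_types, synonyms):
    """Normalize synonyms, then replace known entity mentions with <TYPE> slots."""
    for variant, canonical in synonyms.items():
        sentence = sentence.replace(variant, canonical)
    # Replace longer mentions first so a short name never splits a longer one.
    for mention in sorted(entity_types, key=len, reverse=True):
        sentence = sentence.replace(mention, f"<{entity_types[mention]}>")
    return sentence


def cluster_into_templates(sentences, entity_types, synonyms):
    """Group sentences whose abstracted form is identical; each key is a template."""
    templates = defaultdict(list)
    for s in sentences:
        templates[abstract_sentence(s, entity_types, synonyms)].append(s)
    return dict(templates)


entity_types = {"Acme Corp": "COMPANY", "Beta Ltd": "COMPANY", "Widget Inc": "COMPANY"}
synonyms = {"bought": "acquired", "purchased": "acquired"}
sentences = [
    "Acme Corp acquired Beta Ltd",
    "Widget Inc bought Acme Corp",
    "Beta Ltd purchased Widget Inc",
    "Acme Corp sued Beta Ltd",
]
for template, members in cluster_into_templates(sentences, entity_types, synonyms).items():
    print(template, len(members))
# <COMPANY> acquired <COMPANY> 3
# <COMPANY> sued <COMPANY> 1
```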

Theme information-based text segmentation method

The invention discloses a theme information-based text segmentation method, which comprises the following operations:
- Preprocess an input text and a training set so that each sentence is represented as a sequence of words.
- Extract features to obtain a feature vector for each sentence.
- Cluster the sentences of the input text according to their semantic information to obtain a series of sentence clusters. Assign a numeric label to each cluster in sequence, so that each sentence carries a numeric label.
- Assign the existing theme tags of the training set to the sentences, so that every sentence in the text receives a theme tag.

The numeric labeling result and the theme labeling result are combined for correction, yielding text fragments with theme tags. Because each segment of the text carries a theme tag, the theme described by each sentence is clearly visible. The passages describing a given theme are also easy to locate, which makes retrieval more convenient.
Owner:XI AN JIAOTONG UNIV
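One way to turn per-sentence cluster labels into themed segments is sketched below. Runs of consecutive sentences with the same numeric label become one fragment, which is then tagged with a theme. The label-to-theme mapping is a hypothetical stand-in for the training-set theme tags, and the correction step is not modeled.

```python
from itertools import groupby


def segment_by_labels(sentences, labels, theme_of_label):
    """Merge consecutive sentences sharing a cluster label into themed segments."""
    segments = []
    for label, run in groupby(zip(sentences, labels), key=lambda pair: pair[1]):
        block = [sentence for sentence, _ in run]
        segments.append((theme_of_label.get(label, "unknown"), block))
    return segments


sentences = ["s1", "s2", "s3", "s4", "s5"]
labels = [0, 0, 1, 1, 0]  # numeric cluster labels, one per sentence
themes = {0: "economy", 1: "sports"}  # hypothetical label-to-theme mapping
print(segment_by_labels(sentences, labels, themes))
# [('economy', ['s1', 's2']), ('sports', ['s3', 's4']), ('economy', ['s5'])]
```

`itertools.groupby` only merges adjacent items. A theme that recurs later in the text therefore produces a separate segment, which is what segmentation needs.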

Sentence cluster extraction method and device based on object knowledge points

The invention relates to a sentence cluster extraction method and device based on object knowledge points. The method comprises the following steps:
- Obtain the accuracy attributes of the knowledge points.
- Use the accuracy attributes to extract the attributes of the knowledge points from the digital resources to be processed.
- Use the accuracy attributes and fuzzy attributes to attach sentence clusters to the knowledge points in those digital resources.
- Obtain the knowledge point sentence clusters.

Adding the accuracy and fuzzy attributes of the knowledge points improves the accuracy of knowledge point sentence cluster extraction.
Owner:NEW FOUNDER HLDG DEV LLC +2

Method and device for clustering sentences

The embodiment of the invention discloses a method and device for clustering sentences. One specific embodiment of the method comprises the steps of determining a set composed of semantic vectors corresponding to all sentences in a to-be-clustered sentence set as a semantic vector set; for each semantic vector in the semantic vector set, executing the following density calculation operation; for each semantic vector in the semantic vector set, executing the following clustering division operation; for each established cluster, determining the semantic vector with the maximum density in the semantic vectors divided into the cluster as the clustering center semantic vector of the cluster; and determining to-be-clustered sentences corresponding to the determined clustering center semantic vectors as a clustering center sentence set. According to the embodiment, the sentence clustering accuracy is improved.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD
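The abstract elides the exact density calculation and division operations. The steps it does name closely match the well-known density-peaks clustering algorithm of Rodriguez and Laio:
- compute a density for each semantic vector,
- divide the vectors into clusters,
- take the densest vector of each cluster as its center.

The sketch below therefore uses density peaks as an assumed stand-in, not as the patented method. The cutoff distance and cluster count are hypothetical parameters, and the toy 2-D points stand in for sentence semantic vectors.

```python
import math


def density_peak_clusters(vectors, cutoff, n_clusters):
    """Density-peaks clustering; returns (labels, center indices)."""
    n = len(vectors)
    dist = [[math.dist(a, b) for b in vectors] for a in vectors]
    # Local density: number of other vectors closer than the cutoff distance.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < cutoff) for i in range(n)]
    order = sorted(range(n), key=lambda i: (-rho[i], i))  # densest first
    delta = [0.0] * n
    nearest_denser = [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # the densest point gets the largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i], nearest_denser[i] = dist[i][j], j
    # Centers combine high density with a large distance to any denser point.
    rest = sorted(order[1:], key=lambda i: rho[i] * delta[i], reverse=True)
    centers = [order[0]] + rest[: n_clusters - 1]
    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Each remaining vector joins the cluster of its nearest denser neighbour.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels, centers


vectors = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (5, 5), (5.1, 5), (5, 5.1)]
labels, centers = density_peak_clusters(vectors, cutoff=0.5, n_clusters=2)
print(labels, centers)  # [0, 0, 0, 0, 1, 1, 1] [0, 4]
```

Each center is the densest member of its cluster, matching the abstract's choice of the maximum-density vector as the clustering center. The sentences at the returned center indices would form the clustering center sentence set.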

Method for automatically analyzing user comments in application store and recommending comments to developers

The invention relates to a method for automatically analyzing user comments in an application store and recommending them to developers. The method comprises the following steps:
- Collect and preprocess user comment data.
- Classify the user comments by intention and establish a classification model.
- Classify the comments under each intention category by topic.
- Perform sentence clustering on the comments under each topic category and calculate the cluster center positions.
- Establish a mechanism for evaluating comment priority, calculate a composite score for each comment, and recommend the comments to software developers accordingly.

The method performs intention classification, topic classification and sentence clustering on the comment information, and processes the comments together with time-series and sentiment analysis. The system returns the top-k hotspot comments, giving developers comment content of reference value for developing and maintaining applications. This effectively reduces the redundant information developers must take in and improves user experience. The method is characterized by accurate and reliable content analysis and ease of use.
Owner:TIANJIN UNIV

Context mining method and device based on clustering algorithm and electronic equipment

The invention provides a context mining method and device based on a clustering algorithm, and an electronic device. The method comprises the following steps:
- In response to a user's mining request, screen a pre-prepared call text for the keyword specified in the request to obtain a plurality of key sentences containing the keyword.
- Extract from the call text the associated sentences directly adjacent to the key sentences.
- Perform unsupervised clustering on the key sentences to obtain a plurality of sentence clusters.
- For each sentence cluster, construct the context from the keyword and the associated sentences.

The scheme constructs context for the given keywords automatically on an electronic device. From the constructed context, a user can analyze the important topics, verbal skills and the like in massive call texts without reading each text one by one. This improves the efficiency of call text analysis.
Owner:BEIJING SINOVOICE TECH CO LTD
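The screening and extraction steps can be sketched as follows. The sketch assumes the call text is already split into sentences, and that "directly adjacent" means the immediately preceding and following sentences (a hypothetical `window` of 1). The key sentences collected this way would then be passed to an unsupervised clustering step, which this sketch leaves out.

```python
def mine_keyword_contexts(sentences, keyword, window=1):
    """Find key sentences containing `keyword` plus up to `window` neighbours on each side."""
    contexts = []
    for i, sentence in enumerate(sentences):
        if keyword in sentence:
            contexts.append({
                "key": sentence,
                "before": sentences[max(0, i - window):i],
                "after": sentences[i + 1:i + 1 + window],
            })
    return contexts


call = ["hello", "I want a refund", "sure, let me check", "refund approved", "thanks"]
for ctx in mine_keyword_contexts(call, "refund"):
    print(ctx["before"], "|", ctx["key"], "|", ctx["after"])
# ['hello'] | I want a refund | ['sure, let me check']
# ['sure, let me check'] | refund approved | ['thanks']
```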

A Text Automatic Summarization Method Based on Fusion Semantic Clustering

The invention discloses an automatic text summarization method based on fusion semantic clustering. The method comprises the following steps:
- Text preprocessing: the original documents are preprocessed, and the word frequencies of the keywords in the text are counted.
- Weight calculation: local weights, global weights and introduced relevance weights are combined to determine each keyword's contribution to the sentences.
- Semantic analysis: singular value decomposition is applied to the text matrix to obtain a semantic analysis model, which is used to calculate a semantic vector for each sentence.
- Clustering: a clustering algorithm obtains K sentence clusters in the semantic space, based on the calculated sentence semantic vectors.
- Sentence selection: the sentence weights within each sentence cluster are calculated, the top n sentences by rank are selected to compose the abstract, and redundancy is removed.

The method is simple and practical. It provides a feature representation of the text, integrates the semantic connections of the context, and more fully captures the co-occurrence relationships between sentences and words. As a result, the generated abstract better matches the theme of the text.
Owner:SOUTH CHINA UNIV OF TECH
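The sentence selection step could look roughly like this. The sketch assumes the clusters and per-sentence weights have already been computed, and does not reproduce the SVD-based semantic analysis. It takes the highest-weighted sentence of each cluster as a candidate. "Removing redundancy" is interpreted as skipping any candidate whose bag-of-words cosine similarity to an already selected sentence reaches a hypothetical `max_sim` threshold.

```python
import math
from collections import Counter


def cosine_bow(a, b):
    """Cosine similarity of two sentences as bag-of-words count vectors."""
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0


def select_summary(clusters, weight, n, max_sim=0.6):
    """Take each cluster's top sentence, rank them, and skip near-duplicates."""
    candidates = [max(c, key=lambda s: weight[s]) for c in clusters if c]
    candidates.sort(key=lambda s: weight[s], reverse=True)
    summary = []
    for s in candidates:
        if len(summary) == n:
            break
        if all(cosine_bow(s, t) < max_sim for t in summary):  # redundancy check
            summary.append(s)
    return summary


clusters = [
    ["solar power is growing fast", "solar panels are cheap"],
    ["wind power is growing fast"],
    ["the weather was mild"],
]
weights = {
    "solar power is growing fast": 0.9,
    "solar panels are cheap": 0.5,
    "wind power is growing fast": 0.8,
    "the weather was mild": 0.3,
}
print(select_summary(clusters, weights, n=2))
# ['solar power is growing fast', 'the weather was mild']
```

The wind-power sentence has the second-highest weight but is skipped. Its word overlap with the already selected solar sentence gives a cosine similarity of 0.8.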

Method for segmenting communication transcripts using unsupervised and semi-supervised techniques

A method is provided for forming discrete segment clusters of one or more sequential sentences from a corpus of communication transcripts of transactional communications that comprises dividing the communication transcripts of the corpus into a first set of sentences spoken by a caller and a second set of sentences spoken by a responder; generating a set of sentence clusters by grouping the first and second sets of sentences according to a measure of lexical similarity using an unsupervised partitional clustering method; generating a collection of sequences of sentence types by assigning a distinct sentence type to each sentence cluster and representing each sentence of each communication transcript of the corpus with the sentence type assigned to the sentence cluster into which the sentence is grouped; and generating a specified number of discrete segment clusters by successively merging sentence clusters according to a proximity-based measure between the sentence types assigned to the sentence clusters within sequences of the collection.
Owner:NUANCE COMM INC
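The final merging step can be sketched with one possible proximity measure: the number of times two sentence types appear next to each other in the transcript sequences. That measure, the greedy merge order, and the tie-breaking rule are all assumptions made for illustration; the abstract does not specify them. Sentence types are shown as readable strings rather than cluster ids.

```python
from collections import Counter


def merge_sentence_types(sequences, k):
    """Greedily merge sentence types that are most often adjacent until k groups remain."""
    group = {t: t for seq in sequences for t in seq}  # sentence type -> group id
    while len(set(group.values())) > k:
        adjacency = Counter()
        for seq in sequences:
            for a, b in zip(seq, seq[1:]):
                ga, gb = group[a], group[b]
                if ga != gb:
                    adjacency[tuple(sorted((ga, gb)))] += 1
        if not adjacency:
            break  # nothing left to merge
        # Most frequent adjacent pair; ties broken deterministically by the pair itself.
        (keep, absorb), _ = max(adjacency.items(), key=lambda item: (item[1], item[0]))
        for t in list(group):
            if group[t] == absorb:
                group[t] = keep
    members = {}
    for t, g in group.items():
        members.setdefault(g, []).append(t)
    return sorted(sorted(m) for m in members.values())


transcripts = [
    ["greet", "verify", "greet", "verify", "issue", "fix", "issue", "fix", "bye"],
    ["greet", "verify", "issue", "fix", "bye"],
]
print(merge_sentence_types(transcripts, k=3))
# [['bye'], ['fix', 'issue'], ['greet', 'verify']]
```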

Intelligent input method and device for electronic medical records

Publication: CN112883712A (Active)
The invention provides an intelligent input method and device for electronic medical records. The intelligent input method comprises the following steps:
- Classify massive electronic medical record text data by disease category.
- Calculate the similarity between the sentences within each disease category, and cluster the sentences by similarity.
- Build three libraries from each group of similar sentences. Keywords or words among the first few words become sentence heads in a sentence head library. The remaining parts become sentence tails in a sentence tail library. The occurrence counts of the tails form a sentence frequency library.
- After the user enters a specific keyword or word and it is matched in the sentence head library, determine the sentence cluster category and call the sentence tail and sentence frequency libraries to complete the sentence tail. Several candidate tails, ranked by frequency from high to low, are shown on a display device for the user to choose from.
- Record the sentence tail the user selects each time, and update the sentence frequency library in real time.

The invention effectively improves the medical record input efficiency of medical staff, leaving more time for treating patients and improving the quality of care.
Owner:GENERAL HOSPITAL OF SOUTHERN THEATRE COMMAND OF PLA
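The head/tail libraries and frequency-ranked completion could be organized roughly as follows. The sketch assumes whitespace tokenization with the first two words as the sentence head; Chinese records would need word segmentation instead. Unlike the method described, it keeps a single library rather than one per disease category and sentence cluster. The class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class SentenceCompleter:
    """Sentence head library -> tail frequency library, with real-time updates."""

    def __init__(self, head_len=2):
        self.head_len = head_len  # number of leading words used as the sentence head
        self.tails = defaultdict(Counter)  # head -> Counter of sentence tails

    def add(self, sentence):
        words = sentence.split()
        head = " ".join(words[: self.head_len])
        tail = " ".join(words[self.head_len:])
        self.tails[head][tail] += 1

    def suggest(self, head, k=3):
        """Return up to k tails for this head, most frequent first."""
        counter = self.tails.get(head)
        return [tail for tail, _ in counter.most_common(k)] if counter else []

    def accept(self, head, tail):
        """Record the user's choice so future rankings reflect it."""
        self.tails[head][tail] += 1


completer = SentenceCompleter()
for s in ["no fever reported today", "no fever or chills", "no fever reported today"]:
    completer.add(s)
print(completer.suggest("no fever"))  # ['reported today', 'or chills']
completer.accept("no fever", "or chills")
completer.accept("no fever", "or chills")
print(completer.suggest("no fever"))  # ['or chills', 'reported today']
```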

Method for constructing and processing human behavior text data set based on crowdsourcing

The invention discloses a method for constructing and processing a human behavior text data set based on crowdsourcing, which comprises the following steps:
- Determine the subject to be collected, generate a task according to specific requirements, and publish it to a crowdsourcing platform. This yields a text data set of all possible human behavior examples under the given subject.
- Cluster the collected data set so that different textual expressions of the same behavior fall into one class. Different people describe the same behavior or event in different sentences, so these sentences must be grouped together.
- Mine the precedence relations between behaviors with a correlation analysis technique.
- Learn the mutual exclusion relations between behaviors with mutual information, and organize the relations among human behaviors into a plot.

The plot indicates which events can occur under a given condition and constrains how those events occur, which improves the accuracy of human behavior analysis.
Owner:GUILIN UNIV OF ELECTRONIC TECH +1
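The mutual-information step can be illustrated on the presence of behaviors across scripts. The sketch assumes the clustering step has already reduced each crowdsourced script to a set of behavior classes. A pair is flagged as mutually exclusive when the two behaviors never co-occur and their mutual information reaches a hypothetical `min_mi` threshold. That decision rule is an assumption, not the patented one.

```python
import math


def mutual_information(scripts, a, b):
    """Mutual information (bits) between the presence of behaviors a and b across scripts."""
    n = len(scripts)
    joint = {(x, y): 0 for x in (0, 1) for y in (0, 1)}
    for s in scripts:
        joint[(int(a in s), int(b in s))] += 1
    pa = {x: (joint[(x, 0)] + joint[(x, 1)]) / n for x in (0, 1)}
    pb = {y: (joint[(0, y)] + joint[(1, y)]) / n for y in (0, 1)}
    mi = 0.0
    for (x, y), count in joint.items():
        if count:
            pxy = count / n
            mi += pxy * math.log2(pxy / (pa[x] * pb[y]))
    return mi, joint[(1, 1)]


def mutually_exclusive(scripts, a, b, min_mi=0.5):
    """Flag a pair that never co-occurs yet is strongly dependent."""
    mi, together = mutual_information(scripts, a, b)
    return together == 0 and mi >= min_mi


scripts = [{"order", "pay cash"}, {"order", "pay card"}, {"order", "pay cash"}, {"order", "pay card"}]
print(mutual_information(scripts, "pay cash", "pay card"))  # (1.0, 0)
print(mutually_exclusive(scripts, "pay cash", "pay card"))  # True
print(mutually_exclusive(scripts, "order", "pay cash"))  # False
```

"order" appears in every script, so it carries no information about payment and its mutual information with either payment behavior is 0. "pay cash" and "pay card" are perfectly anti-correlated, which gives 1 bit.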

Natural language-based airworthiness instruction problem feature extraction

The invention relates to the technical field of airworthiness certification, and in particular to natural language-based extraction of problem features from airworthiness instructions. It comprises the following steps:
- Extract the problem description sections of an airworthiness instruction and preprocess the text data.
- Detect overlapping sentence clusters.
- Select a given number of sentence clusters.
- Extract feature descriptors.

Extracting features by detecting overlapping sentence clusters and selecting phrases directly from the text description achieves higher accuracy. The method also takes less time than the prior-art comparison method. In practice, the feature extraction can also reveal the key design features of aircraft products expressed in the instruction text.
Owner:中国民用航空上海航空器适航审定中心 (Civil Aviation Administration of China Shanghai Aircraft Airworthiness Certification Center)

Generating and using a sentence model for answer generation

In an approach to generating and using a sentence model for answer generation, one or more computer processors ingest a first corpus of a plurality of text sentences. One or more computer processors convert the plurality of text sentences into a plurality of sentence vectors. One or more computer processors group the plurality of sentence vectors into a plurality of sentence clusters, wherein a sentence cluster is composed of sentences that are semantically similar. One or more computer processors receive a second corpus. One or more computer processors determine, for each sentence cluster of the plurality of sentence clusters, a frequency each sentence cluster appears in the second corpus. Based on the determined frequency, one or more computer processors calculate a probability of each sentence cluster of the plurality of sentence clusters. Based on the calculated probabilities, one or more computer processors generate a first sentence model.
Owner:IBM CORP
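The frequency-to-probability step can be sketched as below. It assumes each second-corpus sentence has already been embedded and is counted toward its nearest cluster centroid. It also applies additive (Laplace) smoothing through a hypothetical `alpha` parameter; the abstract does not say whether or how the estimates are smoothed.

```python
import math
from collections import Counter


def nearest_cluster(vec, centroids):
    """Index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(vec, centroids[c]))


def cluster_probabilities(corpus_vectors, centroids, alpha=1.0):
    """Estimate P(cluster) from how often the second corpus falls into each cluster."""
    counts = Counter(nearest_cluster(v, centroids) for v in corpus_vectors)
    total = len(corpus_vectors) + alpha * len(centroids)
    # Additive smoothing keeps clusters unseen in the second corpus above zero.
    return [(counts[c] + alpha) / total for c in range(len(centroids))]


centroids = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical sentence-cluster centroids
second_corpus = [(1.0, 1.0), (0.0, 2.0), (9.0, 9.0)]  # hypothetical sentence vectors
print(cluster_probabilities(second_corpus, centroids))  # [0.6, 0.4]
```

With `alpha=0.0` the estimate reduces to plain relative frequencies: [2/3, 1/3] for this toy corpus.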

Key sentence extraction method, system, and computer-readable storage medium

The invention discloses a key sentence extraction method, system, and computer-readable storage medium. The key sentence extraction method includes the following steps:
- Obtain a target question and a target answer.
- Split the target answer into several answer sentences.
- Calculate the correlation between each answer sentence and the target question to obtain a correlation score.
- Combine the answer sentences in pairs to obtain a number of answer pairs. Calculate the coherence between the two answer sentences in each pair to obtain a coherence score.
- Cluster the answer sentences based on the coherence scores to obtain several sentence clusters.
- From the correlation scores of the answer sentences in each sentence cluster, calculate the correlation degree between that cluster and the target question.
- Extract the answer sentences in the sentence cluster with the greatest correlation degree as the key sentences.

Key sentences extracted in this way take both coherence and relevance into account, and accurately express the central content of the target answer.
Owner:无码科技(杭州)有限公司 (Wuma Technology (Hangzhou) Co., Ltd.)
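The clustering and selection steps can be sketched under the following assumptions:
- The relevance and coherence scores have already been computed; the values below are hypothetical.
- Answer sentences are clustered by linking any pair whose coherence reaches a threshold.
- A cluster's correlation degree with the question is the mean relevance of its sentences.

The linking rule and the use of the mean are illustrative choices; the patent abstract does not fix them.

```python
def key_sentences(sentences, relevance, coherence, threshold=0.5):
    """Cluster answer sentences by coherence and return the most relevant cluster."""
    parent = list(range(len(sentences)))  # union-find over sentence indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairs that are coherent enough end up in the same cluster.
    for (i, j), score in coherence.items():
        if score >= threshold:
            parent[find(i)] = find(j)
    clusters = {}
    for i in range(len(sentences)):
        clusters.setdefault(find(i), []).append(i)
    # Cluster-question correlation degree is taken as the mean relevance of its members.
    best = max(clusters.values(), key=lambda idx: sum(relevance[i] for i in idx) / len(idx))
    return [sentences[i] for i in sorted(best)]


answer = [
    "Open the settings page.",
    "Click 'Reset password' and follow the emailed link.",
    "Our office is closed on Sundays.",
    "Holiday hours are posted online.",
]
relevance = [0.8, 0.9, 0.1, 0.2]  # hypothetical answer-sentence/question scores
coherence = {(0, 1): 0.8, (0, 2): 0.0, (0, 3): 0.1, (1, 2): 0.1, (1, 3): 0.0, (2, 3): 0.7}
print(key_sentences(answer, relevance, coherence))
# ['Open the settings page.', "Click 'Reset password' and follow the emailed link."]
```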
