543 results about "IntraText" patented technology

IntraText is a digital library that offers a reading interface while meeting formal requirements. Texts are displayed as hypertext, using a Tablet PC interface. Because the words in each text are linked, it provides concordances, word lists, statistics and links to cited works. Most content is available under a Creative Commons license. IntraText also offers publishing services that provide similar advantages.

Semantic understanding based emoji input method and device

The present disclosure provides a semantic-understanding-based emoji input method and device, relating to the field of input method technology. The method includes: obtaining text content according to an input sequence; performing word segmentation on the text content and extracting text features from the segmentation result; constructing an input vector from the text features and classifying it with an emotion classification model to determine an emotion label for the text content; based on a correspondence between emotion labels and the emojis of various themes, obtaining from each theme an emoji corresponding to the emotion label; and sorting the emojis obtained from the various themes and displaying them as candidate options in a client. The disclosed invention makes it easier for users to input emojis, improves emoji input efficiency, and gives users a rich and wide range of emoji resources.
Owner:BEIJING SOGOU TECHNOLOGY DEVELOPMENT CO LTD
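
The last two steps (looking up one emoji per theme for an emotion label, then sorting the candidates) can be sketched as below. This is a minimal illustration, not the patented implementation: the emotion label is assumed to come from an already-trained classifier, and the correspondence table EMOJI_TABLE, the per-theme usage counts THEME_USAGE used as the sort key, and the emoji shortcodes are all hypothetical.

```python
from typing import Dict, List

# Hypothetical correspondence: emotion label -> {theme name: emoji shortcode}
EMOJI_TABLE: Dict[str, Dict[str, str]] = {
    "happy": {"classic": ":grinning:", "cats": ":smile_cat:", "hands": ":thumbsup:"},
    "sad": {"classic": ":cry:", "cats": ":crying_cat_face:", "hands": ":thumbsdown:"},
}

# Hypothetical sort key: how often the user picks emojis from each theme
THEME_USAGE: Dict[str, int] = {"classic": 120, "cats": 45, "hands": 80}


def candidate_emojis(label: str) -> List[str]:
    """Take one emoji per theme for the label, most-used theme first."""
    per_theme = EMOJI_TABLE.get(label, {})
    ranked = sorted(
        per_theme.items(),
        key=lambda item: THEME_USAGE.get(item[0], 0),
        reverse=True,
    )
    return [emoji for _, emoji in ranked]
```

For example, candidate_emojis("happy") yields the classic, hands and cats emojis in that order; an unknown label yields an empty candidate list.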

System and method for near and exact de-duplication of documents

A system, method and computer program product for identifying near-duplicate and exact-duplicate documents in a document collection. For each document in the collection, the method includes: reading textual content from the document; filtering the textual content based on user settings; determining the N most frequent words in the filtered textual content; performing a quorum search for those N words with a threshold M; and sorting the results of the quorum search by relevancy. Depending on the values of N and M, near-duplicate and exact-duplicate documents in the collection are identified.
Owner:MSC INTPROP
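
A quorum search with threshold M matches any document containing at least M of the N query words, so the procedure above can be sketched as follows. This is an assumption-laden illustration: the fixed STOP_WORDS set stands in for the "user settings" filter, relevancy is taken to be simply the number of matched words, and the function names are hypothetical.

```python
import re
from collections import Counter
from typing import Dict, List, Set, Tuple

# Stand-in for the user-configured filter
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def top_words(text: str, n: int) -> List[str]:
    """The N most frequent words after filtering."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]
    return [w for w, _ in Counter(words).most_common(n)]


def quorum_search(query: List[str], vocabs: Dict[str, Set[str]], m: int) -> List[Tuple[str, int]]:
    """Documents containing at least M query words, sorted by match count."""
    hits = []
    for doc_id, vocab in vocabs.items():
        matched = sum(1 for w in query if w in vocab)
        if matched >= m:
            hits.append((doc_id, matched))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits


def find_duplicates(corpus: Dict[str, str], n: int = 5, m: int = 4) -> Dict[str, List[str]]:
    """For each document, list the other documents that satisfy the quorum."""
    vocabs = {doc_id: set(re.findall(r"[a-z]+", text.lower())) for doc_id, text in corpus.items()}
    result = {}
    for doc_id, text in corpus.items():
        hits = quorum_search(top_words(text, n), vocabs, m)
        result[doc_id] = [other for other, _ in hits if other != doc_id]
    return result
```

Setting M equal to N demands that every frequent word be present (closer to exact duplicates); lowering M relative to N loosens the match toward near-duplicates.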

Systems and methods for sentence based interactive topic-based text summarization

Techniques for sentence-based, interactive, topic-based text summarization are provided. A text to be summarized is segmented. Discrete summaries based on keywords, key phrases, n-grams, sentences and other sentence constituents are generated for each text segment using statistical measures. Interactive topic-based summaries are displayed with human-perceptible indicators of omitted text, such as alternate colors, fonts, sounds, tactile elements or other perceptible display characteristics. Individual summaries and/or combinations of discrete keyword, key-phrase, n-gram, sentence, noun-phrase and sentence-constituent summaries are displayed dynamically to give an overview of how topics and subtopics develop within a text. A hierarchical, interactive display that associates expandable and collapsible text with these discrete sentence-constituent summaries provides contextualized access both to the interactive topic-based summary and to the original text.
Owner:XEROX CORP
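
The first two steps (segmenting the text, then producing a discrete keyword summary per segment from a statistical measure) might look like the sketch below. It assumes plain term frequency as the statistical measure and fixed-size sentence groups as segments; the interactive, hierarchical display is not modeled, and all names are hypothetical.

```python
import re
from collections import Counter
from typing import List

STOP_WORDS = {"a", "an", "the", "is", "in", "of", "and", "to"}


def segment(text: str, sentences_per_segment: int = 3) -> List[str]:
    """Split the text into segments of a fixed number of sentences."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [
        " ".join(sentences[i:i + sentences_per_segment])
        for i in range(0, len(sentences), sentences_per_segment)
    ]


def keyword_summary(segment_text: str, k: int = 3) -> List[str]:
    """Top-k non-stop words of one segment by term frequency."""
    words = [w for w in re.findall(r"[a-z]+", segment_text.lower()) if w not in STOP_WORDS]
    return [w for w, _ in Counter(words).most_common(k)]


def topic_overview(text: str, sentences_per_segment: int = 3, k: int = 3) -> List[List[str]]:
    """One keyword summary per segment, in document order."""
    return [keyword_summary(seg, k) for seg in segment(text, sentences_per_segment)]
```

Reading the per-segment keyword lists in order gives a rough overview of how the topic shifts through the text.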

Identification of text

In the preferred embodiment, a method of generating a code representative of a passage of text uses the character spacing between successive occurrences of a selected key symbol string within the text. The string may be fixed or may take a variety of different forms. By comparing the known code of a target text passage with the code generated from a sample text passage, it is easy to determine whether the target text has been used within the sample. The method may be integrated into a copying device such as a photocopier, allowing the device to report automatically whenever a user attempts to copy a document bearing one of a predefined list of sensitive or controlled text passages.
Owner:MINERAL LASSEN
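
Because the code consists of gaps between occurrences of the key string, it does not depend on where the passage starts. A target passage used inside a larger sample therefore shows up as a contiguous run of the same gaps. A minimal sketch, assuming a fixed, case-insensitive key string ("the" here is an arbitrary choice) and hypothetical function names:

```python
from typing import List


def spacing_code(text: str, key: str = "the") -> List[int]:
    """Character gaps between successive occurrences of the key string."""
    lowered = text.lower()
    positions = []
    start = lowered.find(key)
    while start != -1:
        positions.append(start)
        start = lowered.find(key, start + 1)
    return [b - a for a, b in zip(positions, positions[1:])]


def contains_passage(sample: str, target_code: List[int], key: str = "the") -> bool:
    """True if the target's gap sequence appears contiguously in the sample's code."""
    code = spacing_code(sample, key)
    k = len(target_code)
    if k == 0:
        return False
    return any(code[i:i + k] == target_code for i in range(len(code) - k + 1))
```

A copier would hold the codes of its controlled passages and run contains_passage against the code of each scanned page.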

Information returning method and device

The present invention relates to an information reply method and apparatus. The method comprises: receiving a message to be replied to, which contains text content and contact information; querying a database for the corresponding dialog style information based on the text content and the contact information; preprocessing the text content, where the preprocessing comprises word segmentation and stop-word removal; and, based on the preprocessed data, querying a database corresponding to the dialog style information to determine reply information. Because the dialog style is determined from the incoming message and suitable replies are retrieved from the style-specific database for the user to choose from, the method shortens the time users need to reply and improves the user experience.
Owner:HUAWEI TECH CO LTD
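
A toy version of this flow, assuming (unlike the abstract) that the dialog style depends only on the contact, and using small in-memory dictionaries in place of the databases. CONTACT_STYLE, REPLY_DB and the stop-word list are hypothetical.

```python
import re
from typing import Dict, List

STOP_WORDS = {"are", "you", "we", "the", "to", "a", "still", "on"}

# Hypothetical: dialog style chosen from the contact information
CONTACT_STYLE: Dict[str, str] = {"boss@example.com": "formal", "alice": "casual"}

# Hypothetical per-style reply databases: keyword -> candidate replies
REPLY_DB: Dict[str, Dict[str, List[str]]] = {
    "formal": {"meeting": ["Noted, I will attend the meeting."]},
    "casual": {"meeting": ["Sure, see you there!"], "lunch": ["Lunch sounds great!"]},
}


def preprocess(text: str) -> List[str]:
    """Word segmentation followed by stop-word removal."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def suggest_replies(text: str, contact: str) -> List[str]:
    """Look up candidate replies in the database for the contact's dialog style."""
    style = CONTACT_STYLE.get(contact, "formal")
    db = REPLY_DB[style]
    suggestions: List[str] = []
    for token in preprocess(text):
        for reply in db.get(token, []):
            if reply not in suggestions:
                suggestions.append(reply)
    return suggestions
```

The same message about a meeting yields a casual suggestion for one contact and a formal suggestion for another. The user then picks one.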

Text classification model training method, device and equipment and storage medium

The invention discloses a text classification model training method, device, equipment and storage medium in the field of artificial intelligence. On the one hand, the method introduces adversarial samples and trains the text classification model on both the text samples and the adversarial samples, so that the model learns to classify text to which perturbations have been added; this improves the robustness of the model and the accuracy of text classification. On the other hand, the model can reconstruct the text features it extracts from the adversarial samples during classification back into text content, which improves the interpretability of the adversarial training method. The model parameters are trained using the error between the reconstructed text content and the text content of the text sample, so the model learns to extract more accurate text features (that is, a more accurate feature representation of the text content), improving the robustness and accuracy of its feature extraction.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Question and answer processing method and device, language model training method and device, equipment and storage medium

The invention discloses a question-answering processing method and device, a language model training method and device, equipment and a storage medium, relating to the field of natural language processing. The specific implementation is as follows: obtaining at least one candidate table matching a question to be queried, where each candidate table contains a candidate answer to the question; processing the at least one candidate table to obtain at least one table text, which comprises the text content of each field in the candidate table, the fields including titles, headers and cells; inputting the question and each table text into a preset language model to obtain a matching degree between the question and each candidate table; and, according to the matching degree of each candidate table, outputting a reply table, which is either a candidate table whose matching degree with the question is greater than a preset value or the candidate table with the maximum matching degree among the at least one candidate table. Using the language model to perform semantic matching between questions and table texts improves the matching accuracy and recall of questions against tables.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD
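
The table-to-text step and the reply-table rule (every table above a preset value, otherwise the single best table) can be sketched as follows. The language model is replaced here by a plain Jaccard word-overlap score purely for illustration. The Table type, the threshold of 0.5 and the function names are assumptions.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Table:
    title: str
    headers: List[str]
    rows: List[List[str]]


def table_to_text(table: Table) -> str:
    """Concatenate the text of the title, header and cell fields."""
    parts = [table.title, " ".join(table.headers)] + [" ".join(row) for row in table.rows]
    return " ".join(parts)


def matching_degree(question: str, table_text: str) -> float:
    """Stand-in for the language model: Jaccard overlap of word sets."""
    q = set(re.findall(r"\w+", question.lower()))
    t = set(re.findall(r"\w+", table_text.lower()))
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def reply_tables(question: str, candidates: List[Table], threshold: float = 0.5) -> List[Table]:
    """Tables scoring above the threshold, else the single best-scoring table."""
    scored = [(matching_degree(question, table_to_text(t)), t) for t in candidates]
    above = [t for score, t in scored if score > threshold]
    if above:
        return above
    return [max(scored, key=lambda pair: pair[0])[1]] if scored else []
```

The fallback guarantees at least one reply table whenever there is any candidate, as the abstract's "or the maximum matching degree" branch implies.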

Cross-modal video retrieval method and system based on multi-head self-attention mechanism and storage medium

The invention provides a cross-modal video retrieval method, system and storage medium based on a multi-head self-attention mechanism. The method comprises a video coding step, a text coding step and a joint embedding step. Semantic information in the multi-modal training data is fully exploited during training, and a multi-head self-attention mechanism is introduced to capture fine-grained interactions within videos and texts and to attend selectively to key information in the multi-modal data. This enhances the representation capability of the model and mines data semantics more effectively; the invention is highly practical and easy to adopt. The method also keeps the distances between data items in the original space consistent with those in the shared subspace. Experiments show that the similarity of the data in the original space is effectively preserved and that retrieval accuracy improves.
Owner:HARBIN INST OF TECH SHENZHEN GRADUATE SCHOOL

Fine-grained semantic detection method of harmful text contents in network

Status: Inactive · Publication number: CN102609407A · Tags: Fulfilling semantic recognition requirements; Small uncertainty; Special data processing applications; Data mining; Machine learning
The invention belongs to the technical field of text content filtering and relates in particular to a fine-grained semantic detection method for harmful text content on the network. For a given harmful-information scenario, the method comprises: constructing a training text set in which independent sentences are the basic units, and thereby establishing a mathematical description of the scenario with a probabilistic topic model; extracting the information content from a web page to be examined; identifying the sentences in the extracted text; calculating the conditional probability of each sentence under the established probabilistic topic model; and performing fine-grained semantic detection at a configured content-detection sensitivity. Model construction is hardly affected by the number of topics, and probabilities are computed effectively at both the sentence and word level, so the method suits a wide range of applications that require harmful-text detection. Because it supports fine-grained detection of harmful words and sentences, the method effectively raises the detection rate and lowers the false-alarm rate, improving the practicality of text content filtering.
Owner:FUDAN UNIV
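
Sentence-level scoring against a detection sensitivity can be illustrated with a much simpler model than the patent's probabilistic topic model. The sketch below swaps in a smoothed unigram "harmful" distribution compared against a background distribution, with an average per-word log-likelihood ratio as each sentence's score. Treat it as an analogy for the sentence-probability step, not the Fudan method. The training sentences, sensitivity value and function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Iterable, List


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z]+", text.lower())


def train_unigram(sentences: Iterable[str]) -> Counter:
    """Word counts over training sentences (sentences are the basic units)."""
    return Counter(w for s in sentences for w in tokenize(s))


def sentence_score(sentence: str, harmful: Counter, background: Counter, vocab_size: int) -> float:
    """Average per-word log-likelihood ratio: harmful model vs background model."""
    words = tokenize(sentence)
    if not words:
        return float("-inf")
    h_total, b_total = sum(harmful.values()), sum(background.values())
    llr = 0.0
    for w in words:
        # Add-one smoothing so unseen words still get non-zero probability
        p_h = (harmful[w] + 1) / (h_total + vocab_size)
        p_b = (background[w] + 1) / (b_total + vocab_size)
        llr += math.log(p_h) - math.log(p_b)
    return llr / len(words)


def detect(text: str, harmful: Counter, background: Counter, sensitivity: float = 0.0) -> List[str]:
    """Return the sentences whose score exceeds the detection sensitivity."""
    sentences = [s.strip() for s in re.split(r"[.!?]", text) if s.strip()]
    vocab_size = len(set(harmful) | set(background)) + 1  # +1 reserves mass for unseen words
    return [s for s in sentences if sentence_score(s, harmful, background, vocab_size) > sensitivity]
```

Raising the sensitivity flags fewer sentences, trading detection rate for fewer false alarms.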

Text abstract generation method and device, text abstract training method and device, equipment and medium

The invention discloses a text abstract generation method, device, equipment and medium; the method belongs to the field of computer vision and comprises the following steps: obtaining text content; inputting the text content into a coding layer to obtain a hidden-layer embedding vector of the text content, where the coding layer is obtained by collaboratively training an extractive abstract generation model and a generative abstract generation model; and inputting the hidden-layer embedding vector into either the extractive or the generative model for processing, then outputting the text abstract. The method combines the respective advantages of the extractive and generative approaches, so the final abstract better summarizes the content of the text.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Image element alignment for printed matter and associated methods

A system and method are provided for dynamically and automatically aligning an element within textual matter, where the element's vertical extent differs from that of the surrounding text. The method comprises calculating a vertical offset for placing the element relative to the textual matter, based on the difference between the font size of the text and the vertical extent of the element. The calculated vertical offset is stored so that the presentation of the textual matter, with the element placed in it, can later be adjusted dynamically regardless of the desired output format.
Owner:PSYCHOLOGICAL CORPORATION
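
The abstract does not give the offset formula. As one hypothetical rule derived from the difference between font size and element height, the element can be centered on the text's em box: its bottom edge sits (font_size - element_height) / 2 above the bottom of the em box, and a negative value means it hangs below. The stored-offset table OFFSETS and the function names are also assumptions.

```python
from typing import Dict


def vertical_offset(font_size_pt: float, element_height_pt: float) -> float:
    """Hypothetical centering rule: offset of the element's bottom edge from the
    bottom of the text em box; negative means the element extends below it."""
    return (font_size_pt - element_height_pt) / 2.0


# Offsets are stored once and reused whatever the output format
OFFSETS: Dict[str, float] = {}


def register_element(element_id: str, font_size_pt: float, element_height_pt: float) -> float:
    """Compute and store the offset for one inline element."""
    OFFSETS[element_id] = vertical_offset(font_size_pt, element_height_pt)
    return OFFSETS[element_id]
```

For an 18 pt image in 12 pt text this gives an offset of -3 pt, so the image extends 3 pt below and 3 pt above the text's em box.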

Method, device and equipment for text retrieval and storage medium

The invention provides a method, device, equipment and storage medium for text retrieval, relating to artificial intelligence fields such as big data and natural language processing. In the specific implementation, the method includes: using a full-text search engine to obtain a plurality of candidate texts that satisfy a retrieval formula; calculating multi-dimensional features of each candidate text from the keywords of the retrieval formula and the text content of the candidate; obtaining a correlation score through a text similarity operation based on multi-dimensional feature fusion, the correlation score representing the similarity between a candidate text and the retrieval formula, and sorting the candidate texts by that score; and performing secondary sorting and filtering of the candidate texts according to a preset rule to obtain a target text. This scheme enables semantically accurate querying and sorting of texts, improving the accuracy and efficiency of text retrieval and the user's search experience.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD
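
The retrieve, score, sort, then re-filter pipeline can be sketched with a few hand-picked features. Everything specific here is an assumption: the three features (keyword coverage, keyword term frequency, title hit), the fusion weights, and the secondary rule (drop candidates below a minimum score and collapse duplicate titles). The candidate stage is a simple "any keyword in body" match rather than a real full-text engine.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Doc:
    doc_id: str
    title: str
    body: str


WEIGHTS = (0.5, 0.3, 0.2)  # hypothetical fusion weights for the three features


def features(keywords: List[str], doc: Doc) -> Tuple[float, float, float]:
    words = doc.body.lower().split()
    coverage = sum(1 for k in keywords if k in words) / len(keywords)
    tf = sum(words.count(k) for k in keywords) / max(len(words), 1)
    title_hit = 1.0 if any(k in doc.title.lower().split() for k in keywords) else 0.0
    return coverage, tf, title_hit


def score(keywords: List[str], doc: Doc) -> float:
    """Weighted fusion of the feature vector into one correlation score."""
    return sum(w * f for w, f in zip(WEIGHTS, features(keywords, doc)))


def search(keywords: List[str], corpus: List[Doc], min_score: float = 0.3, top_k: int = 10) -> List[str]:
    kws = [k.lower() for k in keywords]
    # Candidate stage: any document whose body contains a keyword
    candidates = [d for d in corpus if any(k in d.body.lower().split() for k in kws)]
    scores: Dict[str, float] = {d.doc_id: score(kws, d) for d in candidates}
    ranked = sorted(candidates, key=lambda d: scores[d.doc_id], reverse=True)
    # Secondary pass (hypothetical preset rule): score floor and duplicate-title removal
    seen_titles, result = set(), []
    for d in ranked:
        if scores[d.doc_id] < min_score or d.title in seen_titles:
            continue
        seen_titles.add(d.title)
        result.append(d.doc_id)
    return result[:top_k]
```

Keeping the weights in one tuple makes the fusion easy to retune without touching the feature code.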

On-line classroom discussion short text real-time grouping method and system based on text clustering

The invention discloses a real-time grouping method and system, based on text clustering, for short texts from online classroom discussions. The method comprises: performing word segmentation and stop-word removal on the text data; extracting the keywords of all text items, counting them, and storing them in a keyword table keyTable; mining frequent itemsets from the preprocessed text set, filtering out the sub-itemsets of the quasi-frequent itemsets, and performing coarse clustering with a quasi-frequent-itemset similarity rule defined using the keyword table; mapping the point closest to each cluster center back to the text set, calculating the TF-IDF values of the word sets of the texts in each cluster, and iterating the centroids to their optimum according to distance; and pushing the resulting K clusters to the groups in real time. The keyword-table-based similarity rule for quasi-frequent itemsets effectively improves the clustering accuracy for short online-discussion texts, and the quasi-frequent-itemset filtering strategy improves clustering efficiency and speeds up the clustering method. The text discussed in an online classroom is thus automatically classified into multiple themes and grouped by theme.
Owner:SOUTH CHINA UNIV OF TECH
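
The keyword table and the frequent-itemset coarse clustering step can be sketched as below. This covers only the coarse stage, under simplifying assumptions: itemsets are limited to keyword pairs, a text joins the cluster of the first (most supported) frequent pair it contains, and the TF-IDF centroid refinement is omitted. The stop-word list, min_support value and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, List

STOP_WORDS = {"the", "a", "is", "i", "it", "of", "and", "to", "we", "think"}


def preprocess(text: str) -> List[str]:
    """Word splitting plus stop-word removal; unique words, sorted."""
    return sorted({w for w in text.lower().split() if w not in STOP_WORDS})


def coarse_clusters(texts: List[str], min_support: int = 2) -> Dict[str, List[int]]:
    docs = [preprocess(t) for t in texts]
    # keyTable: number of texts in which each keyword occurs
    key_table = Counter(w for d in docs for w in d)
    docs = [[w for w in d if key_table[w] >= min_support] for d in docs]
    # Frequent 2-itemsets, most supported first
    pair_counts = Counter(p for d in docs for p in combinations(d, 2))
    frequent = [set(p) for p, c in pair_counts.most_common() if c >= min_support]
    clusters: Dict[str, List[int]] = {}
    for i, d in enumerate(docs):
        words = set(d)
        label = next((" ".join(sorted(p)) for p in frequent if p <= words), "other")
        clusters.setdefault(label, []).append(i)
    return clusters
```

Texts that share no frequent keyword pair fall into an "other" cluster, which a later refinement pass could reassign.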

Microblog rumor detection method

The invention provides a microblog rumor detection method that incorporates an attention mechanism. The method comprises: collecting microblog events and their corresponding comment data sets as sample data; preprocessing the sample data and extracting the text content of the original microblogs and their comments; processing the text with a pre-trained BERT model to generate a fixed-length sentence vector for each sentence; constructing a dictionary and combining each original microblog with its comments into a microblog event vector matrix; training on the vector matrices with a deep learning method, Text CNN-Attention, to build a multi-level training model; and classifying the vector matrix with the multi-level model to obtain the rumor detection result for the social network data. Compared with traditional rumor detection methods, the method improves accuracy.
Owner:NANJING UNIV OF POSTS & TELECOMM

Text information recommendation method and system

Status: Active · Publication number: CN106202394A · Tags: Resolve semantic ambiguity; Solving the semantic information relevance problem; Semantic analysis; General purpose stored program computer; Ambiguity; Latent Dirichlet allocation
The invention provides a text information recommendation method comprising the following steps: establishing an information recommendation pool; acquiring the text content of an article requiring information recommendation; segmenting that article into multiple words; predicting the multi-dimensional topic distribution of the article from the multi-dimensional topic distributions of its words in an LDA (latent Dirichlet allocation) model base; calculating the information correlation between the article and the articles in the information recommendation pool; sorting the related information in the pool according to the correlation results; and outputting recommended information according to the sorting result. The method resolves the semantic ambiguity of related information and the semantic-relevance problems that arise during information recommendation, takes the popularity and timeliness of information into account, and increases users' click-through rate. The invention further provides a system for implementing the text information recommendation method.
Owner:TENCENT TECH (SHENZHEN) CO LTD
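
The prediction and correlation steps can be sketched as follows. The assumptions are: the article's topic distribution is predicted as the mean of its known words' topic distributions, information correlation is cosine similarity, and the three-topic WORD_TOPICS table and the pool vectors are hypothetical stand-ins for a trained LDA model base. Popularity and timeliness are not modeled.

```python
import math
from typing import Dict, List

# Hypothetical word -> topic distribution (topics: sports, finance, food)
WORD_TOPICS: Dict[str, List[float]] = {
    "goal": [0.9, 0.05, 0.05],
    "match": [0.8, 0.1, 0.1],
    "stock": [0.05, 0.9, 0.05],
    "market": [0.1, 0.8, 0.1],
    "apple": [0.1, 0.45, 0.45],  # ambiguous: company or fruit
}
NUM_TOPICS = 3


def article_topics(words: List[str]) -> List[float]:
    """Predict the article's topic distribution as the mean over its known words."""
    known = [WORD_TOPICS[w] for w in words if w in WORD_TOPICS]
    if not known:
        return [0.0] * NUM_TOPICS
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(words: List[str], pool: Dict[str, List[float]], top_n: int = 2) -> List[str]:
    """Sort pool articles by topic-distribution similarity to the target article."""
    target = article_topics(words)
    ranked = sorted(pool, key=lambda aid: cosine(target, pool[aid]), reverse=True)
    return ranked[:top_n]
```

The ambiguous word "apple" is resolved by its neighbours: alongside "stock" and "market", the averaged distribution leans toward finance, which is the disambiguation effect the abstract describes.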

Short message editing method and terminal

An embodiment of the invention discloses a short message editing method, comprising: obtaining the text content of a short message to be replied to; identifying whether a preset keyword exists in the text content and, if so, querying and displaying a quick-reply list associated with that keyword; and obtaining the quick-reply message selected by the user from the list to generate the short message to be sent. An embodiment of the invention also discloses a terminal. The invention improves the efficiency of replying to short messages.
Owner:廖建强 (Liao Jianqiang)
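
The keyword-to-quick-reply lookup reduces to a small dictionary search. In this sketch the preset keywords and their reply lists (QUICK_REPLIES) are hypothetical, and matching is done on whole lowercase words so that, say, "recall" does not trigger the "call" list.

```python
import re
from typing import Dict, List

# Hypothetical preset keywords and their associated quick-reply lists
QUICK_REPLIES: Dict[str, List[str]] = {
    "meeting": ["I'll be there.", "Can we reschedule?"],
    "call": ["I'll call you back shortly.", "Can't talk now, text me."],
}


def quick_reply_list(text: str) -> List[str]:
    """Return the quick-reply list for the first preset keyword found, if any."""
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in QUICK_REPLIES:
            return QUICK_REPLIES[token]
    return []


def compose_reply(text: str, choice: int) -> str:
    """Build the outgoing message from the user's selection in the list."""
    options = quick_reply_list(text)
    if not 0 <= choice < len(options):
        raise IndexError("no such quick reply")
    return options[choice]
```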

Altering content based on machine-learned topics of interest

A method, system, and computer program product are provided. Content portions of respective items of multiple items of textual content are tagged according to determined topics, keywords and phrases. Data is collected regarding viewed regions of a display screen displaying at least a portion of the respective item. A model is derived for predicting portions of textual content of interest based on the collected data regarding the viewed regions of the display screen and the tagging. A new item of textual content is altered to provide portions of interest based on the model.
Owner:IBM CORP
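
One simple way to derive a model from viewed regions and topic tags, then alter a new item, is sketched below. The assumptions are: a topic's interest is the fraction of tagged portions with that topic that the user actually viewed, and "altering" means dropping portions below a minimum interest and moving the most interesting ones first. The Portion type and function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Portion = Tuple[str, Set[str]]  # (portion text, topic tags)


def fit_interest(viewed: List[Portion], shown: List[Portion]) -> Dict[str, float]:
    """Interest per topic = viewed tagged portions / shown tagged portions."""
    seen = Counter(tag for _, tags in viewed for tag in tags)
    total = Counter(tag for _, tags in shown for tag in tags)
    return {tag: seen[tag] / total[tag] for tag in total}


def alter(item: List[Portion], model: Dict[str, float], min_interest: float = 0.5) -> List[str]:
    """Keep portions of predicted interest, most interesting first."""
    def interest(portion: Portion) -> float:
        return max((model.get(tag, 0.0) for tag in portion[1]), default=0.0)

    kept = [p for p in item if interest(p) >= min_interest]
    kept.sort(key=interest, reverse=True)
    return [text for text, _ in kept]
```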

Textual Content Speed Player

A computer program plays a user's textual content in an animated format to increase reading speed by reconditioning reading behavior. The program: (1) highlights the word being read and positions a copy of it at the center of the page, so the reader need not reposition their eyes yet still satisfies the common scanning strategy's need to project the word onto the center of the retina, while the highlighting moving through the text reconditions the reader toward the faster strategy of keeping the eyes fixed and changing which part of the retina is read; (2) presents a picture representing the word's meaning, reconditioning users toward the faster cognitive strategy of triggering recognition of meaning with a picture; (3) highlights each word for a time calculated from its syllables, producing a presentation in sync with the natural timing of speech; and (4) can play or record a voice speaking the word, creating audible textual presentations that let non-readers enjoy their textual content while learning printed language through context alone.
Owner:NELSON ANDREW THOMAS
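
Step (3), timing each highlight by syllable count, can be sketched with a rough English syllable heuristic: count vowel groups and discount a silent final "e" (but not "-le"). The heuristic, the 200 ms per syllable and the 50 ms pause are illustrative assumptions; the abstract does not specify how syllables are counted or weighted.

```python
import re
from typing import List, Tuple


def count_syllables(word: str) -> int:
    """Rough heuristic: vowel groups, minus a silent trailing 'e'."""
    w = re.sub(r"[^a-z]", "", word.lower())
    if not w:
        return 0
    n = len(re.findall(r"[aeiouy]+", w))
    if w.endswith("e") and not w.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def highlight_schedule(text: str, ms_per_syllable: int = 200, pause_ms: int = 50) -> List[Tuple[str, int]]:
    """(word, highlight duration in ms) for each word, in reading order."""
    schedule = []
    for word in text.split():
        syllables = count_syllables(word)
        if syllables == 0:
            continue  # skip tokens with no letters, e.g. "--"
        schedule.append((word, pause_ms + ms_per_syllable * syllables))
    return schedule
```

Longer words stay highlighted longer, roughly tracking how long they take to say aloud.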

Methods and Systems for Comparison of Structured Documents

Systems and methods for comparing structured documents are disclosed. The from/to source documents are first represented by their respective from/to XML forms according to a predetermined schema. One or more "from" nodes are selected from the from-XML document for comparison with one or more "to" nodes from the to-XML document. The comparison uses a set of matching functions that may be selected according to the domain of the source documents. The matching functions may compare only the tags of XML elements and/or their text contents and/or any of their relevant attributes, and the matching may be exact or approximate. Each matching function computes a score, which may be weighted. For each pair of from/to nodes, an overall match score is computed from the scores of the individual matching functions. If the match score reaches a matching threshold, the pair is deemed a match and further matching stops. The techniques are extended to comparing multiple from documents against one to document.
Owner:XCENTIAL CORP
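
The weighted matching functions and the threshold-with-early-stop rule can be sketched with Python's standard xml.etree.ElementTree and difflib modules. The three matchers (exact tag, approximate text via SequenceMatcher.ratio, fraction of equal attributes), their weights and the 0.8 threshold are illustrative choices, not the patented configuration.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher
from typing import List, Tuple


def match_tag(a: ET.Element, b: ET.Element) -> float:
    return 1.0 if a.tag == b.tag else 0.0


def match_text(a: ET.Element, b: ET.Element) -> float:
    # Approximate text match in [0, 1]
    return SequenceMatcher(None, (a.text or "").strip(), (b.text or "").strip()).ratio()


def match_attrs(a: ET.Element, b: ET.Element) -> float:
    keys = set(a.attrib) | set(b.attrib)
    if not keys:
        return 1.0
    return sum(a.get(k) == b.get(k) for k in keys) / len(keys)


# Hypothetical domain-specific selection of matching functions and weights
MATCHERS = [(match_tag, 0.2), (match_text, 0.6), (match_attrs, 0.2)]


def match_score(a: ET.Element, b: ET.Element) -> float:
    """Weighted combination of the individual matching-function scores."""
    total_weight = sum(w for _, w in MATCHERS)
    return sum(w * f(a, b) for f, w in MATCHERS) / total_weight


def match_nodes(from_nodes: List[ET.Element], to_nodes: List[ET.Element],
                threshold: float = 0.8) -> List[Tuple[ET.Element, ET.Element]]:
    pairs = []
    for f_node in from_nodes:
        for t_node in to_nodes:
            if match_score(f_node, t_node) >= threshold:
                pairs.append((f_node, t_node))
                break  # first pair reaching the threshold wins; stop matching this node
    return pairs
```

A section whose only change is an edited number still pairs with its counterpart, while a section with no counterpart stays unmatched and can be reported as inserted or deleted.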

Text-based event pushing method and device, electronic equipment and storage medium

The invention discloses a text-based event pushing method and device, electronic equipment and a storage medium, relating to the technical fields of knowledge graphs, deep learning, natural language processing and cloud computing. In the specific implementation, the method comprises: obtaining text content of the target event type and performing word segmentation on it to obtain a word sequence containing a plurality of words; inputting the word vectors of the words in the word sequence into a sequence labeling model corresponding to the target event type to mark event attributes for all words in the sequence; generating description information for the target event type from the marked event attributes; and pushing the description information to the clients that follow the target event type, for display on those clients. Because the description information of the target event type is generated and displayed on the clients automatically, users obtain information about the events they follow more efficiently and comprehensively.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Method and device for text recoverable watermark based on synonym replacement

The invention relates to a method and device for a recoverable text watermark based on synonym replacement, in the technical field of copyright protection for computer texts. A recoverable watermark embeds secret information into a text and restores the original text when the watermark information is extracted. In the method and device, words in the text that have synonyms are modeled as pixel-value pairs, and an integer reversible transform is used to embed or extract copyright information, so that the original text is recovered as the copyright information is extracted. The method and device can be applied in fields with strict requirements on text content, such as military affairs, law and literature; they both protect the copyright of documents and prevent legitimate users from misinterpreting them.
Owner:NANJING UNIV OF INFORMATION SCI & TECH
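
The abstract names an integer reversible transform on value pairs without specifying it. One well-known transform of that kind is difference expansion (Tian's reversible watermarking), shown here as a swapped-in illustration: one bit is hidden in the expanded difference of an integer pair, and extraction returns both the bit and the exact original pair. How synonyms are mapped to integers, and the overflow check that keeps embedded values within a synonym set, are not shown and would be further assumptions.

```python
from typing import Tuple


def embed_bit(x: int, y: int, bit: int) -> Tuple[int, int]:
    """Difference expansion: hide one bit in the pair (x, y)."""
    if bit not in (0, 1):
        raise ValueError("bit must be 0 or 1")
    low = (x + y) // 2          # integer average (floor), preserved by the transform
    expanded = 2 * (x - y) + bit
    return low + (expanded + 1) // 2, low - expanded // 2


def extract_bit(xw: int, yw: int) -> Tuple[int, Tuple[int, int]]:
    """Recover the hidden bit and the exact original pair."""
    low = (xw + yw) // 2
    expanded = xw - yw
    bit = expanded & 1          # parity of the expanded difference
    diff = expanded >> 1        # floor division by 2 undoes the expansion
    return bit, (low + (diff + 1) // 2, low - diff // 2)
```

For example, embedding 1 into (5, 3) gives (7, 2), and extracting from (7, 2) returns the bit 1 together with the original (5, 3). This round trip is the "recoverable" property the abstract relies on.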

Text scoring method, device and system

The invention provides a text scoring method, device and system. The method comprises: obtaining a text to be scored; extracting text features of the text, comprising shallow language features, syntax features, semantic features and topic features, where the semantic features represent semantic coherence within the text and the topic features represent the relevance between the text and a preset text topic; inputting the text features into a preset scoring model to obtain an output result; and determining the score of the text from the output result. Because the text is evaluated comprehensively on four aspects (shallow language, syntax, semantics and topic), the reliability of the scoring result is enhanced.
Owner:HUAZHONG NORMAL UNIV
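
The four-group feature extraction feeding a preset scoring model might look like the sketch below. Every concrete choice is a hypothetical proxy, not the patent's feature set: type-token ratio for shallow features, capped average sentence length for syntax, word overlap between adjacent sentences for semantic coherence, coverage of preset topic words for topic relevance, and equal linear weights for the scoring model (scores fall between 0 and 100).

```python
import re
from typing import Dict, List, Set


def split_sentences(text: str) -> List[str]:
    return [s for s in re.split(r"[.!?]+\s*", text) if s]


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z']+", sentence.lower())


def extract_features(text: str, topic_words: Set[str]) -> Dict[str, float]:
    sents = [s for s in (tokenize(x) for x in split_sentences(text)) if s]
    words = [w for s in sents for w in s]
    if not words:
        return {"shallow": 0.0, "syntax": 0.0, "semantic": 0.0, "topic": 0.0}
    shallow = len(set(words)) / len(words)                   # type-token ratio
    syntax = min(len(words) / len(sents) / 20.0, 1.0)        # capped mean sentence length
    overlaps = [len(set(a) & set(b)) / len(set(a) | set(b))  # adjacent-sentence overlap
                for a, b in zip(sents, sents[1:])]
    semantic = sum(overlaps) / len(overlaps) if overlaps else 0.0
    topic = len(set(words) & topic_words) / len(topic_words) if topic_words else 0.0
    return {"shallow": shallow, "syntax": syntax, "semantic": semantic, "topic": topic}


# Hypothetical preset scoring model: equal-weight linear combination
WEIGHTS = {"shallow": 25.0, "syntax": 25.0, "semantic": 25.0, "topic": 25.0}


def score_text(text: str, topic_words: Set[str]) -> float:
    features = extract_features(text, topic_words)
    return sum(WEIGHTS[name] * value for name, value in features.items())
```

In practice the weights would come from training on human-scored texts; the fixed values here only show where the preset model plugs in.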

Rumor detection method based on linear and nonlinear propagation

Status: Active · Publication number: CN112256981A · Tags: Rich auxiliary information; Makes up for the inability to flexibly learn dependencies between nodes; Digital data information retrieval; Semantic analysis; Time information; Natural language understanding
The invention relates to a rumor detection method based on linear and nonlinear propagation, in the technical field of natural language understanding. The method builds a unified representation of rumor nodes from their text content and time information, and detects rumors automatically by combining linear and nonlinear propagation characteristics. First, the text and time information of each rumor node are used to jointly represent its mixed features; next, node information is aggregated along the linear time sequence and the nonlinear diffusion structure, enhancing the representation of the source node and forming the final propagation representation; finally, the authenticity label is predicted from the propagation representation. By extracting rumor node characteristics from two different angles (a tree-aware representation from the nonlinear diffusion pattern and propagation-sequence characteristics from linear temporal interaction), the method can accurately predict whether rumors are true.
Owner:BEIJING INSTITUTE OF TECHNOLOGY