37 results about "Text normalization" patented technology

Text normalization is the process of transforming text into a single canonical form that it might not have had before. Normalizing text before storing or processing it allows for separation of concerns, since input is guaranteed to be consistent before operations are performed on it. Text normalization requires being aware of what type of text is to be normalized and how it is to be processed afterwards; there is no all-purpose normalization procedure.
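As an illustration of reducing text to a single canonical form, here is a minimal Python sketch. The choice of steps (Unicode NFKC, case folding, whitespace collapsing) and the function name `normalize_text` are assumptions made for this example; as noted above, the right procedure depends on the text and on what is done with it afterwards.

```python
import unicodedata


def normalize_text(text: str) -> str:
    """Return one canonical form of `text` (illustrative choice of steps)."""
    # NFKC folds compatibility variants (full-width letters, ligatures,
    # no-break spaces) into their ordinary equivalents.
    text = unicodedata.normalize("NFKC", text)
    # casefold() is a more aggressive lower() intended for caseless matching.
    text = text.casefold()
    # split() with no arguments drops leading/trailing whitespace and
    # splits on any run of whitespace; rejoin with single spaces.
    return " ".join(text.split())


if __name__ == "__main__":
    print(normalize_text("  Text\u00a0 NORMALIZATION  "))
```

Because every input passes through the same function before storage or comparison, later code can assume consistent input, which is the separation of concerns the definition refers to.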

Systems and methods for text normalization for text to speech synthesis

Algorithms for synthesizing speech used to identify media assets are provided. Speech may be selectively synthesized from text strings associated with media assets. A text string may be normalized and its native language determined for obtaining a target phoneme for providing human-sounding speech in a language (e.g., dialect or accent) that is familiar to a user. The algorithms may be implemented on a system including several dedicated render engines. The system may be part of a back end coupled to a front end including storage for media assets and associated synthesized speech, and a request processor for receiving and processing requests that result in providing the synthesized speech. The front end may communicate media assets and associated synthesized speech content over a network to host devices coupled to portable electronic devices on which the media assets and synthesized speech are played back.
Owner:APPLE INC

System And Method For Automatically Generating Adaptive Interaction Logs From Customer Interaction Text

Active; US20140153709A1; Shorten the time; Facilitates correct generation; Special service for subscribers; Manual exchanges; Adaptive interaction; Contact center
A system and method for providing an adaptive Interaction Logging functionality to help agents reduce the time spent documenting contact center interactions. In a preferred embodiment the system uses a pipeline comprising audio capture of a telephone conversation, automatic speech transcription, text normalization, transcript generation and candidate call log generation based on Real-time and Global Models. The contact center agent edits the candidate call log to create the final call log. The models are updated based on analysis of user feedback in the form of the editing of the candidate call log done by the contact center agents or supervisors. The pipeline yields a candidate call log which the agents can edit in less time than it would take them to generate a call log manually.
Owner:MICROSOFT TECH LICENSING LLC

System and Method for Automatically Generating Adaptive Interaction Logs from Customer Interaction Text

Active; US20100104087A1; Reduce time they spend documenting; Facilitates correct generation; Special service for subscribers; Manual exchanges; Adaptive interaction; Contact center
A system and method for providing an adaptive Interaction Logging functionality to help agents reduce the time spent documenting contact center interactions. In a preferred embodiment the system uses a pipeline comprising audio capture of a telephone conversation, automatic speech transcription, text normalization, transcript generation and candidate call log generation based on Real-time and Global Models. The contact center agent edits the candidate call log to create the final call log. The models are updated based on analysis of user feedback in the form of the editing of the candidate call log done by the contact center agents or supervisors. The pipeline yields a candidate call log which the agents can edit in less time than it would take them to generate a call log manually.
Owner:NUANCE COMM INC

Method and system for the automatic recognition of deceptive language

A system for identifying deception within a text includes a processor for receiving and processing a text file. The processor includes a deception indicator tag analyzer for inserting into the text file at least one deception indicator tag that identifies a potentially deceptive word or phrase within the text file, and an interpreter for interpreting the at least one deception indicator tag to determine a distribution of potentially deceptive words or phrases within the text file and generating deception likelihood data based upon the density or distribution of potentially deceptive words or phrases within the text file. A method for identifying deception within a text includes the steps of receiving a first text to be analyzed, normalizing the first text to produce a normalized text, inserting into the normalized text at least one part-of-speech tag that identifies a part of speech of a word associated with the part-of-speech tag, inserting into the normalized text at least one syntactic label that identifies a linguistic construction of one or more words associated with the syntactic label, inserting into the normalized text at least one deception indicator tag that identifies a potentially deceptive word or phrase within the normalized text, interpreting the at least one deception indicator tag to determine a distribution of potentially deceptive words or phrases within the normalized text, and generating deception likelihood data based upon the density or frequency of distribution of potentially deceptive words or phrases within the normalized text.
Owner:DECEPTION DISCOVERY TECH
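
The tag-then-measure idea in the abstract above can be sketched roughly as follows. This is not the patented method: the indicator list `INDICATORS` is a hypothetical placeholder, part-of-speech tags and syntactic labels are omitted, and "density" is assumed here to mean the fraction of tokens flagged.

```python
import re

# Hypothetical indicator lexicon, for illustration only.
INDICATORS = {"honestly", "maybe", "somehow", "basically"}


def tag_indicators(text: str) -> list[tuple[str, bool]]:
    """Normalize to lowercase word tokens and flag each indicator token."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [(tok, tok in INDICATORS) for tok in tokens]


def indicator_density(tagged: list[tuple[str, bool]]) -> float:
    """Fraction of tokens flagged as indicators (0.0 for empty input)."""
    if not tagged:
        return 0.0
    return sum(flag for _, flag in tagged) / len(tagged)


if __name__ == "__main__":
    tagged = tag_indicators("Honestly, I maybe left at noon.")
    print(tagged, indicator_density(tagged))
```

Normalizing before tagging means the lexicon lookup does not need to handle case or punctuation variants separately.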

Inverse Text Normalization

Embodiments are directed to efficient multilingual inverse text normalization (ITN) of text in spoken form to produce normalized text for display. Embodiments are directed to preprocessing the multilingual text into a language-independent representation; tokenizing text in spoken form; segmenting the tokenized text into ITN items by grouping consecutive words using an ITN lexicon; classifying the ITN items into ITN categories by using the ITN lexicon or tagged information from a language model; applying one or more ITN rules, selected based on the ITN categories into which the ITN items have been classified, to rewrite the ITN items; and post-processing the ITN items and outputting inversely normalized text in written form for display. The ITN lexicon may include ITN lexicon entries that are each located within an ITN category in the ITN lexicon.
Owner:NOKIA CORP
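
The tokenize, segment, classify, and rewrite stages described above can be sketched in a few lines. This is a toy sketch under stated assumptions: the lexicon covers a single hypothetical category (`"cardinal"`, English number words below 100), preprocessing is reduced to lowercasing, and all names are invented for the example.

```python
UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}

# Toy ITN lexicon: word -> ITN category (only one category here).
LEXICON = {w: "cardinal" for w in (*UNITS, *TENS)}


def segment(tokens):
    """Group consecutive tokens of the same ITN category into one item."""
    items = []
    for tok in tokens:
        cat = LEXICON.get(tok)
        if cat and items and items[-1][0] == cat:
            items[-1][1].append(tok)
        else:
            items.append((cat, [tok]))
    return items


def rewrite_cardinal(words):
    """Rewrite number words as digits; 'twenty three' -> '23'."""
    out, i = [], 0
    while i < len(words):
        w = words[i]
        nxt = words[i + 1] if i + 1 < len(words) else None
        if w in TENS and 1 <= UNITS.get(nxt, 0) <= 9:
            out.append(str(TENS[w] + UNITS[nxt]))  # tens word + unit word
            i += 2
        else:
            out.append(str(TENS.get(w, UNITS.get(w))))
            i += 1
    return " ".join(out)


RULES = {"cardinal": rewrite_cardinal}  # ITN category -> rewrite rule


def inverse_normalize(spoken: str) -> str:
    tokens = spoken.lower().split()  # stand-in for preprocessing/tokenizing
    parts = [RULES[cat](words) if cat else " ".join(words)
             for cat, words in segment(tokens)]
    return " ".join(parts)


if __name__ == "__main__":
    print(inverse_normalize("I have twenty three apples"))
```

Keeping the rule table keyed by category mirrors the abstract's point that rules are selected by the category an item was classified into, so a new category needs only lexicon entries and one rule.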

System and Method for the Normalization of Text

A computer-implemented method of normalizing abbreviated text to substantially unabbreviated text, performed on at least one computer system comprising at least one processor, includes generating, based at least partially on data in at least one data resource comprising abbreviated text associated with unabbreviated text, a plurality of transformation functions in at least one order; transforming at least one string with at least one of the transformation functions, wherein the at least one string at least partially comprises abbreviated text; and determining if at least a portion of the at least one string has been at least partially transformed to substantially unabbreviated text. A system and a computer program product for implementing the aforementioned method include appropriately communicatively connected hardware components.
Owner:AVAYA INC
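
One way to picture "transformation functions generated from a data resource and applied in an order" is the sketch below. The abbreviation pairs in `PAIRS`, the longest-first ordering, and the function names are all assumptions for illustration; the abstract does not specify them.

```python
import re
from typing import Callable

# Hypothetical data resource: abbreviated text paired with its expansion.
PAIRS = [("appt", "appointment"), ("tmrw", "tomorrow"), ("pls", "please")]

Transform = Callable[[str], str]


def make_transform(abbr: str, full: str) -> Transform:
    """Build one transformation function that expands a whole-word match."""
    pattern = re.compile(rf"\b{re.escape(abbr)}\b", re.IGNORECASE)
    return lambda s: pattern.sub(full, s)


def build_transforms(pairs) -> list[Transform]:
    """Generate the functions in a fixed order (longest abbreviation first)."""
    ordered = sorted(pairs, key=lambda p: len(p[0]), reverse=True)
    return [make_transform(a, f) for a, f in ordered]


def expand(text: str, transforms: list[Transform]) -> tuple[str, bool]:
    """Apply each function in order; also report whether anything changed."""
    result = text
    for transform in transforms:
        result = transform(result)
    return result, result != text


if __name__ == "__main__":
    print(expand("pls confirm appt tmrw", build_transforms(PAIRS)))
```

The boolean in the return value is a simple stand-in for the abstract's final step of determining whether the string was at least partially transformed.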

Method and system for extracting a product and classifying text-based electronic documents

A system to automatically enhance, tag, classify, categorize, cluster and index products described in unstructured text-based electronic documents. The system and method incorporate the use of text normalization, regular expressions, product number matching rules, text segmentation, entity detection, language models, predictive modeling, hierarchal subspace clustering, formal concept analysis, and a weighted combination of all techniques to detect and infer knowledge extracted from a digital version of raw, unstructured product text. Knowledge extracted and inferred comprises knowledge units including: main conceptual entity, entity text patterns, product language models, and conceptual hierarchies. The extracted knowledge units are utilized to store and index products in a product knowledge database and the products and knowledge units are made available to users via a user interface.
Owner:ALQADAH FARIS

Log analysis system and log analysis method for security system

A log analysis system and method for a security system, which allow the security system monitoring communications between general systems to generate logs according to a predetermined rule and store the same in a log database are disclosed. A log analyzer determines whether log information containing attack content in the log database exists, and if log information containing attack content exists, sorts the log information by attack name. The log analyzer determines whether the attack content data of the log information sorted by attack name is based on a web request or not, and if the attack content data is based on a web request, performs HTTP-indicator-based text normalization. The log analyzer performs rule-pattern-based text normalization after the HTTP-indicator-based text normalization. According to an embodiment of the present invention, a quantitative basis for increasing an amount and accuracy of analysis and therefore improving accuracy of rules in the future can be established by making improvements to the conventional log analysis methods for security systems so that an operator or log analyst may discover a hacking attack in a timely manner.
Owner:KANG MYOUNG HUN
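
The two-stage normalization described above (HTTP-indicator-based first, rule-pattern-based second) might look something like the following sketch. The abstract does not say what each stage does, so the concrete steps here are assumptions: stage one URL-decodes and lowercases a web request payload, and stage two replaces literals with placeholders using the hypothetical `RULE_PATTERNS` list.

```python
import re
from urllib.parse import unquote_plus


def http_normalize(payload: str) -> str:
    """Stage 1 (assumed): undo URL encoding, lowercase, collapse whitespace."""
    decoded = unquote_plus(payload)
    return " ".join(decoded.lower().split())


# Stage 2 (assumed): hypothetical rule patterns that mask literal values.
RULE_PATTERNS = [
    (re.compile(r"\b\d+\b"), "<num>"),   # standalone numbers
    (re.compile(r"'[^']*'"), "<str>"),   # single-quoted strings
]


def rule_normalize(text: str) -> str:
    for pattern, placeholder in RULE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def normalize_log_payload(payload: str) -> str:
    """Run HTTP-based normalization, then rule-pattern-based normalization."""
    return rule_normalize(http_normalize(payload))


if __name__ == "__main__":
    print(normalize_log_payload("id=1%27+UNION+SELECT+1%2C2--"))
```

Under these assumptions, payloads that differ only in encoding, case, or literal values collapse to the same string, which makes it easier to count and group log entries sorted by attack name.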

Method for obtaining question and answer pairs from unstructured text based on deep learning

The invention relates to a method for obtaining question and answer pairs from unstructured text based on deep learning. The method comprises the following steps: performing text normalization; carrying out sentence classification, pairing and key phrase extraction with a deep neural network model; obtaining question and answer pairs from within the text; crawling question and answer pairs from outside the text; and merging and deduplicating the question and answer pairs. To address the difficulty of obtaining question and answer pairs, the method obtains them automatically and efficiently at scale from easily obtained unstructured document resources using the deep neural network model, with manual proofreading and supplementation, so that the cost of building a knowledge base is reduced and its construction is sped up.
Owner:北京中科汇联科技股份有限公司

Apparatus and method for detecting characteristics of electronic mail message

The present invention enables accurate detection of risks from an electronic mail message. In a mail inspection unit, an information extraction section extracts text and a mail address from electronic mail accumulated in a journal DB, and a text normalization section normalizes the text. A sort-information saving section generates text sort information according to the score obtained from a sorting engine, and stores it in a mail-management-information storage section. A personal-information saving section extracts personal information from a personal-information storage section according to the mail address, and stores it in the mail-management-information storage section. Finally, a risk-level determination section compares the information stored in the mail-management-information storage section with the information stored in a category-information storage section to determine the risk level of the electronic mail.
Owner:IBM CORP

Face Normalization for Recognition and Enrollment

The present invention is an iterative method for normalization of a probe image against the Eigenspace learned from a database of images. The invention is also an iterative method for normalizing the n images in a database, wherein the normalization is carried out without using a predetermined criterion.
Owner:THE STATE OF ISRAEL MINIST OF AGRI & RURAL DEV AGRI RES ORG ARO VOLCANI CENT

Reliable search method base on content trust

The present invention relates to a reliable search method based on content trust, comprising the following steps: A, receiving users' search keywords through a user interaction module and distributing them to each search engine providing the original search service; B, receiving the traditional search results provided by each search engine and submitting them to a content trust detection module; C, performing operations such as duplicate elimination, text normalization, comprehension of trust semantemes, calculation of content credibility and re-ranking of the search results, and submitting the reliable search results to the user interaction module; D, presenting the reliable search results to the users through the user interaction module. The invention evaluates the reliability of text content, the essence of text information. Three text-content-trust assessment methods are proposed, based on trust facts, trust evidence and trust characteristics, and are unified using a Bayesian network. The credibility of the text content is applied to a ranking algorithm, which can improve the precision of the search results.
Owner:TONGJI UNIV

Microblog text normalizing, word segmenting and part-speech tagging method and system

The invention relates to a microblog text normalizing, word segmenting and part-of-speech tagging method. The method comprises the following steps: firstly, a tagged corpus is established and divided into a training set, a development set and a testing set; secondly, a microblog dictionary is established through SVM model training and learning; thirdly, using the training set, the development set and the microblog dictionary, a combined model for text normalizing, word segmenting and part-of-speech tagging is formed through training and learning with a BeamSearch method; fourthly, the combined model performs text normalizing, word segmenting and part-of-speech tagging on the microblog text to be processed at the same time, and the performance of the combined model is tested. According to the method, a large number of microblogs with tagged sentences are used as the training corpus, candidate results are expanded through the microblog dictionary, and the combined model acts on the three tasks at the same time; because the three tasks influence each other, the performance of each task improves, and therefore the overall performance improves.
Owner:北京牡丹电子集团有限责任公司数字科技中心

Method and device for extracting structural relationship of coronary artery medical report

The invention discloses a method and device for extracting the structural relationship of a coronary artery medical report. The method comprises the following steps: s1, obtaining a coronary artery report description text, and carrying out the data preprocessing of the obtained text; s2, correcting wrongly written characters of the pre-processed text, and performing normalization processing; s3, recognizing medical entities in the text according to the normalized text, and segmenting the text into a plurality of entity texts; s4, performing entity relationship extraction on the entity text according to the structured extraction rule to form a structured relationship path diagram; s5, verifying the structured relation path diagram and acquiring an output result of the coronary artery medical report structural relation. According to the method, the accuracy of entity recognition is improved through a wrong character correction and text normalization method; therefore, the high-accuracy entity relationship extraction is realized by constructing a chain structure between entities.
Owner:浙江卡易智慧医疗科技有限公司

Text normalization method and system based on WFST

The invention provides a text normalization method and system based on a weighted finite-state transducer (WFST). The method comprises the following steps: classifying non-Chinese characters in advance according to the WFST, and writing corresponding conversion rules for each class; identifying non-Chinese character strings in a target Chinese text according to the WFST; and calling the matching target conversion rules according to the types of the identified non-Chinese character strings, and converting the identified non-Chinese characters into Chinese characters according to those rules. The technical scheme can improve the accuracy of converting non-Chinese characters into Chinese characters.
Owner:BEIJING UNISOUND INFORMATION TECH
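
The classify-then-convert flow above can be illustrated with a much simpler stand-in. The sketch below is not a WFST: ordered regular-expression rules play the role of the classifier, and only two hypothetical classes are covered (a four-digit year before 年, read digit by digit, and a standalone one- or two-digit number, read as a cardinal). All names are invented for the example.

```python
import re

DIGITS = "零一二三四五六七八九"


def read_digits(s: str) -> str:
    """Read a digit string one digit at a time, e.g. '2023' -> '二零二三'."""
    return "".join(DIGITS[int(c)] for c in s)


def read_cardinal(s: str) -> str:
    """Read a number from 0 to 99 as a cardinal, e.g. '15' -> '十五'."""
    n = int(s)
    if n < 10:
        return DIGITS[n]
    tens, units = divmod(n, 10)
    head = "十" if tens == 1 else DIGITS[tens] + "十"
    return head + (DIGITS[units] if units else "")


# Ordered (class pattern, conversion rule) pairs; earlier rules win.
RULES = [
    (re.compile(r"(?<!\d)\d{4}(?=年)"), read_digits),       # class: year
    (re.compile(r"(?<!\d)\d{1,2}(?!\d)"), read_cardinal),   # class: small cardinal
]


def normalize_chinese(text: str) -> str:
    """Replace each recognized non-Chinese string using its class's rule."""
    for pattern, rule in RULES:
        text = pattern.sub(lambda m, rule=rule: rule(m.group()), text)
    return text


if __name__ == "__main__":
    print(normalize_chinese("2023年有15个"))
```

Digit strings that match neither class (for example, a five-digit number) are left unchanged in this sketch; a real system would need classes and weights for those cases, which is where a WFST becomes useful.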

Microblog text normalization method based on context graph random walk and phonetic configuration codes

Inactive; CN110032738A; Get phonetic similarity; Conform to the expression characteristics; Data processing applications; Natural language data processing; Chinese characters; Microblogging
The invention provides a microblog text normalization method based on context graph random walk and phonetic configuration codes, and belongs to the technical field of social media text content analysis and mining in computer technology. The method comprises the following steps: identifying non-standard words and extracting word contexts; constructing a context graph for random walk to obtain a context-based standardized candidate set; obtaining a standardized candidate set based on phonetic configuration by using the phonetic configuration codes of the Chinese characters; and processing the two standardized candidate sets to obtain a final standardized result. The method overcomes the defect that Chinese character pronunciation is not fully considered in traditional methods. Social media differs from written language such as news and is full of non-standard abbreviations, homophones and homomorphic words, so natural language processing tools perform poorly on microblog text. The invention therefore provides a microblog text normalization method that combines phonetic configuration codes with an understanding of the preceding and following context, making it possible to analyze and mine the text with natural language processing tools after normalization.
Owner:中森云链(成都)科技有限责任公司

Text normalization method, device and equipment and storage medium

The invention provides a text normalization method, device and equipment and a storage medium. The method comprises the steps of: obtaining a to-be-normalized text; extracting text normalization features from the to-be-normalized text, the features at least comprising semantic features capable of representing the semantics of the to-be-normalized text and generalization features capable of representing repeated parts in the to-be-normalized text; and determining a normalized text corresponding to the to-be-normalized text by utilizing the text normalization features and a pre-established text normalization model. With this method, the to-be-normalized text can be normalized into text with clear sentence meaning and relatively strong readability and logicality.
Owner:IFLYTEK CO LTD

Single-character text normalization model training method and device, and single-character text recognition method and device

The invention relates to a single-character text normalization model training method and device, and a single-character text recognition method and device. The model training method comprises the following steps: acquiring a plurality of single-character sample pictures; normalizing the single-character sample pictures to obtain standard character pictures corresponding to the single-character sample pictures; generating a training data set according to the plurality of single-character sample pictures and standard character pictures in one-to-one correspondence with the plurality of single-character sample pictures; and training a deep learning neural network by using the training data set and a mean square loss function to obtain a single-character text normalization model. The training data set used in training is composed of original data and the standard character pictures which are obtained through normalization processing and have a unified style, so that in the process of training the model, the training and convergence of the model can be accelerated, the model can better learn the essential characteristics of various input texts, and the recognition precision of the model is further improved.
Owner:上海眼控科技股份有限公司

Mammography report semantic tree model building method supporting heterogeneous information integration

The invention relates to a mammography report semantic tree model building method supporting heterogeneous information integration. The method comprises the following steps: forming a text normalization database of text descriptions of breast cancer molybdenum target images; obtaining, in real time, the text description of a breast cancer molybdenum target image, and dividing the text description into phrases based on the text normalization database according to semantic information; obtaining semantic constraints of the entities; and forming a semantic tree of the text description. By constructing a mammography semantic tree, the method can structure the text information of complex breast cancer molybdenum target images from different hospitals and different doctors, and realizes semantic-based integration of heterogeneous information.
Owner:DONGHUA UNIV

Text recognition method and device, computer equipment and storage medium

The invention relates to a text recognition method and device, computer equipment and a storage medium. The method comprises the steps of: obtaining a to-be-recognized text image; carrying out text positioning on the to-be-recognized text image to obtain a corresponding text line area; preprocessing the text line area according to the number of characters in the text line area and a preset size rule; detecting the preprocessed text line area through the text normalization model to obtain the text line picture in the standard format corresponding to the text line area, so that the input for text recognition has a unified style; and recognizing the text line picture in the standard format by directly adopting the first deep learning neural network to obtain the target character string. Since only the normalized text line pictures in the standard format need to be handled during text recognition, their features are more regular and uniform, the network has fewer features to learn, network training is easier, the performance is better, and the recognition precision is higher.
Owner:上海眼控科技股份有限公司

Text normalizing method and device, electronic equipment and storage medium

Pending; CN111832248A; Improve accuracy; Improve the efficiency of regularization; Semantic analysis; Algorithm; Text entry
The embodiment of the invention provides a text regularization method and device, electronic equipment and a storage medium. The method comprises the steps of: determining a to-be-regularized text; and inputting the to-be-regularized text into a text regularization model to obtain the regularized text output by the model for the to-be-regularized text. The text regularization model is obtained by training based on a to-be-regularized sample text, a regularized sample text and a sample editing type of each segmented word in the to-be-regularized sample text. The text regularization model is used for determining an editing type of each segmented word in the to-be-regularized text, determining a regularization mode for the to-be-regularized text based on whether it contains inserted segmented words whose editing type is the insertion type, and regularizing the to-be-regularized text based on that regularization mode. The method, device, electronic equipment and storage medium provided by the embodiment of the invention improve the accuracy and efficiency of text normalization.
Owner:UNIV OF SCI & TECH OF CHINA +1