265 results about "Text segmentation" patented technology

Text segmentation is the process of dividing written text into meaningful units, such as words, sentences, or topics. The term applies both to mental processes used by humans when reading text, and to artificial processes implemented in computers, which are the subject of natural language processing. The problem is non-trivial, because while some written languages have explicit word boundary markers, such as the word spaces of written English and the distinctive initial, medial and final letter shapes of Arabic, such signals are sometimes ambiguous and not present in all written languages.
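As a minimal illustration of why segmentation is ambiguous when word boundaries are not marked, the sketch below applies greedy forward maximum matching to an unspaced string. The function name `max_match` and the toy lexicon are assumptions made for this example; they are not taken from any patent listed on this page.

```python
def max_match(text, lexicon):
    """Greedy forward maximum matching: at each position take the longest lexicon entry."""
    longest = max((len(word) for word in lexicon), default=1)
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


# Hypothetical toy lexicon.
lexicon = {"the", "them", "men", "me", "eat", "meat"}
print(max_match("themeneat", lexicon))  # ['them', 'e', 'n', 'eat']
```

The greedy strategy commits to "them" and never recovers the reading "the men eat", which is the kind of ambiguity the definition above refers to.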

Mapping defects or dirt dynamically affecting an image acquisition device

Defects such as dirt, dust, scratches, blemishes, pits, or defective elements or pixels in a CCD, scanner, photocopier, or image acquiring device are dynamically detected by processing a plurality of images via a computer. A pristine object of calibration is not required. Stationary components of the video images are found and detected so as to produce a low false alarm probability. Text segmentation and measurement of total deviation based on variability related to high-frequency components of the video image are employed to prevent applying the process or method to printed text or graphics. Additional techniques optionally employed are median filtering, sample area detection, and dynamic adjustment of scores. In special cases, only moderately blank documents are used. The dynamic defect detection allows defect compensation, defect correction, and alerting the operator of defects.
Owner:FOTONATION LTD

Methods and systems for selecting a language for text segmentation

Methods and systems for selecting a language for text segmentation are disclosed. In one embodiment, at least a first candidate language and a second candidate language associated with a string of characters are identified, at least a first segmented result associated with the first candidate language and a second segmented result associated with the second candidate language are determined, a first frequency of occurrence for the first segmented result and a second frequency of occurrence for the second segmented result are determined, and an operable language is identified from the first candidate language and the second candidate language based at least in part on the first frequency of occurrence and the second frequency of occurrence.
Owner:GOOGLE LLC
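The idea of choosing a language by comparing how frequently each candidate segmentation occurs can be sketched as follows. This is a rough sketch only: the abstract does not say how the frequency of a segmented result is computed, so the code assumes it can be approximated by the count of the rarest token in the result. The function names and the per-language count tables are hypothetical.

```python
def segment(text, vocab):
    """Greedy longest-match segmentation against one language's vocabulary."""
    longest = max(map(len, vocab))
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in vocab:
                out.append(text[i:i + size])
                i += size
                break
    return out


def pick_language(text, counts_by_language):
    """Segment with every candidate language and keep the most frequent result."""
    results = {}
    for lang, counts in counts_by_language.items():
        tokens = segment(text, counts)
        # Assumption: a segmented result is only as frequent as its rarest token.
        score = min((counts.get(t, 0) for t in tokens), default=0)
        results[lang] = (tokens, score)
    best = max(results, key=lambda lang: results[lang][1])
    return best, results


# Hypothetical token counts per candidate language.
counts = {
    "en": {"chat": 40, "no": 900, "i": 800},
    "fr": {"chat": 300, "noir": 250},
}
best, results = pick_language("chatnoir", counts)
print(best, results[best][0])  # fr ['chat', 'noir']
```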

Text segmentation with multiple granularity levels

Text processing includes: segmenting received text based on a lexicon of smallest semantic units to obtain medium-grained segmentation results; merging the medium-grained segmentation results to obtain coarse-grained segmentation results, the coarse-grained segmentation results having coarser granularity than the medium-grained segmentation results; looking up in the lexicon of smallest semantic units respective search elements that correspond to segments in the medium-grained segmentation results; and forming fine-grained segmentation results based on the respective search elements, the fine-grained segmentation results having finer granularity than the medium-grained segmentation results.
Owner:ALIBABA GRP HLDG LTD
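The three granularity levels described in this abstract can be sketched roughly as below. The lexicon contents, the `COMPOUNDS` merge table, and all function names are assumptions for illustration; the abstract does not specify how merging is decided, so this sketch merges adjacent segments that appear in a hypothetical compound list.

```python
# Hypothetical lexicon: smallest semantic unit -> the search elements it contains.
UNIT_LEXICON = {
    "text": ["text"],
    "segmentation": ["segment", "ation"],
    "method": ["method"],
}
# Hypothetical pairs of adjacent units that form a coarser segment.
COMPOUNDS = {("text", "segmentation")}


def medium_grained(text, lexicon):
    """Longest-match segmentation against the lexicon of smallest semantic units."""
    longest = max(map(len, lexicon))
    out, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            if size == 1 or text[i:i + size] in lexicon:
                out.append(text[i:i + size])
                i += size
                break
    return out


def coarse_grained(segments, compounds):
    """Merge adjacent medium-grained segments that form a known compound."""
    out, i = [], 0
    while i < len(segments):
        if i + 1 < len(segments) and (segments[i], segments[i + 1]) in compounds:
            out.append(segments[i] + segments[i + 1])
            i += 2
        else:
            out.append(segments[i])
            i += 1
    return out


def fine_grained(segments, lexicon):
    """Replace each medium-grained segment by its search elements."""
    return [element for seg in segments for element in lexicon.get(seg, [seg])]


medium = medium_grained("textsegmentationmethod", UNIT_LEXICON)
print(medium)                              # ['text', 'segmentation', 'method']
print(coarse_grained(medium, COMPOUNDS))   # ['textsegmentation', 'method']
print(fine_grained(medium, UNIT_LEXICON))  # ['text', 'segment', 'ation', 'method']
```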

Text structuring method aiming at power grid fault cases

Active | CN107992597A | Specific description; Specific attribute status quantity; Data processing applications; Natural language data processing; Named-entity recognition; Power grid
The invention discloses a text structuring method for power grid fault cases. The method includes: performing named entity recognition on unstructured texts, and constructing an entity dictionary for the power grid domain to assist entity recognition and text segmentation; extracting attribute values and attribute description state quantities, wherein the state quantities are divided by type into digital state quantities and non-digital state quantities; extracting and matching the modification attributes of the digital state quantities with a rule-based method; subdividing the non-digital state quantities into phrase-form state quantities and sentence-form state quantities, and extracting their respective modification attributes; and finally, from the recognized attributes and corresponding state quantities, generating a plurality of two-tuples, each formed by an attribute and its corresponding state quantity, to complete text structuring.
Owner:ELECTRIC POWER RESEARCH INSTITUTE OF STATE GRID SHANDONG ELECTRIC POWER COMPANY +2

System and method for automatic segmentation of ASR transcripts

Text segmentation based on topic boundary detection has been an industry problem in automating information dissemination to targeted users. A system for automatic segmentation of ASR output text involves boundary identification based on “topic” changes. The proposed approach is based on building a weighted graph to determine dependency in input sentences based on bi-directional analysis of the input sentences. Furthermore, the input sentences are segmented based on the notion of segment cohesiveness and the segmented sentences are merged based on preamble and postamble analyses.
Owner:TECH MAHINDRA INDIA

Classified content auditing terminal system

The invention relates to a terminal system for classified auditing of video and audio content. In the system, a program source management module stores program and source address information; a speech recognition module performs speech recognition on the input audio stream; a content analysis module extracts the relevant video information and subtitle information; a subtitle recognition module performs text segmentation and recognition on the input subtitle information; a face recognition module extracts facial features from the input video information and performs face recognition; a fusion scoring module scores the program according to the results of speech recognition, face recognition and subtitle recognition, judges from the score whether the program is legitimate, and decides whether the program is blocked immediately or uploaded as a suspicious file for further examination; a policy server cache module downloads updated auditing policies; and a decoder module decodes video and audio in real time.
Owner:北京新岸线网络技术有限公司

Information processing method and device

The embodiment of the invention discloses an information processing method and device. The method comprises the steps that an HTML document set which is obtained in advance is analyzed, and the text data sets contained in the HTML document set are extracted; word segmentation is conducted on the text data sets, and a text segmentation table is obtained; word frequency analysis is conducted on all words in the text segmentation table, and a text vector space matrix is constructed; discrete point text vectors in the text vector space matrix are eliminated, and a text similarity matrix of all remaining text vectors in the text vector space matrix is obtained; according to the text similarity matrix, topic clustering is conducted on the text data sets. By means of the method, a word list can be accurately constructed, and because topic clustering is conducted after the discrete point text vectors are eliminated, topic clustering speed is increased and topic clustering accuracy is improved.
Owner:中国联合网络通信有限公司广东省分公司 +1
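The pipeline in this abstract (word frequencies, a vector space matrix, removal of discrete point vectors, then a similarity matrix) can be sketched in plain Python as below. The abstract does not say how discrete points are identified, so this sketch assumes a document is a discrete point when its best cosine similarity to any other document falls below a threshold. All names and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_matrix(docs):
    """Build a term-frequency matrix from already-segmented documents."""
    vocab = sorted({word for doc in docs for word in doc})
    rows = []
    for doc in docs:
        counts = Counter(doc)
        rows.append([counts[word] for word in vocab])
    return vocab, rows


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def drop_outliers(vectors, threshold):
    """Keep indices of vectors that resemble at least one other vector (assumed rule)."""
    keep = []
    for i, v in enumerate(vectors):
        best = max((cosine(v, u) for j, u in enumerate(vectors) if j != i), default=0.0)
        if best >= threshold:
            keep.append(i)
    return keep


def similarity_matrix(vectors):
    return [[cosine(a, b) for b in vectors] for a in vectors]


# Hypothetical segmented documents.
docs = [
    ["network", "fault", "repair"],
    ["network", "fault", "report"],
    ["holiday", "recipe"],
]
vocab, vectors = tf_matrix(docs)
keep = drop_outliers(vectors, threshold=0.3)
sim = similarity_matrix([vectors[i] for i in keep])
print(keep)  # [0, 1]
```

The resulting matrix `sim` would then be handed to whatever clustering step is used for topic clustering, which is outside this sketch.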

Speech recognition text segmentation method and device

The present invention discloses a speech recognition text segmentation method and device. The method comprises: performing endpoint detection on speech data, to obtain speech segments and a starting frame serial number and an ending frame serial number of each speech segment; performing speech recognition on each speech segment, to obtain a recognition text corresponding to each speech segment; extracting a segmentation feature of the recognition text corresponding to each speech segment; by using the extracted segmentation feature and a pre-established segmentation model, performing segmented detection on the recognition text corresponding to the voice data, to determine a position where segmentation is needed; and segmenting the recognition text corresponding to the speech data according to a segmented detection result. According to the method and apparatus disclosed by the present invention, the recognition text can be segmented automatically, so that the structure of the recognition text can be clearer.
Owner:IFLYTEK CO LTD

Text semantic analysis method

A text semantic analysis method and system that realize semantic analysis of text data at the lexical level and the sentence level. For lexical-level analysis, the invention first adopts an improved word segmentation algorithm to solve the problem that English words are segmented only by spaces. Second, based on the word segmentation, TF-IDF modeling is performed to obtain weight values. The text is then vectorized as a weighted sum of the word vectors trained by Word2Vec, using the TF-IDF values as weights, and finally the document similarity is computed. Because the invention considers both the contribution of the vocabulary to the document content and the semantic status when calculating document similarity, the result has higher accuracy and provides a good foundation for subsequent text clustering. For sentence-level analysis, the invention extracts subject-predicate-object structures based on text segmentation, part-of-speech tagging, syntactic analysis and dependency relations. The invention extracts subject-predicate-object structures from various sentence types and implements a noun expansion function, so that the results are more consistent with manual extraction.
Owner:BEIJING UNIV OF TECH

Multi-document summarization method based on text segmentation

The invention belongs to the technical field of multi-document summarization and provides a multi-document summarization method based on text segmentation, which comprises the following steps of: using HowNet to obtain a concept, building a concept vector space model, conducting text segmentation by adopting an improved DotPlotting model and a sentence concept vector space, calculating sentence weight by using the built concept vector space model, generating a summary according to the sentence weight, the text segmentation and the similarity situation, and evaluating the generated summary by using the ROUGE-N evaluation method and using F_Score as an evaluation index. According to the result, the multi-document summarization by using a text segmentation technique is effective, relevant documents provided by users can be gathered to form a summary by adopting the multi-document summarization method, the summary is displayed to the users in a proper way, the information acquisition efficiency is greatly improved, the practicability is high and the popularization and application values are greater.
Owner:广西超宏科技有限公司

Method and device for generating recommended delivery place name

The invention discloses a method for generating a recommended delivery place name. The method mainly comprises the steps of performing text segmentation for a known address, matching segmented words, checking in a database, calculating the matching frequency, and resetting frequency threshold. The invention also discloses a device for generating the recommended delivery place name. With the adoption of the method and device, the recommended delivery place name can be automatically, timely, quickly and accurately generated.
Owner:西安京迅递供应链科技有限公司

Method and system for extracting a product and classifying text-based electronic documents

A system to automatically enhance, tag, classify, categorize, cluster and index products described in unstructured text-based electronic documents. The system and method incorporate the use of text normalization, regular expressions, product number matching rules, text segmentation, entity detection, language models, predictive modeling, hierarchal subspace clustering, formal concept analysis, and a weighted combination of all techniques to detect and infer knowledge extracted from a digital version of raw, unstructured product text. Knowledge extracted and inferred comprises knowledge units including: main conceptual entity, entity text patterns, product language models, and conceptual hierarchies. The extracted knowledge units are utilized to store and index products in a product knowledge database and the products and knowledge units are made available to users via a user interface.
Owner:ALQADAH FARIS

Method and device for classification of mail

Inactive | CN103136266A | Classification is efficient and accurate; Solution; Special data processing applications; Data mining; Conditional probability
The invention discloses a method and a device for classifying mails. The method includes the steps of: performing text segmentation on the mails to be classified to obtain an entry set; matching entries in the entry set against the feature words in a feature word bank that represent mail categories, and calculating the conditional probability of each category to which the mails may belong according to the matching result; and determining the categories of the mails according to the conditional probability. The method addresses the problems that prior-art mail classification methods are few in number and low in accuracy, thereby achieving efficient and accurate classification of mails and filtering of junk mails, and improving both system performance and user experience.
Owner:ZTE CORP
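The matching and conditional probability steps above resemble a naive Bayes classifier, which can be sketched as follows. Whether the patent uses naive Bayes specifically is not stated in the abstract; this is a swapped-in, commonly used technique. The feature word bank, the prior probabilities, and the function name `classify` are hypothetical values chosen for the example.

```python
import math

# Hypothetical feature word bank: P(word | category).
FEATURE_BANK = {
    "spam": {"free": 0.30, "winner": 0.20, "meeting": 0.01},
    "work": {"free": 0.02, "winner": 0.01, "meeting": 0.25},
}
# Hypothetical prior probability of each category.
PRIORS = {"spam": 0.4, "work": 0.6}


def classify(entries, bank=FEATURE_BANK, priors=PRIORS):
    """Return the most probable category and the posterior over all categories."""
    log_scores = {}
    for category, word_probs in bank.items():
        score = math.log(priors[category])
        for entry in entries:
            # Only entries that match a feature word contribute to the score.
            if entry in word_probs:
                score += math.log(word_probs[entry])
        log_scores[category] = score
    total = sum(math.exp(s) for s in log_scores.values())
    posterior = {c: math.exp(s) / total for c, s in log_scores.items()}
    return max(posterior, key=posterior.get), posterior


label, posterior = classify(["free", "winner", "today"])
print(label)  # spam
```

The sketch assumes every category lists the same feature words; with uneven word lists, unmatched words would need smoothing rather than being skipped.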

Text big data based digitized emergency management case library establishment method and device

Active | CN106202561A | Automatically realize the secondary classification; Convenient research; Special data processing applications; Event recognition; Computer science
The invention relates to a text big data based digitized emergency management case library establishment method and device. The method comprises the steps that data is acquired at regular intervals, and the acquired data is preprocessed to obtain Chinese text segmentation results; based on the Chinese text segmentation results, identification of emergency related data is achieved through data cleaning according to a set emergency field keyword vocabulary to obtain emergency classification results; based on the emergency classification results, identification and tracking of special emergencies are performed; structuralized information description is conducted on special data about emergency identification and tracking by utilizing an information extraction method to obtain various emergency case libraries. The emergency related data can be automatically acquired, and secondary emergency classification can be automatically achieved. The special emergency identification and tracking can be automatically performed, and analysis and information extraction conducted on designated emergency related data are represented through cases to form emergency case representation.
Owner:北京联创众升科技有限公司

Multimedia transliteration method and system

The invention provides a multimedia transliteration method applied to a multimedia transliteration system, comprising the following steps: S1, receiving a demonstration manuscript, and constructing a key information tree of the demonstration manuscript; S2, receiving voice data, carrying out voice recognition on the voice data, and obtaining transliteration texts of the voice data; S3, synchronizing the voice data and the transliteration texts with the demonstration manuscript by means of the key information tree; and S4, displaying the demonstration manuscript with the synchronized voice data and transliteration texts to a user. The user can hear the voice of a speaker and see the texts transliterated from that voice while viewing the demonstration manuscript. Furthermore, the transliteration texts are segmented according to the sub-themes included in each page: transliteration texts of the same sub-theme form one segment, and transliteration texts of different sub-themes form different segments, so that the user can conveniently understand the transliteration texts and the user experience is further improved.
Owner:ANHUI IFLYTEK INTELLIGENT SYST

Systems and methods for hybrid text summarization

Techniques are provided for segmenting text into categorized discourse constituents and attaching discourse constituents into a structural representation of discourse. Techniques for determining hybrid structural and non-structural summaries of a text are also provided. A text is segmented based on a theory of discourse analysis into at least a main discourse constituent containing spatio-temporal information about a single event in a possible world view. The discourse constituents are then inserted into a structural representation of discourse. Non-structural techniques are used to determine relevance scores and important discourse constituents are determined. Relevance scores are percolated through the structural representation of discourse to determine supporting preceding discourse constituents that preserve grammaticality. A hybrid text summary is then determined based on the structural representation of the discourse and relevance scores.
Owner:FUJIFILM BUSINESS INNOVATION CORP

Log clustering method based on graph structure

The invention provides a log clustering method based on a graph structure. The method comprises the following steps of clustering logs based on text segmentation, vector similarity and a maximum connected sub-graph in order to obtain a feature library; and carrying out class labelling on the massive logs according to the class features in the feature library. The method can automatically recognize the most appropriate class number in the massive logs without manually assigning the clustering number; in addition, the method can classify the logs precisely to lay a foundation for mining of massive log data.
Owner:NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT
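A rough sketch of clustering logs through connected components of a similarity graph is shown below. It assumes whitespace tokenization and Jaccard similarity between token sets as a stand-in for the vector similarity mentioned in the abstract, and uses a small union-find structure to collect connected components. The threshold, function names, and sample log lines are hypothetical.

```python
def tokens(line):
    return set(line.lower().split())


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_logs(lines, threshold=0.5):
    """Link logs whose similarity reaches the threshold; return connected components."""
    sets = [tokens(line) for line in lines]
    parent = list(range(len(lines)))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(len(lines)):
        for j in range(i + 1, len(lines)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(lines)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


logs = [
    "connection from 10.0.0.1 closed",
    "connection from 10.0.0.2 closed",
    "disk usage at 91 percent",
    "disk usage at 95 percent",
]
clusters = cluster_logs(logs)
print(clusters)  # [[0, 1], [2, 3]]
```

The number of clusters falls out of the graph structure, so no cluster count has to be supplied in advance, which matches the property the abstract highlights.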

Scene text segmentation method based on weak supervision deep learning

The invention provides a scene text segmentation method based on weak supervision deep learning, and the method comprises the following steps: enabling a scene picture to be superposed with any text to generate a scene text picture, generating a training sample, and labeling the training sample as the scene picture; performing feature extraction by using a convolutional neural network to obtain high-level semantics step by step; performing up-sampling through deconvolution to gradually recover the high-level semantic feature map to the size of the input image; carrying out multi-scale fusion on the feature maps output by the convolution layer and the deconvolution layer; activating the fused feature map to obtain a dichotomy black-and-white image of the scene and the text; setting a loss function for training; and corroding and expanding the scene text segmentation map obtained after training to generate a text region bounding box. The method does not need any strongly supervised pixel-level annotation sample, simply and efficiently solves the text segmentation problem in scene text detection, greatly reduces the algorithm cost, and improves the scene text segmentation efficiency.
Owner:UNIV OF ELECTRONICS SCI & TECH OF CHINA

System and method for enhancing comprehension and readability of text

The present invention is a text display system with speech output that uses a method of text segmentation in which segments of text are presented one after another for reading text sequentially. To indicate the location of text a user is currently reading, the current sentence is emphasized by presenting the surrounding text in faded colors. The current sentence is segmented into phrases where the points of segmentation are chosen by a series of grammatical rules and the desired number of words in each segment. When the text is presented sequentially, each segment is highlighted within the current sentence. With the use of a text-to-speech output system, each segment is spoken out with a pause before the next segment is presented. In a non-linear / selective reading scenario, a user can select a text segment, for which the span of the segment can be automatically generated or manually selected by the user.
Owner:QUILLSOFT

Data retrieval method and device

The embodiment of the invention discloses a data retrieval method and device and relates to the technical field of communication. The data retrieval method and device are used for improving retrieval result accuracy and retrieval efficiency. According to the specific scheme, the data retrieval method comprises the steps that a call center server receives a voice message sent by a user terminal, and a text message obtained by carrying out text conversion on the voice message is obtained; a first segmentation set recognized by carrying out text analysis on the text message is obtained, wherein the first segmentation set comprises at least one text segmentation; a keyword index list of retrieval data stored in a knowledge database is searched for retrieval keywords matched with the text segmentations in the first segmentation set, wherein the keyword index list contains at least one keyword index entry, and the keyword index entry contains retrieval keywords and identities of the retrieval data corresponding to the retrieval keywords; the retrieval data indicated by the identities of the retrieval data corresponding to the found retrieval keywords is retrieved from the knowledge database. The data retrieval method and device are used in the data retrieval process.
Owner:HUAWEI SOFTWARE TECH

Information retrieval method, device and equipment and computer readable storage medium

The embodiment of the invention provides an information retrieval method, device and equipment and a computer readable storage medium, and the method comprises the steps: carrying out the text segmentation of to-be-retrieved information in a received information retrieval request, and obtaining at least two fields; obtaining a feature vector of the to-be-retrieved information and a sub-feature vector of each field; in a preset full text space, performing first clustering processing on texts in a preset text library according to the feature vectors to obtain a first number of candidate texts; performing second clustering processing on the first number of candidate texts according to the sub-feature vectors in a preset sub-text space to obtain a second number of recalled texts; taking the recalled text as a retrieval result of the information retrieval request, and outputting the retrieval result. Through the embodiment of the invention, the similarity between the to-be-retrieved information and the recalled text can be flexibly measured according to the semantic relevancy of the text, and the retrieval accuracy of an information retrieval system is improved.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Character recognition method and apparatus

The invention relates to a character recognition method and apparatus. The method comprises: obtaining an input text image; performing text line segmentation on the text image to obtain a text line region of the text image; performing character region segmentation on the text line region according to text character attributes to obtain character region information; and according to the character region information, performing single character segmentation in combination with the text image to obtain a character segmentation result. According to the character recognition method and apparatus, the text segmentation can be accurately performed, so that the recognition performance of OCR (Optical Character Recognition) is greatly improved; and the scheme has relatively high practical values in various text recognition applications.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Semantic comprehension system and method oriented to Chinese text

Inactive | CN107577662A | Resolve unmeasured words; Solve the relationship between words; Neural architectures; Special data processing applications; Part of speech; Curse of dimensionality
The invention provides a semantic comprehension system and method oriented to Chinese text. Based on deep learning, a deep learning text classification model is provided; the model is divided into an input layer, a convolutional layer, a pooling layer, a GRU (Gated Recurrent Unit) layer, a fully connected layer and an output layer; a pinyin feature sequence of the segmented text is used as input; features are obtained through multi-layer feature extraction; and an intention category is predicted to obtain a text classification result. With this system and method, the part of speech of a statement does not need to be judged and no complex preprocessing, such as generating a syntax analysis tree, is needed; the text only needs to be segmented and the segmented text converted into pinyin. This solves the problems of conventional feature extraction methods: the relation between words cannot be measured, a lot of external prior knowledge is needed, and the curse of dimensionality easily arises when large-scale corpora are processed.
Owner:SHANGHAI JIAO TONG UNIV +1

Unsupervised automatic extraction method of microblog new words based on repeated word strings

The invention discloses an unsupervised automatic extraction method of microblog new words based on repeated word strings. The method includes the steps that, first, text segmentation is conducted on the microblog documents to be processed: the texts are segmented through a dynamic programming word segmentation method, the word strings to be recognized are segmented, and the word segmentation fragments in the word strings to be recognized are combined into the new words to be recognized; candidate new words are then extracted from the word strings to be recognized according to a statistical word selection model, the candidate words are filtered through a rule filtering model, and the final new words are acquired. The method effectively guarantees a high accuracy rate, does not depend heavily on a rule word stock, and guarantees the extraction speed of new words.
Owner:HEFEI UNIV OF TECH

Character segmentation by slices

A method for segmentation of characters in text that segments text into lines, words and slices and determines at least one of fixed pitch and proportional pitch prior to segmentation. The method computes histograms of the lines and defines the widths of the lobes of the histograms of the lines as the character pitches. In addition, the method further analyzes the character pitches; segments lines into words; computes histograms of the words; and aggregates the histograms of the words at predetermined points. Moreover, the method segments the words, slices the words into an upper slice and a lower slice, and further segments the upper slice and the lower slice. The results are then combined to provide both coarse and fine segmentation that enhances the performance of character OCR for documents scanned as at least one of gray-scale images and color images.
Owner:IBM CORP
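The histogram step in this abstract can be illustrated with a vertical projection profile. The sketch below assumes a binarized text line given as rows of 0/1 pixels, sums each column, and reports the runs of non-empty columns (the lobes) and their widths. The function names and the tiny sample image are hypothetical; real pitch analysis would involve considerably more than this.

```python
def column_histogram(rows):
    """Count ink pixels in every column of a binarized text line."""
    return [sum(column) for column in zip(*rows)]


def lobes(hist):
    """Return (start, end) column ranges where the histogram is non-zero."""
    runs, start = [], None
    for x, value in enumerate(hist):
        if value and start is None:
            start = x
        elif not value and start is not None:
            runs.append((start, x))
            start = None
    if start is not None:
        runs.append((start, len(hist)))
    return runs


# Hypothetical 2-row binary image of one text line (1 = ink).
line_image = [
    [1, 1, 0, 1, 1, 0, 0, 1],
    [1, 0, 0, 1, 1, 0, 0, 1],
]
hist = column_histogram(line_image)
runs = lobes(hist)
widths = [end - start for start, end in runs]
print(hist)    # [2, 1, 0, 2, 2, 0, 0, 2]
print(runs)    # [(0, 2), (3, 5), (7, 8)]
print(widths)  # [2, 2, 1]
```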

Hot-spot event information processing method and apparatus

Embodiments of the present invention disclose a hot-spot event information processing method. The method comprises: performing text segmentation on a plurality of pieces of text information to obtain the segmented words contained in each piece of text information; extracting at least one keyword from the segmented words contained in each piece of text information according to statistical data of each segmented word obtained by the processing; obtaining a traffic heat value of each keyword according to the network traffic data of the text information corresponding to that keyword; determining each keyword whose traffic heat value is higher than a first preset threshold as a hot word; and performing hot word clustering on the determined hot words according to the degree of correlation between the hot words to obtain at least one event hot word cluster. The embodiments of the present invention also disclose a hot-spot event information processing apparatus. According to the technical scheme of the present invention, hot-spot events in the network can be grasped in a timely and accurate manner.
Owner:TENCENT TECH (SHENZHEN) CO LTD

Hybrid text segmentation using n-grams and lexical information

A hybrid n-gram / lexical analysis tokenization system including a lexicon and a hybrid tokenizer operative to perform both n-gram tokenization of a text and lexical analysis tokenization of the text using the lexicon, and to construct either of an index and a classifier from the results of both the n-gram tokenization and the lexical analysis tokenization, where the hybrid tokenizer is implemented in at least one of computer hardware and computer software and is embodied within a computer-readable medium.
Owner:IBM CORP
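A minimal sketch of combining the two tokenization styles in one index is shown below. It assumes character bigrams over whitespace-stripped text for the n-gram side and a simple lexicon lookup over regex-extracted words for the lexical side. The function names, the key layout of the index, and the sample documents are hypothetical.

```python
import re
from collections import defaultdict


def ngram_tokens(text, n=2):
    """Character n-grams over the lowercased text with whitespace removed."""
    compact = re.sub(r"\s+", "", text.lower())
    return [compact[i:i + n] for i in range(len(compact) - n + 1)]


def lexical_tokens(text, lexicon):
    """Words found in the lexicon, in document order."""
    return [w for w in re.findall(r"\w+", text.lower()) if w in lexicon]


def build_index(docs, lexicon, n=2):
    """Inverted index holding postings from both tokenizers."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for gram in ngram_tokens(text, n):
            index[("ngram", gram)].add(doc_id)
        for word in lexical_tokens(text, lexicon):
            index[("word", word)].add(doc_id)
    return index


# Hypothetical lexicon and documents.
lexicon = {"text", "segmentation"}
docs = {1: "Text segmentation", 2: "context"}
index = build_index(docs, lexicon)
print(sorted(index[("ngram", "te")]))   # [1, 2]
print(sorted(index[("word", "text")]))  # [1]
```

The n-gram postings match "te" inside "context" as well, while the lexical postings only match the whole word, which is the complementary behaviour a hybrid index is meant to capture.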