66 results about "Suffix tree" patented technology

In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations. The construction of such a tree for the string S takes time and space linear in the length of S. Once constructed, several operations can be performed quickly, for instance locating a substring in S, locating a substring if a certain number of mistakes are allowed, locating matches for a regular expression pattern etc.
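The definition above can be made concrete with a minimal sketch. It builds an uncompressed suffix trie naively in O(n^2) time. This is not the linear-time compressed suffix tree (for example, Ukkonen's construction). It then uses the trie to locate every occurrence of a substring. The function names and the "$pos" bookkeeping key are hypothetical.

```python
def build_suffix_trie(text):
    """Naive O(n^2) suffix trie: insert every suffix text[i:] character by character.

    Each node is a dict from a character to its child node; the extra key "$pos"
    lists the start positions of all suffixes that pass through that node.
    """
    root = {}
    for i in range(len(text)):
        node = root
        for ch in text[i:]:
            node = node.setdefault(ch, {})
            node.setdefault("$pos", []).append(i)
    return root


def find_occurrences(trie, pattern):
    """Return the sorted start positions of a non-empty pattern in the indexed text."""
    node = trie
    for ch in pattern:
        if ch not in node:
            return []
        node = node[ch]
    return sorted(node["$pos"])


trie = build_suffix_trie("banana")
print(find_occurrences(trie, "ana"))  # [1, 3]
```

A substring query costs time proportional to the pattern length, independent of the text length. The compressed, linear-size suffix tree keeps that property while avoiding the quadratic space of this naive trie.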

Index structure for supporting structural XML queries

The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wildcards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).
Owner:IBM CORP
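The abstract's central claim is that an XML query reduces to non-contiguous subsequence matching over structure-encoded sequences. A minimal sketch of that idea follows, under these assumptions:

- Trees are given as (label, children) tuples.
- Each node is encoded in preorder as (label, root-to-parent path).

The sketch omits wildcards, ViST's B+Tree-based index, and any extra checks the full method may need for some branching queries. The function names are hypothetical.

```python
def encode(tree, prefix=()):
    """Structure-encode a tree given as (label, [children]) in preorder.

    Each element is (label, prefix), where prefix is the tuple of labels on the
    path from the root down to the node's parent.
    """
    label, children = tree
    seq = [(label, prefix)]
    for child in children:
        seq.extend(encode(child, prefix + (label,)))
    return seq


def is_subsequence(query_seq, doc_seq):
    """True if query_seq occurs in doc_seq in order, not necessarily contiguously."""
    it = iter(doc_seq)
    return all(any(item == d for d in it) for item in query_seq)


doc = ("P", [("S", [("N", [])]), ("B", [("L", [])])])
query = ("P", [("B", [("L", [])])])
print(is_subsequence(encode(query), encode(doc)))  # True
```

Because the query tree is matched as one sequence, no per-path sub-query results need to be joined afterwards.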

Method for extracting name entities and jargon terms using a suffix tree data structure

A method for named entity and jargon term recognition and extraction. An embodiment of the present invention uses a suffix tree data structure to determine frequently occurring phrases. In one embodiment, the text to be analyzed is first preprocessed. The text is then separated into clauses, and a suffix tree is created for the text. The suffix tree is used to determine repetitious segments. Unrecognized text fragments that occur with high frequency have a comparatively high probability of being named entities or jargon terms. The set of repetitious segments is then filtered to obtain a set of possible entity names and jargon terms.
Owner:INTEL CORP
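The "find repetitious segments" step can be sketched with a word-level suffix trie built over each clause. Every node counts how many suffixes pass through it, so a node's count is the number of occurrences of the phrase spelled by its path. This is a minimal sketch under several assumptions:

- Whitespace tokenization.
- A cap of max_len words per suffix.
- Illustrative min_count and min_words thresholds.

The dictionary-based filtering of known words is omitted, and all names are hypothetical.

```python
def new_node():
    return {"count": 0, "children": {}}


def build_word_suffix_trie(clauses, max_len=4):
    """Insert every word-level suffix (truncated to max_len words) of every clause."""
    root = new_node()
    for clause in clauses:
        words = clause.split()
        for i in range(len(words)):
            node = root
            for w in words[i:i + max_len]:
                node = node["children"].setdefault(w, new_node())
                node["count"] += 1  # one more occurrence of the phrase ending here
    return root


def repeated_segments(root, min_count=2, min_words=2):
    """Collect multi-word phrases occurring at least min_count times."""
    found, stack = {}, [(root, ())]
    while stack:
        node, path = stack.pop()
        for w, child in node["children"].items():
            phrase = path + (w,)
            if child["count"] >= min_count:  # counts never grow deeper, so prune otherwise
                if len(phrase) >= min_words:
                    found[" ".join(phrase)] = child["count"]
                stack.append((child, phrase))
    return found
```

For the clauses "Acme Widget Pro launched today", "analysts praised Acme Widget Pro" and "the Acme Widget Pro sold out", the result is {"Acme Widget": 3, "Acme Widget Pro": 3, "Widget Pro": 3}.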

Table boundary detection in data blocks for compression

Data is converted into a minimized data representation using a suffix tree by sorting data streams according to symbolic representations for building table boundary formation patterns. The converted data is fully reversible for reconstruction while retaining minimal header information.
Owner:IBM CORP

Intelligent prompt method, module and system for search

The invention discloses an intelligent prompt method, module, and system for search. According to the method, a server performs the following steps: distinguishing prefix words and suffix words with a tokenizer; performing synonym expansion to form a prefix synonym list and a suffix synonym list; traversing a hot-word suffix tree to find hot words that match the prefixes and/or suffixes, yielding candidate words; and analyzing the user's historical search behavior to calculate the probability of each candidate word. A client then performs the following steps: calculating the local relevance of each candidate word, calculating a predicted click value for each candidate word, and selecting the candidate words to display according to the predicted click values. Because the prompt words are obtained by matching both prefix and suffix words, synonyms are incorporated, a large body of user search intentions is integrated, and local relevance is taken into account, the prompt words more closely reflect the user's search intentions.
Owner:JIANGSU WISEDU INFORMATION TECH

Method and apparatus for indexing suffix tree in social network

A method for indexing a suffix tree in a social network includes: scanning an input string and dividing the string into partitions each having a common prefix; performing no-merge suffix tree indexing on the divided partitions; storing information on the partitions on which no-merge suffix tree indexing is performed; storing suffix nodes of the no-merge suffix tree; and establishing a prefix tree. The performing no-merge suffix tree indexing includes: generating a set of suffixes having the common prefix in the input string; generating a suffix set from the set of suffixes and storing the suffix set; and building the suffix set as a sub-tree.
Owner:ELECTRONICS & TELECOMM RES INST
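The partition-then-build idea above can be sketched in a few steps:

1. Group suffix start positions by their first k characters (the common prefix).
2. Build one independent sub-trie per partition from the remaining characters.
3. Keep a top-level prefix table that routes each query to its partition.

This is a minimal sketch. It assumes a fixed prefix length k and naive tries, and it does not reproduce the patent's no-merge construction or storage layout. All names are hypothetical.

```python
from collections import defaultdict


def partition_suffixes(text, k):
    """Group suffix start positions by their first k characters."""
    parts = defaultdict(list)
    for i in range(len(text)):
        parts[text[i:i + k]].append(i)
    return parts


def build_subtree(text, positions, k):
    """Naive trie over one partition's suffixes, below their shared k-char prefix."""
    root = {}
    for i in positions:
        node = root
        for ch in text[i + k:]:
            node = node.setdefault(ch, {})
        node["$leaf"] = i  # suffix starting at i ends here
    return root


def build_index(text, k=2):
    """Prefix table: common prefix -> independently built sub-tree."""
    return {p: build_subtree(text, pos, k) for p, pos in partition_suffixes(text, k).items()}


def partitioned_search(index, pattern, k=2):
    """Start positions of pattern; this sketch requires len(pattern) >= k."""
    if len(pattern) < k:
        raise ValueError("pattern must be at least k characters long")
    node = index.get(pattern[:k])
    for ch in pattern[k:]:
        if node is None:
            break
        node = node.get(ch)
    if node is None:
        return []
    out, stack = [], [node]
    while stack:
        for key, val in stack.pop().items():
            if key == "$leaf":
                out.append(val)
            else:
                stack.append(val)
    return sorted(out)
```

Because each partition's sub-tree depends only on its own suffixes, the partitions can be built in parallel or on separate machines without a merge step.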

Biological sequence local comparison method capable of obtaining complete solution

Disclosed is a biological sequence local comparison (alignment) method capable of obtaining a complete solution. The method includes: taking one biological sequence as a reference sequence and another as a query sequence, and setting a match score Sa, a mismatch score Sb, a gap opening penalty Sg, a gap extension penalty Ss, and a score threshold H; comparing the suffix tree branches of the reference sequence with the query sequence; integrating the comparison scores of all branches and taking the maximum as the final alignment score of the two sequences; and, according to the final score, finding fragments with similar functions in the query and reference sequences or determining a homology relationship between them. The method uses a Burrows-Wheeler transform (BWT) index and combines filtering and reuse techniques when comparing the suffix tree branches of the reference sequence with the query sequence. This yields a complete solution for the alignment of the biological sequences and addresses the insufficient accuracy and low efficiency of the prior art.
Owner:NORTHEASTERN UNIV
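The scoring parameters named above (Sa, Sb, Sg, Ss) are those of standard local alignment with affine gap penalties. As an illustrative baseline only, the sketch below computes the classic Smith-Waterman score with Gotoh's affine-gap recurrences over the full dynamic-programming matrix. It is not the patent's BWT-indexed suffix-tree-branch comparison, and it does not use the threshold H. It assumes a gap of length k costs Sg + (k - 1) * Ss and that Sb is given as a negative score.

```python
def local_alignment_score(ref, query, sa, sb, sg, ss):
    """Best local alignment score with affine gaps (Smith-Waterman-Gotoh).

    sa: match score, sb: mismatch score (negative), sg: gap opening penalty,
    ss: gap extension penalty; a gap of length k costs sg + (k - 1) * ss.
    """
    neg = float("-inf")
    n, m = len(ref), len(query)
    H = [[0] * (m + 1) for _ in range(n + 1)]    # best score ending at (i, j)
    E = [[neg] * (m + 1) for _ in range(n + 1)]  # ending with a gap in ref (query char unpaired)
    F = [[neg] * (m + 1) for _ in range(n + 1)]  # ending with a gap in query (ref char unpaired)
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            E[i][j] = max(H[i][j - 1] - sg, E[i][j - 1] - ss)
            F[i][j] = max(H[i - 1][j] - sg, F[i - 1][j] - ss)
            diag = H[i - 1][j - 1] + (sa if ref[i - 1] == query[j - 1] else sb)
            H[i][j] = max(0, diag, E[i][j], F[i][j])  # 0 lets a local alignment restart
            best = max(best, H[i][j])
    return best
```

With Sa = 2, Sb = -1, Sg = 2 and Ss = 1, aligning "AAAATTTT" against "AAAAGTTTT" scores 14: eight matches minus one gap opening. Index-based methods aim to avoid filling this full O(nm) table.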

Suffix tree based catalog organizing method in distributed file system

The invention provides a suffix tree based catalog (directory) organizing method for a distributed file system. The method comprises the following steps: grouping directory entries by name and storing different groups of entries on different disks of a storage server; and organizing and storing each group of directory entries using a suffix tree.
Owner:DAWNING INFORMATION IND BEIJING +1

Out-of-order data packet string matching method and system

The invention relates to a string matching method and system for out-of-order data packets. The method comprises the following steps:

- initializing a deterministic finite automaton (DFA) and a pattern suffix tree (PST);
- initializing a buffer and receiving, one by one, the character strings transmitted over the network and obtained from data flows, where each data flow consists of at least two character strings in order;
- obtaining the character strings belonging to the same data flow one by one;
- if the current character string has a prefix, setting and determining the current state of the automaton;
- if the current character string has a suffix, appending the found state to the tail of the current character string to obtain a combined fragment;
- feeding the combined fragment into the automaton;
- storing the current character string's information and letting the string pass.

Because the method caches only states rather than the data packets themselves, it can match character strings across out-of-order data packets.
Owner:INST OF INFORMATION ENG CAS

User abnormal behavior detection method and system

The invention provides a method and system for detecting abnormal user behavior. The method comprises the following steps: obtaining historical transaction data and training a Markov model and a probability suffix tree model on it to obtain a Markov transition probability matrix and a probability suffix tree transition probability matrix; combining the two matrices by linear weighted fusion to obtain a fraud early-warning transition probability matrix, and building a fraud transaction early-warning model from that matrix; classifying the historical transaction data with a preset threshold and the early-warning model, and adjusting the preset threshold according to the classification result to obtain a final threshold; and classifying the transaction data to be checked with the final threshold and the early-warning model to obtain a detection result.
Owner:INDUSTRIAL AND COMMERCIAL BANK OF CHINA
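The linear weighted fusion and threshold steps can be sketched as follows. This is a minimal sketch under these assumptions:

- Both transition matrices are already estimated and share the same state indices.
- The fusion weight w is a free parameter.
- A sequence is scored by its mean log transition probability.

Training the two models and tuning the threshold on historical data are not shown. The function names are hypothetical.

```python
import math


def fuse(p_markov, p_pst, w):
    """Linear weighted fusion of two transition matrices: w * P_markov + (1 - w) * P_pst."""
    return [[w * a + (1 - w) * b for a, b in zip(row_m, row_p)]
            for row_m, row_p in zip(p_markov, p_pst)]


def mean_log_prob(p, seq):
    """Average log-probability of the transitions in seq (states are matrix indices)."""
    logs = [math.log(max(p[a][b], 1e-12)) for a, b in zip(seq, seq[1:])]
    return sum(logs) / len(logs)


def is_suspicious(p, seq, threshold):
    """Flag a transaction sequence whose transitions are, on average, too unlikely."""
    return mean_log_prob(p, seq) < threshold
```

Because each row of both input matrices sums to 1 and the weights w and 1 - w sum to 1, every row of the fused matrix is again a probability distribution.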

System and method of discovering, detecting and classifying alarm patterns for electrophysiological monitoring systems

A system and method for electrophysiological monitoring, including a plurality of sensors configured to detect one or more health parameters of a patient and a monitoring device configured to receive a plurality of sensing signals from the sensors and output a monitoring signal representative of an alarm sequence, wherein the alarm sequence comprises a set of alarm events identified in the sensing signals. The system also includes an on-line monitoring module configured to generate a suffix tree data structure in response to the monitoring signal to identify alarm patterns from the set of alarm events and classify the alarm sequence in response to the occurrences of alarm patterns in the alarm sequence. The on-line monitoring module is further configured to alert monitoring personnel of an alarm condition after processing the alarm sequence in real time.
Owner:GENERAL ELECTRIC CO

Language model training method, query method and corresponding device

The invention provides a language model training method, a query method, and a corresponding device. The training method comprises the following steps:

- partitioning the training corpus into N groups, where N is an integer greater than 1;
- processing the N groups in parallel;
- sorting with recursive suffix trees to obtain sorting results that reflect the reverse-order positions of each word in each sentence;
- based on the sorting results, building an n-gram word-order tree according to a preset first word-order structure, with the second-to-last word of each sentence as the root node, where n is one or more preset integers greater than 1;
- merging the word-order trees that share the same root node and converting the word order to obtain a Trie that stores forward probability information.

In this Trie, the word order from root to leaf is: the second-to-last word of the sentence, the last word, and then the remaining words in reverse order. With this method and device, the language model can be updated quickly.
Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Character string matching method based on Aho-Corasick (AC) automaton and suffix tree

The invention discloses a character string matching method based on an Aho-Corasick (AC) automaton and a suffix tree, comprising the following steps: S1, compiling the signature strings into an AC automaton; S2, collecting the suffixes of the signature strings and compiling them into a suffix tree; S3, whenever a data packet enters the network security device, matching it with the AC automaton and preserving the matching state through the suffix tree; and S4, if the match succeeds, discarding the data packet. Because the state numbers of the AC automaton and the suffix tree are saved while matching a packet's character strings, matching can resume from the last state even when packets arrive out of order, so earlier packets need not be cached. This avoids the increased latency, higher memory consumption, and reduced cache locality that caching causes, reduces the resources the network security device requires, and improves its performance.
Owner:TSINGHUA UNIV
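Step S3 keeps the matching state so that one packet's scan can resume where the previous one stopped. That can be sketched with a plain Aho-Corasick automaton whose integer state is saved per flow between packets. This is a minimal sketch under stated assumptions: patterns are Python strings, the packets of a flow arrive in order, and the suffix-tree component the patent uses for out-of-order packets is not modeled. The class and method names are hypothetical.

```python
from collections import deque


class AhoCorasick:
    def __init__(self, patterns):
        self.goto, self.fail, self.out = [{}], [0], [[]]  # state 0 is the root
        for p in patterns:  # build the keyword trie
            s = 0
            for ch in p:
                if ch not in self.goto[s]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append([])
                    self.goto[s][ch] = len(self.goto) - 1
                s = self.goto[s][ch]
            self.out[s].append(p)
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:  # breadth-first, so shallower failure links exist first
            s = queue.popleft()
            for ch, t in self.goto[s].items():
                queue.append(t)
                f = self.fail[s]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[t] = self.goto[f].get(ch, 0)
                self.out[t] = self.out[t] + self.out[self.fail[t]]

    def feed(self, state, chunk):
        """Scan one packet starting from a saved state; return (new_state, matches)."""
        matches = []
        for ch in chunk:
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            matches.extend(self.out[state])
        return state, matches


ac = AhoCorasick(["attack", "tack"])
state, first = ac.feed(0, "xxatt")     # pattern split across two packets
state, second = ac.feed(state, "ackyy")
print(first, second)  # [] ['attack', 'tack']
```

In a device, the integer returned by feed would be stored in a per-flow table. Keeping only this small state, rather than buffering earlier packets, is what avoids the caching costs described above.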

A method for detecting outliers in time series

The invention discloses a method for detecting abnormal points in a time series, comprising the following steps: S1, discretizing the original time series to obtain a symbol string; S2, labeling the data in the symbol string to form a symbolized training data set; S3, constructing a probability suffix tree from the symbolized training data set; and S4, detecting abnormal points in the data sequence under test according to the probability suffix tree. The method can find anomalous patterns that deviate from normal patterns, reveal hidden information in the data more accurately, and address many practical problems. After discretization converts a time series into a symbol string, the series can be represented as a probability suffix tree. This makes computing the probabilities of suffix symbols of different symbol strings more concise and efficient, and the method achieves a high recall rate and good detection performance.
Owner:Shanghai Aircraft Airworthiness Certification Center, Civil Aviation Administration of China (中国民用航空上海航空器适航审定中心)
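Steps S1, S3 and S4 can be sketched end to end as follows:

1. Equal-width discretization turns the series into a symbol string.
2. A simplified variable-order context model, in the spirit of a probability suffix tree, is built from it.
3. A position is flagged when the probability of its symbol, given the longest sufficiently frequent context, falls below a threshold.

This is a minimal sketch, not the patent's construction. The binning scheme, add-one smoothing, max_depth, min_count and threshold are all assumptions. A full PST would also prune contexts that add no predictive information, and the names are hypothetical.

```python
from collections import defaultdict


def discretize(values, n_bins=3, alphabet="abcdefghij"):
    """Equal-width binning of a numeric series into symbols (assumed scheme)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0
    return "".join(alphabet[min(int((v - lo) / width), n_bins - 1)] for v in values)


class SimplePST:
    """Next-symbol counts for every context of length <= max_depth in the training string."""

    def __init__(self, train, max_depth=3, min_count=2):
        self.max_depth, self.min_count = max_depth, min_count
        self.alphabet = sorted(set(train))
        self.counts = defaultdict(lambda: defaultdict(int))  # context -> symbol -> count
        for i, sym in enumerate(train):
            for d in range(min(max_depth, i) + 1):
                self.counts[train[i - d:i]][sym] += 1

    def prob(self, history, symbol):
        """P(symbol | longest suffix of history seen >= min_count times), add-one smoothed."""
        for d in range(min(self.max_depth, len(history)), -1, -1):
            ctx = history[len(history) - d:]
            if ctx in self.counts:
                total = sum(self.counts[ctx].values())
                if total >= self.min_count:
                    return (self.counts[ctx].get(symbol, 0) + 1) / (total + len(self.alphabet))
        return 1.0 / len(self.alphabet)


def anomalies(model, seq, threshold):
    """Indices whose conditional probability under the model falls below threshold."""
    return [i for i in range(len(seq)) if model.prob(seq[:i], seq[i]) < threshold]
```

Trained on "abcabcabcabc", the model flags only index 8 of "abcabcabb" at threshold 0.2. Index 8 is the "b" that follows the context "cab", which in training was always followed by "c".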

Search algorithm for Chinese word segmentation

The invention belongs to the technical field of text search engines and specifically relates to a search algorithm for Chinese word segmentation. The algorithm has two phases: an offline indexing phase and an online searching phase. In the offline phase, the suffix string sets of all original strings are extracted, and an improved suffix tree is generated from them. In the online phase, query results for a keyword are first obtained from the suffix-tree-based index model; the matching degree between the keyword and each result is then quantified; finally, the results are sorted by matching degree from high to low and returned. The improved suffix-tree-based index structure balances index construction time against space usage, so searching with it is much more efficient than computing matching degrees and sorting the result set by brute force.
Owner:FUDAN UNIV
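The two phases can be sketched with a sorted list of suffixes, a simple suffix array, standing in for the improved suffix tree:

- Offline: every suffix of every original string is indexed.
- Online: a binary search finds the contiguous block of suffixes that start with the keyword, and the hits are ranked.

This is a minimal sketch. The "matching degree" used here, the fraction of the string covered by the keyword, is a hypothetical stand-in for the patent's measure. Materializing all suffixes also costs O(n^2) space.

```python
from bisect import bisect_left
from itertools import islice


def build_suffix_index(strings):
    """Sorted (suffix, string_id) pairs for every suffix of every string (naive, O(n^2) space)."""
    return sorted((s[i:], sid) for sid, s in enumerate(strings) for i in range(len(s)))


def keyword_search(index, strings, keyword):
    """Return strings containing keyword, ranked by a hypothetical matching degree."""
    start = bisect_left(index, (keyword, -1))  # first suffix >= keyword
    hits = set()
    for suffix, sid in islice(index, start, None):
        if not suffix.startswith(keyword):
            break  # suffixes sharing a prefix are contiguous in sorted order
        hits.add(sid)
    # Assumed matching degree: fraction of the string covered by the keyword.
    ranked = sorted(hits, key=lambda sid: (-len(keyword) / len(strings[sid]), sid))
    return [strings[sid] for sid in ranked]
```

For the strings "北京大学", "北京" and "南京大学", searching "京" returns all three, with "北京" first because the keyword covers half of it.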

Long-time-series delta-anomaly-point detection method based on probabilistic suffix tree (PST)

Status: Inactive (CN107844731A)
The invention belongs to the field of time series anomaly detection and relates to a long-symbol-string anomaly point detection method based on a probabilistic suffix tree (PST). The method uses continuous-data discretization and a probabilistic suffix tree model to detect anomalous data points in long time series. Its steps are: discretizing the originally continuous long time series into a long symbol string; constructing the probabilistic suffix tree from a symbolized training data set; using the constructed PST to detect delta-anomaly points in the data set under test; and evaluating the detection effect with the F1-measure. Experimental results show that the method effectively supports various long time series and achieves higher recall, accuracy, and precision with good detection performance. It can be applied in fields such as aerospace, medical data analysis, financial data analysis, and network anomaly detection.
Owner:FUDAN UNIV

A suffix tree-based code file cloning detection method

The invention relates to a suffix tree-based code file clone detection method that builds suffix trees for engineering project files and detects cloned code files in linear time. The LP detection scheme and algorithm use the content of computer software source code files as the granularity: lexical analysis and filtering are performed on the code files, fingerprint values are obtained through MD5 hashing, and a fingerprint database is built from the fingerprints. The fingerprint database is stored in a MySQL database and indexed by the id of the open source project in which each fingerprint occurs. Nodes marked as clone results in the suffix tree can be extracted and stored directly in a clone result data table. As a result, cloned code files can be detected in linear time, more efficiently than by comparing fingerprint values directly, and large-scale detection becomes feasible.
Owner:Suzhou Lengjing Qicai Information Technology Co., Ltd. (苏州棱镜七彩信息科技有限公司)
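The fingerprinting stage described above can be sketched in three steps:

1. Lexically filter each file.
2. Hash the result with MD5.
3. Group files that share a fingerprint.

This is a minimal sketch under stated assumptions. The filtering is a crude stand-in that strips C-style comments and all whitespace, so it can mangle "//" inside string literals. The MySQL storage and the suffix-tree stage are not shown, and the function names are hypothetical.

```python
import hashlib
import re
from collections import defaultdict


def normalize(source):
    """Crude lexical filtering (assumed): drop /* */ and // comments, then all whitespace."""
    source = re.sub(r"/\*.*?\*/", "", source, flags=re.S)
    source = re.sub(r"//[^\n]*", "", source)
    return re.sub(r"\s+", "", source)


def fingerprint(source):
    """MD5 fingerprint of the normalized file content."""
    return hashlib.md5(normalize(source).encode("utf-8")).hexdigest()


def clone_groups(files):
    """files: dict of path -> source text. Returns sorted groups of paths sharing a fingerprint."""
    groups = defaultdict(list)
    for path, src in files.items():
        groups[fingerprint(src)].append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Two files that differ only in comments or formatting, such as "int x = 1; // one" and "/* copy */ int x=1;", receive the same fingerprint and are reported as one clone group.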

Correctness verification method and system of suffix array and longest common prefix

Status: Active (CN107015952A)
The invention relates to a method and system for verifying the correctness of a suffix array (SA) and a longest common prefix (LCP) array. The method includes the following steps:

- T is scanned once from right to left; according to the definition of suffix types, T[i] is compared with the following character T[i+1], and the type of the suffix suf(T, i) is computed and recorded in t[i].
- The elements of SA1 and LCPA1 are initialized to -1.
- SA is scanned once from left to right; all LMS suffixes in SA and their LCP values are located using the array t and recorded in order in SA1 and LCPA1.
- The adjacent LMS suffixes in SA1 and their LCP values are verified using the string T, the array t, SA1, and LCPA1.
- The L-type suffixes and their LCP values are induced-sorted using T, t, B, C, SA1, and LCPA1.
- The S-type suffixes and their LCP values are induced-sorted in the same way.
- Finally, SA, SA1, LCPA, and LCPA1 are scanned once in sequence and compared; if SA equals SA1 and LCPA equals LCPA1, the SA and LCPA of T are correct.
Owner:SYSU CMU SHUNDE INT JOINT RES INST +1
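The first step above, a single right-to-left scan that assigns L/S types, and the LMS positions derived from it can be sketched directly. The sketch also includes a brute-force reference check: SA is compared with a naive sort, and LCP with Kasai's algorithm. That check is a simple baseline, not the patent's induced-sorting verification. The sketch assumes T ends with a unique sentinel "$" smaller than every other character, and all names are hypothetical.

```python
def suffix_types(T):
    """t[i] = 'S' if suffix T[i:] < T[i+1:], else 'L'; the sentinel suffix is 'S' by convention."""
    n = len(T)
    t = ["S"] * n
    for i in range(n - 2, -1, -1):  # single right-to-left scan
        if T[i] < T[i + 1] or (T[i] == T[i + 1] and t[i + 1] == "S"):
            t[i] = "S"
        else:
            t[i] = "L"
    return t


def lms_positions(t):
    """Leftmost-S positions: S-type suffixes immediately preceded by an L-type suffix."""
    return [i for i in range(1, len(t)) if t[i] == "S" and t[i - 1] == "L"]


def lcp_kasai(T, sa):
    """LCP array (lcp[r] = LCP of sa[r-1] and sa[r]) in O(n) via Kasai's algorithm."""
    n = len(T)
    rank = [0] * n
    for r, i in enumerate(sa):
        rank[i] = r
    lcp, h = [0] * n, 0
    for i in range(n):
        if rank[i] > 0:
            j = sa[rank[i] - 1]
            while i + h < n and j + h < n and T[i + h] == T[j + h]:
                h += 1
            lcp[rank[i]] = h
            if h:
                h -= 1
        else:
            h = 0
    return lcp


def verify(T, sa, lcp):
    """Brute-force reference check of a claimed SA and LCP array."""
    return sa == sorted(range(len(T)), key=lambda i: T[i:]) and lcp == lcp_kasai(T, sa)
```

For "banana$" the types are L S L S L L S, the LMS positions are 1, 3 and 6, and verify accepts SA = [6, 5, 3, 1, 0, 4, 2] with LCP = [0, 0, 1, 3, 0, 0, 2].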

Online analytical processing (OLAP) query log mining and recommending method based on efficient mining of frequent closed sequences (BIDE)

Status: Inactive (CN102254034A)
The invention relates to OLAP recommendation technology, in particular to an online analytical processing (OLAP) query log mining and recommendation method based on BIDE, an algorithm for efficient mining of frequent closed sequences. The method recommends the likely next query to OLAP users, simplifying the process of browsing and analyzing multidimensional data. Its advantages are as follows:

- Based on the characteristics of OLAP query operations, the fields that express OLAP operations are extracted from the log files, and the logs are abstracted into query sequences, which simplifies their representation.
- Query patterns are mined from the query sequences with the BIDE algorithm, which makes subsequent recommendation more efficient without reducing recommendation accuracy.
- A suffix tree is built over the query patterns, so subsequent pattern matching does not need a search algorithm to locate the starting point of a match, which speeds up pattern matching.
- A fuzzy query pattern matching algorithm is provided to improve recommendation accuracy.
Owner:ZHEJIANG HONGCHENG COMP SYST
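The recommendation step can be sketched as follows. The longest overlap between the end of the current query session and the beginning of a mined pattern determines the next query to suggest. This minimal sketch finds that overlap by brute force, which is exactly the search the patent's suffix tree is meant to avoid. The BIDE mining and the fuzzy matching are not shown, and the function name and operation labels are hypothetical.

```python
def recommend_next(patterns, session):
    """Recommend the next operation from mined query patterns (brute force).

    For each pattern, find the longest proper prefix that equals a suffix of the
    current session; the pattern with the longest such overlap supplies the next step.
    """
    best, best_len = None, 0
    for pattern in patterns:
        for k in range(len(pattern) - 1, 0, -1):  # try the longest overlap first
            if k <= len(session) and tuple(session[-k:]) == tuple(pattern[:k]):
                if k > best_len:
                    best, best_len = pattern[k], k
                break
    return best


patterns = [("drill", "slice", "pivot"), ("slice", "roll-up")]
print(recommend_next(patterns, ["roll-up", "drill", "slice"]))  # pivot
```

Preferring the longest overlap favors the pattern that agrees with more of the user's recent history. Here the two-step match on "drill, slice" beats the one-step match on "slice".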

Method for changing a target array, a method for analyzing a structure, and an apparatus, a storage medium and a transmission medium therefor

The objective of the present invention is efficient analysis of the structure of an array. The method proceeds as follows:

- The prev(S) calculation is performed on a character string S. Each variable in S that has an identical variable upstream of it is replaced by a numerical value indicating the distance to that upstream variable, and each variable with no identical variable upstream is replaced by "0", yielding a character string S1.
- The compl(S) calculation is performed on S. Each variable that has a complementary variable upstream of it is replaced by a numerical value indicating the distance to that complementary variable, and each variable with no complementary variable upstream is replaced by "0", yielding a character string S2 (102).
- A single suffix tree (structure suffix tree) is generated by treating S1 and S2 as a pair of corresponding character strings (104 to 114).
- The resulting structure suffix tree is used to analyze the structure of the array represented by S.
Owner:UNILOC 2017 LLC
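The prev(S) and compl(S) transformations can be sketched directly. This minimal sketch makes three assumptions:

- "Distance to the upstream variable" means distance to the nearest earlier occurrence.
- The complement rule is supplied as a mapping; the A-U, G-C pairs below are only an example.
- The paired string is simply zipped together.

Building the structure suffix tree itself is not shown, and the function names are hypothetical.

```python
def prev_encode(s):
    """prev(S): distance to the nearest earlier identical symbol, or 0 if none."""
    last_seen, out = {}, []
    for i, ch in enumerate(s):
        out.append(i - last_seen[ch] if ch in last_seen else 0)
        last_seen[ch] = i
    return out


def compl_encode(s, complement):
    """compl(S): distance to the nearest earlier complementary symbol, or 0 if none."""
    last_seen, out = {}, []
    for i, ch in enumerate(s):
        partner = complement.get(ch)
        out.append(i - last_seen[partner] if partner in last_seen else 0)
        last_seen[ch] = i
    return out


PAIRS = {"A": "U", "U": "A", "G": "C", "C": "G"}  # assumed complement rule
s = "GAUC"
print(list(zip(prev_encode(s), compl_encode(s, PAIRS))))  # [(0, 0), (0, 0), (0, 1), (0, 3)]
```

Both encodings replace symbols with relative distances. As a result, two substrings with the same repeat and pairing structure receive the same encoding even if their concrete symbols differ.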

User interest modeling method based on conceptual clustering

Status: Inactive (CN101571870A)
The invention discloses a new user interest modeling method based on conceptual clustering (UIMC), addressing the shortcomings of traditional user interest modeling methods in accuracy and incremental processing capability. The method first constructs a suffix tree by analyzing the history documents a user has accessed, then selects different similarity thresholds and merges base clusters at different granularities. The user's interest hierarchy is generated from the inclusion relationships among the base clusters merged under the different thresholds. UIMC is an incremental, unsupervised conceptual learning method for documents, so a user profile can be obtained and updated easily. Finally, experiments on the 20 Newsgroups data set verify the effectiveness of UIMC for interest prediction.
Owner:BEIHANG UNIV
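The merging of base clusters under different similarity thresholds can be sketched with the overlap rule from Suffix Tree Clustering (STC). Under that rule, two base clusters are linked when their shared documents exceed the threshold fraction of each cluster, and linked clusters are merged transitively. This is a minimal sketch. It assumes base clusters are given as non-empty sets of document ids, which the suffix tree would normally supply, and the exact UIMC similarity measure may differ. The function name is hypothetical.

```python
def merge_base_clusters(base_clusters, threshold=0.5):
    """Merge base clusters whose overlap exceeds threshold relative to both clusters."""
    parent = list(range(len(base_clusters)))

    def find(x):  # union-find root with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i, a in enumerate(base_clusters):
        for j in range(i + 1, len(base_clusters)):
            b = base_clusters[j]
            overlap = len(a & b)
            if overlap / len(a) > threshold and overlap / len(b) > threshold:
                parent[find(i)] = find(j)
    merged = {}
    for i, docs in enumerate(base_clusters):
        merged.setdefault(find(i), set()).update(docs)
    return sorted(merged.values(), key=sorted)
```

Lowering the threshold merges more aggressively. With base clusters {1, 2, 3}, {1, 2, 3, 4}, {7, 8} and {8, 9}, a threshold of 0.5 yields three clusters, while 0.4 also merges {7, 8} with {8, 9}. This is how different thresholds give different granularities.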

Mining method for asynchronous periodic pattern in hydrologic time series

Status: Active (CN102495883A)
The invention discloses a method for mining asynchronous periodic patterns in hydrologic time series. First, a suffix-tree-based partial periodic pattern mining algorithm is improved to support multi-event series, yielding candidate periodic patterns. Then, in the valid segment generation process, a valid segment generation algorithm that adaptively adjusts the candidate period is proposed, avoiding the missed periods or wasted time and space caused by using a single uniform period. Compared with the prior art, the method finds asynchronous periodic patterns in hydrologic time series more effectively.
Owner:HOHAI UNIV

String matching in hardware using the fm-index

String matching is a ubiquitous problem that arises in a wide range of applications in computer science, e.g., packet routing, intrusion detection, web querying, and genome analysis. Due to its importance, dozens of algorithms and several data structures have been developed over the years. A recent breakthrough in this field is the FM-index, a data structure that synergistically combines the Burrows-Wheeler transform and the suffix array. In software, the FM-index allows searching (exact and approximate) in times comparable to the fastest known indices for large texts (suffix trees and suffix arrays), but has the additional advantage of being much more space-efficient than those indices. This disclosure discusses an FPGA-based hardware implementation of the FM-index for exact and approximate pattern matching.
Owner:RGT UNIV OF CALIFORNIA
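The core FM-index operation, counting exact occurrences by backward search over the Burrows-Wheeler transform, can be sketched in software. This minimal sketch builds the BWT by naively sorting suffixes and stores full occurrence tables. It assumes "$" is absent from the text and sorts before all other characters. It models neither the compressed rank structures a practical FM-index uses nor the FPGA design, and the function names are hypothetical.

```python
def fm_index(text):
    """Return (C, occ, n) for text + '$' using a naively built suffix array."""
    text += "$"  # assumed sentinel: unique and lexicographically smallest
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in sa)  # text[-1] is '$' when i == 0
    alphabet = sorted(set(text))
    C, total = {}, 0
    for ch in alphabet:  # C[ch] = number of BWT characters smaller than ch
        C[ch] = total
        total += bwt.count(ch)
    occ = {ch: [0] * (len(bwt) + 1) for ch in alphabet}  # occ[ch][i] = count of ch in bwt[:i]
    for i, b in enumerate(bwt):
        for ch in alphabet:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return C, occ, len(bwt)


def count(index, pattern):
    """Backward search: number of occurrences of pattern in the indexed text."""
    C, occ, n = index
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo, hi = C[ch] + occ[ch][lo], C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


print(count(fm_index("abracadabra"), "abra"))  # 2
```

Each pattern character costs two table lookups, regardless of text length. That regular access pattern is part of why the FM-index lends itself to hardware implementation.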

Method and system for searching and storing data

This invention relates to methods for storing and searching data. Embodiments of the invention use suffix trees to support binary pattern matching. Embodiments of the invention can be shown to achieve search speeds comparable to those of known suffix trees, while requiring less memory, which is important in large data environments.
Owner:BRITISH TELECOMM PLC +2

A searchable encryption system and method based on a suffix tree

The invention provides a searchable encryption system and method based on a suffix tree, relating to the technical field of the Internet. The system comprises:

- an initialization module for constructing an encryption key and a suffix tree;
- a secure index construction module for constructing and encrypting an index;
- a substring search module for constructing a search token and performing the search;
- a verification and decryption module for decrypting and verifying results.

The method works as follows. First, a suffix tree and an encrypted index are constructed for a given character string, and the encrypted index is uploaded to a server. When the client searches for a character string, it generates a search token and sends it to the server, which searches according to the token and returns the result to the client. The system and method achieve efficient search for arbitrary character strings and solve the substring search problem over ciphertext, allowing a user to query ciphertext data without relying on keywords.
Owner:NORTHEASTERN UNIV

Behavior sequence anomaly detection method and system based on unsupervised algorithm

The invention provides a behavior sequence anomaly detection method based on an unsupervised algorithm. The method comprises the following steps:

1. From the operation data of an enterprise web system, calculate the time interval between consecutive user operations.
2. Segment the user behavior sequence according to whether that interval exceeds a preset threshold.
3. Train a probability suffix tree model.
4. Use the model to output a probability value for each user behavior sequence.
5. Feed that probability value as a feature into an isolation forest model.
6. Judge whether the user behavior is abnormal according to the model's output.
Owner:SHANGHAI GUAN AN INFORMATION TECH
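The segmentation step, which starts a new behavior sequence whenever the gap between consecutive operations exceeds a threshold, can be sketched as follows. This is a minimal sketch. It assumes events arrive as time-ordered (timestamp in seconds, operation) pairs, and the gap threshold is a free parameter. The probability suffix tree scoring and the isolation forest stage are not shown, and the function name is hypothetical.

```python
def segment_sessions(events, gap_threshold):
    """Split time-ordered (timestamp, operation) events into behavior sequences.

    A new sequence starts whenever the gap to the previous event exceeds gap_threshold.
    """
    sessions, prev_ts = [], None
    for ts, op in events:
        if prev_ts is None or ts - prev_ts > gap_threshold:
            sessions.append([])
        sessions[-1].append(op)
        prev_ts = ts
    return sessions


events = [(0, "login"), (10, "view"), (20, "export"), (4000, "login"), (4005, "delete")]
print(segment_sessions(events, gap_threshold=1800))
# [['login', 'view', 'export'], ['login', 'delete']]
```

Each resulting sequence would then be scored by the probability suffix tree model, and the score passed to the isolation forest as a feature.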

Character string generation method, article of manufacture and system

A method, article of manufacture, and system for displaying the context surrounding a search result succinctly. The method includes searching a document set configured as a frequency-ordered suffix tree to obtain a frequency-ordered context tree. Dynamic programming is applied to the frequency-ordered context tree to retrieve a set C of n1 context strings c. The area covered by a character string s in the entire set of context strings C = {c1, . . . , cn1} is defined as the product of (1) the number n2 of context strings c having s as a prefix and (2) the length of s. A set of character strings S that maximizes the sum of areas is then obtained. In addition, the dynamic programming can include a pruning process in which a search in progress is abandoned if its upper limit does not reach the maximum value.
Owner:IBM CORP
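The area definition above can be computed directly. Only prefixes of the context strings can have a nonzero count n2, so it suffices to enumerate every prefix of every context string, count how many contexts it begins, and multiply that count by its length. This minimal sketch only picks the single highest-area string. It does not reproduce the patent's dynamic programming over the context tree for choosing a set S, nor its pruning. The function names are hypothetical.

```python
from collections import Counter


def prefix_areas(contexts):
    """area(s) = n2(s) * len(s), with n2(s) the number of context strings having s as a prefix."""
    n2 = Counter()
    for c in contexts:
        for k in range(1, len(c) + 1):
            n2[c[:k]] += 1
    return {s: n * len(s) for s, n in n2.items()}


def best_single_string(contexts):
    """The single prefix string with the largest area (ties go to the first one found)."""
    areas = prefix_areas(contexts)
    return max(areas, key=areas.get)


print(best_single_string(["abc", "abd", "ab", "x"]))  # ab
```

Here "ab" wins with area 3 × 2 = 6. It is shared by three contexts, whereas the longer "abc" covers only one context and has area 3.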

Internet of Things information interoperation method based on sea-cloud computing architecture

The invention discloses an Internet of Things information interoperation method based on a sea-cloud computing architecture, belonging to the fields of big data processing and the Internet of Things. The model consists of a push-mode architecture based on sea-cloud computing and an unranked tree automata module:

- The push-mode architecture includes cloud nodes and sea nodes. At the sea end, real-time IoT sensing data flows are processed; at the cloud end, key data is processed and stored and decision data is generated.
- The unranked tree automata module adopts a suffix tree automata filter matching method. It uses tree automata technology, introduces the idea of suffixes, and processes subscription requests with a bottom-up push-mode unranked tree automaton.

This overcomes the insufficient integration of data and computing in existing computing technology and organically integrates the cloud end and the sea end. It also effectively reduces the intermediate states produced by large numbers of identical transitions during data processing. The resulting advantages are reasonable optimization of network resources, high file processing speed, and strong semantic recognition.
Owner:HARBIN ENG UNIV