66 results about "Suffix tree" patented technology

In computer science, a suffix tree (also called PAT tree or, in an earlier form, position tree) is a compressed trie containing all the suffixes of the given text as their keys and positions in the text as their values. Suffix trees allow particularly fast implementations of many important string operations. The construction of such a tree for the string S takes time and space linear in the length of S. Once constructed, several operations can be performed quickly, for instance locating a substring in S, locating a substring if a certain number of mistakes are allowed, locating matches for a regular expression pattern etc.
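The substring lookup described above can be illustrated with a much simpler structure. The following is a minimal sketch, assuming an uncompressed suffix trie built naively in quadratic time and space; it is not the linear-time compressed suffix tree construction, and the names `build_suffix_trie` and `find_occurrences` are hypothetical.

```python
def build_suffix_trie(text):
    """Insert every suffix of `text` character by character (naive, O(n^2)).

    Each node stores the start positions of all suffixes that pass through it.
    """
    root = {"children": {}, "positions": []}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node["children"].setdefault(
                ch, {"children": {}, "positions": []}
            )
            node["positions"].append(start)
    return root


def find_occurrences(trie, pattern):
    """Return the sorted start positions of `pattern` in the indexed text."""
    node = trie
    for ch in pattern:
        node = node["children"].get(ch)
        if node is None:
            return []  # the pattern is not a substring of the text
    return sorted(node["positions"])


trie = build_suffix_trie("banana")
print(find_occurrences(trie, "ana"))  # start positions of "ana"
```

Once the structure is built, a lookup walks at most one node per pattern character, which is why queries are fast regardless of the text length.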

Index structure for supporting structural XML queries

Inactive | US20050114314A1 | Data processing applications | Semi-structured data indexing | Paper document | Wildcard character

The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).

Owner:IBM CORP

Method for extracting name entities and jargon terms using a suffix tree data structure

Inactive | US7197449B2 | Natural language data processing | Special data processing applications | High probability | Algorithm

A method for entity name and jargon term recognition and extraction. An embodiment of the present invention uses a suffix tree data structure to determine frequently occurring phrases. In one embodiment, the text to be analyzed is preprocessed. The text is then separated into clauses and a suffix tree is created for the text. The suffix tree is used to determine repetitious segments. Unrecognized text fragments that occur with high frequency have a comparably high probability of being a name entity or jargon term. The set of repetitious segments is then filtered to obtain a set of possible entity names and jargon terms.

Owner:INTEL CORP
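The "repetitious segments" step in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: it uses sorted word-level suffixes as a stand-in for a suffix tree, skips the preprocessing, clause splitting, and filtering steps, and the function name `repeated_phrases` and its `min_len` parameter are hypothetical.

```python
def repeated_phrases(tokens, min_len=2):
    """Return {phrase: count} for every phrase of at least `min_len` tokens
    that occurs two or more times in `tokens`."""
    n = len(tokens)
    # Sorting the suffixes places suffixes that share a prefix next to each other.
    order = sorted(range(n), key=lambda i: tokens[i:])
    candidates = set()
    for a, b in zip(order, order[1:]):
        # Length of the common prefix of two adjacent suffixes.
        k = 0
        while a + k < n and b + k < n and tokens[a + k] == tokens[b + k]:
            k += 1
        # Every prefix of that common part occurs at least twice.
        for length in range(min_len, k + 1):
            candidates.add(tuple(tokens[a:a + length]))
    counts = {}
    for phrase in candidates:
        size = len(phrase)
        counts[phrase] = sum(
            1 for i in range(n - size + 1) if tuple(tokens[i:i + size]) == phrase
        )
    return counts


words = "the suffix tree stores text and the suffix tree finds text".split()
for phrase, count in sorted(repeated_phrases(words).items()):
    print(" ".join(phrase), count)
```

A real pipeline would then discard phrases already found in a dictionary, leaving the unrecognized frequent fragments as entity or jargon candidates.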

Reputation prediction of IP addresses

Active | US8554907B1 | Improve efficiency | Improve the detection rate | Error prevention | Transmission systems | Graphics | Ip address

Daily query counts for e-mail messages sent from a number of IP addresses having unknown reputations are collected and logged, and optionally plotted. The logged query count data may optionally be normalized. The normalized query count data may also be plotted. The normalized data is divided into regions (numerically or graphically). Next, the divided regions are tagged (symbolically or graphically) with unique, symbolic identifiers such as letters, numbers, symbols or colors. Patterns for each unknown IP address are formed based upon the tagged regions. Common good and bad patterns are also identified for known good and bad IP addresses. The reputation of these unknown IP addresses is then predicted using these identified good and bad patterns using a suffix tree (for example). Finally, an output identifying the determined reputations of these unknown IP addresses is generated and output.

Owner:TREND MICRO INC

Table boundary detection in data blocks for compression

Inactive | US20130275399A1 | Well formed | Web data indexing | Digital data processing details | Data stream | Boundary detection

Data is converted into a minimized data representation using a suffix tree by sorting data streams according to symbolic representations for building table boundary formation patterns. The converted data is fully reversible for reconstruction while retaining minimal header information.

Owner:IBM CORP

Intelligent prompt method, module and system for search

Active | CN103631929A | Quick search | Save CPU time | Special data processing applications | Natural language processing | Data mining

The invention discloses an intelligent prompt method, an intelligent prompt module and an intelligent prompt system for search. According to the method disclosed by the invention, a server executes the following steps: distinguishing prefix words and suffix words with a tokenizer; carrying out synonym expansion to form a prefix synonym list and a suffix synonym list; then traversing a hot word suffix tree to search for hot words with prefix matches and/or suffix matches to obtain candidate words; and analyzing and calculating the probability of each candidate word from the historical search behaviors of a user. According to the method, a client executes the following steps: calculating the local relevance of each candidate word; and calculating a click-on predicted value of each candidate word and then selecting the candidate words to display according to the click-on predicted values. In the invention, prompt words are obtained by matching between the prefix words and the suffix words, synonyms are combined, the user's many search intentions are integrated and the local relevance is combined, so that the prompt words are closer to the search intentions of the user.

Owner:JIANGSU WISEDU INFORMATION TECH

Data compression system based on tree models

Inactive | US7265692B2 | Data processing applications | Code conversion | Data compression | Finite-state machine

A method for encoding and decoding a sequence is provided. The method comprises searching a set of candidate trees varying in size for a tree T having a plurality of states. Tree T provides a structure that relatively minimizes code length of the sequence from among all the candidate trees. The method further comprises encoding data conditioned on the tree T, which may be a generalized context tree (GCT), using a sequential probability assignment conditioned on the states of the tree T. This encoding may use finite state machine (FSM) closure of the tree. Also provided are methods for decoding an encoded binary string when the encoded string includes a full tree or generalized context tree, as well as decoding an encoded string using incomplete FSM closure, incremental FSM, and suffix tree construction concepts.

Owner:HEWLETT PACKARD DEV CO LP

Method and apparatus for indexing suffix tree in social network

Inactive | US20110179030A1 | Efficient clustering | Efficient construction | Digital data information retrieval | Digital data processing details | Theoretical computer science | Social web

A method for indexing a suffix tree in a social network includes: scanning an input string and dividing the string into partitions each having a common prefix; performing no-merge suffix tree indexing on the divided partitions; storing information on the partitions on which no-merge suffix tree indexing is performed; storing suffix nodes of the no-merge suffix tree; and establishing a prefix tree. The performing no-merge suffix tree indexing includes: generating a set of suffixes having the common prefix in the input string; generating a suffix set from the set of suffixes and storing the suffix set; and building the suffix set as a sub-tree.

Owner:ELECTRONICS & TELECOMM RES INST

Biological sequence local comparison method capable of obtaining complete solution

Active | CN102750461A | Improve efficiency | Improve query efficiency | Special data processing applications | Computer science | Biological sequence alignment

Disclosed is a biological sequence local comparison method capable of obtaining a complete solution. The method includes adopting one biological sequence as a reference sequence and another biological sequence as a query sequence and setting a match score as Sa, a mismatch score as Sb, a gap opening penalty as Sg, a gap extension penalty as Ss and a fraction threshold as H; comparing suffix tree branches of the reference sequence with the query sequence; integrating comparison score results of each branch and taking a maximum score as a final comparison score result of the two biological sequences; and according to the final comparison score result, searching fragments provided with similar functions in the query sequence and the reference sequence or determining a homology relation between the query sequence and the reference sequence. According to the method, a Burrows-Wheeler transform (BWT) index is adopted, filtering and reuse technologies are combined to perform the comparison of the suffix tree branches of the reference sequence with the query sequence so as to obtain the complete solution for the comparison of the biological sequences, and the problems of insufficient accuracy and low efficiency in the prior art are solved.

Owner:NORTHEASTERN UNIV

Index structure for supporting structural XML queries

Inactive | US7287023B2 | Data processing applications | Semi-structured data indexing | Document preparation | Documentation

The present invention provides a ViST (or “virtual suffix tree”), which is a novel index structure for searching XML documents. By representing both XML documents and XML queries in structure-encoded sequences, it is shown that querying XML data is equivalent to finding (non-contiguous) subsequence matches. A variety of XML queries, including those with branches, or wild-cards (‘*’ and ‘//’), can be expressed by structure-encoded sequences. Unlike index methods that disassemble a query into multiple sub-queries, and then join the results of these sub-queries to provide the final answers, ViST uses tree structures as the basic unit of query to avoid expensive join operations. Furthermore, ViST provides a unified index on both content and structure of the XML documents, hence it has a performance advantage over methods indexing either just content or structure. ViST supports dynamic index update, and it relies solely on B+Trees without using any specialized data structures that are not well supported by common database management systems (hereinafter referred to as “DBMSs”).

Owner:IBM CORP

Suffix tree based catalog organizing method in distributed file system

Active | CN102024019A | Special data processing applications | Distributed File System | Data mining

The invention provides a suffix tree based catalog organizing method in a distributed file system. The method comprises the following steps of: grouping catalog items according to names, and storing different groups of catalog items on different discs on a storage server; and organizing and storing different groups of catalog items by adopting a suffix tree method.

Owner:DAWNING INFORMATION IND BEIJING +1

Out-of-order data packet string matching method and system

Inactive | CN104796354A | Achieve matching | Data switching networks | Special data processing applications | Data stream | Network packet

The invention relates to an out-of-order data packet string matching method and system. The method comprises the following steps: initializing and determining a finite state automaton (DFA) and a pattern suffix tree (PST); initializing a buffer area and receiving, one by one, the character strings obtained from data flows transmitted over the network, wherein every data flow is formed by at least two character strings in order; obtaining the character strings belonging to the same data flow one by one; setting and determining a current state of the finite state automaton if the current character string has a prefix; adding a finding state to the tail of the current character string and obtaining a combined fragment if the current character string has a suffix; inputting the combined fragment to the finite state automaton; and storing the current character string information and letting the current character string pass. The method does not need to cache data packets; it caches only states, and thereby achieves string matching across out-of-order data packets.

Owner:INST OF INFORMATION ENG CAS

User abnormal behavior detection method and system

Active | CN109889538A | Recognition is accurate in advance | Accurate identification | Character and pattern recognition | Transmission | Algorithm | Transition probability matrix

The invention provides a user abnormal behavior detection method and system, and the method comprises the steps: obtaining historical transaction data, carrying out the training through a Markov model and a probability suffix tree model according to the historical transaction data, and obtaining a Markov transition probability matrix and a probability suffix tree transition probability matrix; combining the Markov transition probability matrix and the probability suffix tree transition probability matrix through a linear weighted fusion method to obtain a fraud early warning transition probability matrix, and obtaining a fraud transaction early warning model according to the fraud early warning transition probability matrix; identifying the historical transaction data through a preset critical value and the fraud transaction early warning model, and adjusting the preset critical value according to an identification result to obtain a final critical value; and identifying to-be-detected transaction data according to the final critical value and the fraud transaction early warning model to obtain a detection result.

Owner:INDUSTRIAL AND COMMERCIAL BANK OF CHINA
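The linear weighted fusion of two transition probability matrices mentioned in the abstract above can be sketched as below. This is an assumption-laden illustration: the first-order Markov estimate from a symbol sequence, the uniform fallback for unseen states, the weight value, and the names `transition_matrix` and `fuse` are all hypothetical, and the second matrix is supplied directly rather than derived from a probability suffix tree.

```python
def transition_matrix(sequence, states):
    """Estimate first-order Markov transition probabilities from a sequence."""
    index = {s: i for i, s in enumerate(states)}
    counts = [[0] * len(states) for _ in states]
    for current, following in zip(sequence, sequence[1:]):
        counts[index[current]][index[following]] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        # A state never seen as a source falls back to a uniform row (assumption).
        matrix.append([c / total if total else 1 / len(states) for c in row])
    return matrix


def fuse(p, q, weight):
    """Linear weighted fusion: weight * p + (1 - weight) * q, element by element."""
    return [
        [weight * a + (1 - weight) * b for a, b in zip(row_p, row_q)]
        for row_p, row_q in zip(p, q)
    ]


markov = transition_matrix("AABAB", "AB")
other = [[0.5, 0.5], [0.5, 0.5]]  # placeholder for the second model's matrix
fused = fuse(markov, other, weight=0.5)
print(fused)
```

Because both inputs are row-stochastic and the weights sum to one, every row of the fused matrix still sums to one.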

System and method of discovering, detecting and classifying alarm patterns for electrophysiological monitoring systems

Active | US20080284582A1 | Efficient detection | Overcomes drawback | Medical automated diagnosis | Character and pattern recognition | Monitoring system | Engineering

A system and method for electrophysiological monitoring system including a plurality of sensors configured to detect one or more health parameters of a patient and a monitoring device configured to receive a plurality of sensing signals from the sensors and output a monitoring signal representative of an alarm sequence, wherein the alarm sequence comprises a set of alarm events identified in the sensing signals. The system also includes an on-line monitoring module configured to generate a suffix tree data structure in response to the monitoring signal to identify alarm patterns from the set of alarm events and classify the alarm sequence in response to the occurrences of alarm patterns in the alarm sequence. The on-line monitoring module is further configured to alert monitoring personnel of an alarm condition after processing the alarm sequence in real-time.

Owner:GENERAL ELECTRIC CO

Language model training method, query method and corresponding device

Active | CN103871404A | Fast training | Quick update | Speech recognition | NODAL | Trie

The invention provides a language model training method, a query method and a corresponding device; the training method comprises the following steps: partitioning training corpus to obtain N groups of training corpus, wherein the N is a positive integer bigger than 1; carrying out parallel execution to the N groups of training corpus obtained by partition; ordering recursion suffix trees so as to respectively obtain ordering results reflecting inverted order position conditions of each word in each sentence; based on the ordering result, respectively setting up an n-ary word order tree according to a preset first word order structure under a condition that a second last word of each sentence is regarded as a root node, and the n refers to the preset one or more positive integers bigger than 1; combining the word order trees of the same root node and converting the word order so as to obtain a Trie tree storing forward probability information. A word order sequence of the Trie tree from root to leaf is as the following order: the second last word in the sentence, the last word, and other words arranged in an inverted order. By employing the method and device, the language model can be fast updated.

Owner:BEIJING BAIDU NETCOM SCI & TECH CO LTD

Character string matching method based on an Aho-Corasick (AC) automaton and suffix tree

Inactive | CN103023883A | Save resources | Get rid of the disadvantage of reduced locality | Transmission | Special data processing applications | Automatic train control | Automatic control

The invention discloses a character string matching method based on an Aho-Corasick (AC) automaton and a suffix tree, which comprises the following steps: S1, compiling the characteristic character strings into an AC automaton; S2, gathering the suffixes of the characteristic character strings and compiling them into a suffix tree; S3, whenever a data packet enters the network security equipment, matching the data packet with the AC automaton and saving the matching state through the suffix tree; and S4, if the matching is successful, discarding the data packet. According to the disclosed method, the state numbers of the AC automaton and the suffix tree are saved while the character strings of a data packet are matched, so that a later packet can be matched by continuing from the last state even when packets arrive out of order, which avoids caching the previous data packets. The increased delay, higher memory consumption and reduced cache locality caused by such caching are avoided, the resources required by the network security equipment are reduced, and the performance of the network security equipment is improved.

Owner:TSINGHUA UNIV

A method for detecting outliers in time series

Inactive | CN109542952A | Computationally efficient | Accurately reveal | Special data processing applications | Database indexing | Data set | Algorithm

The technical proposal of the invention discloses a method for detecting abnormal points in time series, comprising the following steps: S1, discretizing an original time series to obtain a symbol string; S2, labeling the data in the symbol string to form a symbolized training data set; S3, constructing a probabilistic suffix tree according to the symbolized training data set; S4, detecting abnormal points in the data sequence to be detected according to the probabilistic suffix tree. The method can find abnormal patterns that deviate from the conventional pattern, reveal hidden information in the data more accurately, and solve many practical problems. After discretization converts a time series into a symbol string, the series can be expressed as a probabilistic suffix tree. The method is more concise and computes the probabilities of the symbols that follow different symbol strings more efficiently; its recall is high and its detection effect is good.

Owner:中国民用航空上海航空器适航审定中心
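Steps S1 to S4 of the abstract above can be sketched roughly as follows. This is a simplified illustration, not the patented method: it assumes equal-width bins for discretization, a flat table of context counts in place of a pruned probabilistic suffix tree, a fixed probability threshold, and hypothetical names throughout (`discretize`, `train_context_counts`, `next_symbol_probability`, `detect_anomalies`).

```python
from collections import defaultdict


def discretize(values, bins, low, high):
    """S1: map each value to a letter using equal-width bins (assumed scheme)."""
    width = (high - low) / bins
    symbols = []
    for v in values:
        k = min(max(int((v - low) / width), 0), bins - 1)
        symbols.append(chr(ord("a") + k))
    return "".join(symbols)


def train_context_counts(symbols, max_depth):
    """S3: count which symbol follows every context of length 0..max_depth."""
    counts = defaultdict(lambda: defaultdict(int))
    for i, sym in enumerate(symbols):
        for depth in range(max_depth + 1):
            if i - depth < 0:
                break
            counts[symbols[i - depth:i]][sym] += 1
    return counts


def next_symbol_probability(counts, history, sym, max_depth):
    """Probability of `sym` given the longest trained suffix of `history`."""
    for depth in range(min(max_depth, len(history)), -1, -1):
        context = history[len(history) - depth:]
        if context in counts:
            total = sum(counts[context].values())
            return counts[context].get(sym, 0) / total
    return 0.0


def detect_anomalies(counts, symbols, max_depth, threshold):
    """S4: flag positions whose symbol is improbable given its context."""
    return [
        i for i, sym in enumerate(symbols)
        if next_symbol_probability(counts, symbols[:i], sym, max_depth) < threshold
    ]


model = train_context_counts("ab" * 20, max_depth=2)
print(detect_anomalies(model, "ababbab", max_depth=2, threshold=0.1))
```

Falling back to shorter contexts when a long one was never seen in training mirrors how a suffix-based model walks from the deepest matching node toward the root.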

Search algorithm for Chinese word segmentation

Active | CN108846016A | Improve search efficiency | Less build time | Natural language data processing | Special data processing applications | Theoretical computer science | Chinese word

The invention belongs to the technical field of text search engines and specifically relates to a search algorithm for Chinese word segmentation. The algorithm is divided into two phases: an offline indexing phase and an online searching phase. In the offline indexing phase, the suffix string sets of all original string sets are first extracted, and an improved suffix tree is then generated from the suffix string sets. In the online searching phase, the query results for a keyword are first obtained from an index model based on the suffix tree, the matching degree between the keyword and each query result is then quantified, and finally the query results are sorted from high to low by matching degree and returned. The improved suffix-tree-based index structure balances index construction time against occupied space, so searching with this index structure is much more efficient than computing the matching degree by brute force and sorting the result set.

Owner:FUDAN UNIV

Long-time-series delta-anomaly-point detection method based on probabilistic suffix tree (PST)

Inactive | CN107844731A | Efficient detection method | Answering questions about unusual data points | Character and pattern recognition | Data set | Algorithm

The invention belongs to the field of anomaly detection of time series data, and relates to a long-symbol-string anomaly-point detection method based on a probabilistic suffix tree (PST). According to the method, discretization technology for continuous data and a probabilistic suffix tree model are utilized to detect long-time-series anomaly data points, and the steps thereof include: discretizing the originally continuous long time series data to obtain a long symbol string, constructing the probabilistic suffix tree according to a symbolized training data set, utilizing the constructed PST to detect the delta-anomaly-points in a to-be-detected data set, and utilizing the F1-measure to evaluate the detection effect. Experimental results show that the method can effectively support various long time series, is higher in all of a recall rate, an accuracy rate and a precision rate, is good in the detection effect, and can be applied to various fields of aerospace, medical data analysis, financial data analysis, network anomaly behavior detection and the like.

Owner:FUDAN UNIV

A suffix tree-based code file cloning detection method

Active | CN106990956A | Improve efficiency | Realize massive detection | Software engineering | Specific program execution arrangements | Base code | Source code file

The invention relates to a suffix tree-based code file cloning detection method which can build suffix trees for engineering project files and achieve code file cloning detection in linear time. An LP detection scheme and algorithm is characterized in that content of source code files of computer software is used as granularity, and by performing lexical analysis and filtering on the code files and obtaining fingerprint values through MD5 hash, fingerprints are created and a fingerprint database is built. The fingerprint database is stored in a MySQL database, and the id of an open source project where the fingerprints are located is used as an index. Nodes marked as cloning results in a suffix tree can be extracted directly and directly stored in a cloning result data table. Thus, cloned code files can be detected in linear time and the method has a higher efficiency than a method characterized by performing detection directly according to fingerprint values and can achieve mass detection.

Owner:苏州棱镜七彩信息科技有限公司

Correctness verification method and system of suffix array and longest common prefix

Active | CN107015952A | Implement correctness verification | Reduce time and space overhead | Natural language data processing | Special data processing applications | Array data structure | Validation methods

The invention relates to a correctness verification method and system of a suffix array and a longest common prefix. The method includes the steps that T is scanned once from right to left, the size of a character T[i] and the size of a subsequent character T[i+1] are compared according to the definition of suffix types, and the types of the character T[i] and the suffix suf(T, i) of T are calculated and recorded in t[i]; elements in SA1 and LCPA1 are initialized as -1; SA is scanned once from left to right, and all LMS suffixes and LCP values thereof in SA are found according to an array t and recorded in SA1 and LCPA1 in sequence respectively; the adjacent LMS suffixes and the LCP values thereof in SA1 are subjected to correctness verification according to the character string T, the array t, SA1 and LCPA1; L-type suffixes and LCP values thereof are inductively sorted according to the character string T, the array t, B, C, SA1 and LCPA1; S-type suffixes and LCP values thereof are inductively sorted according to the character string T, the array t, B, C, SA1 and LCPA1; SA, SA1, LCPA and LCPA1 are scanned once in sequence, whether SA and SA1 are identical and LCPA and LCPA1 are identical or not is determined through comparison, and if the two groups are identical through comparison, SA and LCPA of T are correct.

Owner:SYSU CMU SHUNDE INT JOINT RES INST +1
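To make concrete what "correct" means for a suffix array (SA) and its longest common prefix array (LCPA), here is a minimal brute-force checker. It is deliberately not the induced-sorting procedure over LMS, L-type and S-type suffixes that the abstract above describes: it compares adjacent suffixes directly in quadratic worst-case time. The function name `verify_sa_lcp` and the convention that `lcp[k]` refers to the suffixes at `sa[k - 1]` and `sa[k]`, with `lcp[0] == 0`, are assumptions.

```python
def verify_sa_lcp(text, sa, lcp):
    """Return True if `sa` sorts the suffixes of `text` and `lcp` matches it."""
    n = len(text)
    # SA must be a permutation of 0..n-1, and LCP must have one entry per suffix.
    if sorted(sa) != list(range(n)) or len(lcp) != n:
        return False
    if n and lcp[0] != 0:
        return False
    for k in range(1, n):
        a, b = sa[k - 1], sa[k]
        common = 0
        while a + common < n and b + common < n and text[a + common] == text[b + common]:
            common += 1
        if lcp[k] != common:
            return False
        # Suffix b ran out first: it is a proper prefix of suffix a, so b < a.
        if b + common == n:
            return False
        # Both suffixes continue: the first differing character decides the order.
        if a + common < n and text[a + common] > text[b + common]:
            return False
    return True


print(verify_sa_lcp("banana", [5, 3, 1, 0, 4, 2], [0, 1, 3, 0, 0, 2]))
```

A checker like this is useful mainly as a reference oracle on small inputs; the appeal of the approach in the abstract is that it avoids this kind of pairwise character comparison over the full text.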

Online analytical processing (OLAP) query log mining and recommending method based on efficient mining of frequent closed sequences (BIDE)

Inactive | CN102254034A | Simplified representation | No reduction in recommendation accuracy | Special data processing applications | Pattern matching | Fuzzy query

The invention relates to OLAP recommending technology, in particular to an online analytical processing (OLAP) query log mining and recommending method based on efficient mining of frequent closed sequences (BIDE). In the method, the possible next query is recommended to OLAP users, so that the process of browsing and analyzing multi-dimensional data is simplified. The method has the following advantages: based on the characteristics of query operations in the OLAP field, the fields that express OLAP operations are extracted from the log files, and the log files are abstracted into a query sequence, which simplifies the representation of the log files; query patterns are mined from the query sequence by the BIDE algorithm, which improves the efficiency of subsequent recommending without reducing recommending accuracy; a suffix tree is built over the query patterns, so that subsequent pattern matching does not need a search algorithm to find the starting point of a match, which speeds up pattern matching; and a matching algorithm for fuzzy query patterns is provided to improve recommending accuracy.

Owner:ZHEJIANG HONGCHENG COMP SYST

Method for changing a target array, a method for analyzing a structure, and an apparatus, a storage medium and a transmission medium therefor

Inactive | US20020102545A1 | Microbiological testing/measurement | Biological testing | Target array | Theoretical computer science

The objective of the present invention is the efficient analysis of the structure of an array. By performing the prev(S) calculation for a character string S, if in the character string S a like variable is present upstream of a second variable, the second variable is changed to a numerical value that indicates the distance to the upstream like variable. But if in the character string S a like variable is not present upstream of a variable, that variable is changed to "0" to obtain a character string S1. Further, by performing the compl(S) calculation for the character string S, if in the character string S a complementary variable is present upstream of a second variable, the second variable is changed to a numerical value that indicates the distance to the complementary variable. But if in the character string S a complementary variable is not present upstream of a variable, that variable is changed to "0" to obtain a character string S2 (102). A single suffix tree (structure suffix tree) is generated by regarding the character strings S1 and S2 as a pair of corresponding character strings (104 to 114), and the obtained structure suffix tree is employed to analyze the structure of the array that is represented by the character string S.

Owner:UNILOC 2017 LLC

User interest modeling method based on conceptual clustering

Inactive | CN101571870A | Improve accuracy | Express the content of the text precisely | Special data processing applications | Inclusion relation | Data set

The invention discloses a new user interest modeling method based on conceptual clustering (UIMC), which addresses the shortcomings of traditional user interest modeling methods in accuracy and incremental processing capability. The method first constructs a suffix tree structure by analyzing the historical documents accessed by a user, then selects different similarity thresholds and merges base clusters at different granularities. A hierarchy of user interests is generated from the inclusion relations among the base clusters merged under the different threshold conditions. UIMC is an incremental, unsupervised concept learning method for documents, so a user profile can be easily obtained and updated. Finally, the effectiveness of the UIMC method for interest prediction is verified by experiments on the 20 Newsgroups data set.

Owner:BEIHANG UNIV

Mining method for asynchronous periodic pattern in hydrologic time series

Active | CN102495883A | Avoid cycle | Avoid wasting time and space | Special data processing applications | Data mining | Spacetime

The invention discloses a method for mining asynchronous periodic patterns in a hydrologic time series. The method first improves a suffix-tree-based partial periodic pattern mining algorithm so that it supports multi-event series, thereby obtaining candidate periodic patterns. Then, in the valid-segment generation process, it proposes a valid-segment generation algorithm that adaptively adjusts the candidate period, thereby avoiding the missed periods or wasted time and space caused by a uniform period. Compared with the prior art, the mining method provided by the invention can find asynchronous periodic patterns in hydrologic time series more effectively.

Owner:HOHAI UNIV

String matching in hardware using the fm-index

Inactive | US20120233185A1 | Digital data information retrieval | Digital data processing details | Data pack | Algorithm

String matching is a ubiquitous problem that arises in a wide range of applications in computer science, e.g., packet routing, intrusion detection, web querying, and genome analysis. Due to its importance, dozens of algorithms and several data structures have been developed over the years. A recent breakthrough in this field is the FM-index, a data structure that synergistically combines the Burrows-Wheeler transform and the suffix array. In software, the FM-index allows exact and approximate searching in times comparable to the fastest known indices for large texts (suffix trees and suffix arrays), with the additional advantage of being much more space-efficient than those indices. This disclosure discusses an FPGA-based hardware implementation of the FM-index for exact and approximate pattern matching.
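For orientation, exact matching with an FM-index can be sketched in software as below. This is a small, unoptimized illustration of the standard backward search, not the FPGA design from the disclosure. It assumes the text does not contain the sentinel character `$`, builds the suffix array by naive sorting, and stores full occurrence tables, which a space-efficient index would sample or compress.

```python
def build_fm_index(text):
    """Build the C table and occurrence counts over the BWT of text + '$'."""
    text += "$"  # sentinel, assumed absent from text and smaller than all symbols
    suffix_array = sorted(range(len(text)), key=lambda i: text[i:])
    bwt = "".join(text[i - 1] for i in suffix_array)

    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1

    # c_table[ch]: number of characters in the text strictly smaller than ch
    c_table, total = {}, 0
    for ch in sorted(counts):
        c_table[ch] = total
        total += counts[ch]

    # occ[ch][i]: number of occurrences of ch in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (1 if b == ch else 0)
    return c_table, occ, len(bwt)


def count_occurrences(pattern, c_table, occ, n):
    """Backward search: narrow the suffix-array interval one symbol at a time."""
    lo, hi = 0, n
    for ch in reversed(pattern):
        if ch not in c_table:
            return 0
        lo = c_table[ch] + occ[ch][lo]
        hi = c_table[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


c_table, occ, n = build_fm_index("banana")
print(count_occurrences("ana", c_table, occ, n))  # 2
```

Each pattern symbol costs two table lookups regardless of text length, which is one reason the structure lends itself to hardware pipelines.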

Owner:RGT UNIV OF CALIFORNIA

Method and system for searching and storing data

Active | US20150006577A1 | Digital data processing details | Special data processing applications | Theoretical computer science | Suffix tree

This invention relates to methods for storing and searching data. Embodiments of the invention make use of suffix trees to support binary pattern matching. Embodiments of the invention can be shown to have search speeds comparable to those of known suffix trees, but are advantageous in that they have lower memory usage requirements, which is important in large data environments.

Owner:BRITISH TELECOMM PLC +2

A searchable encryption system and method based on a suffix tree

Pending | CN109815723A | Support search function | Ensure safety | Digital data protection | Other databases indexing | Search problem | Ciphertext

The invention provides a searchable encryption system and method based on a suffix tree, and relates to the technical field of the Internet. The system comprises an initialization module for constructing an encryption key and a suffix tree, a security index construction module for constructing and encrypting an index, a substring search module for constructing a search token and searching, and a verification and decryption module for decrypting and verifying. The method comprises the following steps: first, a suffix tree and an encrypted index are constructed for a given character string, and the encrypted index is uploaded to a server; then, when the client performs a string search, it generates a search token and sends it to the server, and the server searches according to the search token and returns the search result to the client to complete the search. The suffix-tree-based searchable encryption system and method achieve efficient searching for any character string, solve the substring search problem for ciphertext data, and allow a user to query the ciphertext data without using keywords.

Owner:NORTHEASTERN UNIV

Behavior sequence anomaly detection method and system based on unsupervised algorithm

Active | CN112738088A | Anomaly detection works | Adaptable | Character and pattern recognition | Neural architectures | Algorithm | Anomaly detection

The invention provides a behavior sequence anomaly detection method based on an unsupervised algorithm. Based on the operation data of an enterprise web system, the method calculates the time interval between consecutive operations in the sequence of user operations and segments the user behavior sequence according to whether that interval is greater than a preset threshold. A probabilistic suffix tree model is then trained, and it outputs a probability value corresponding to each user behavior sequence. The probability values corresponding to a user are taken as features, i.e., as the input of an isolation forest model, and whether the user's behavior is abnormal is judged from the model's output.
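The segmentation step can be sketched as follows. It is a minimal illustration that assumes events arrive as time-sorted `(timestamp_seconds, operation)` pairs; the function name `split_sessions`, the sample operations, and the 600-second threshold are hypothetical. Training the probabilistic suffix tree and the isolation forest is not shown.

```python
def split_sessions(events, max_gap_seconds):
    """Cut a time-sorted event list wherever the gap exceeds the threshold."""
    sessions = []
    current = []
    previous_time = None
    for timestamp, operation in events:
        gap_too_long = (
            previous_time is not None
            and timestamp - previous_time > max_gap_seconds
        )
        if gap_too_long:
            sessions.append(current)
            current = []
        current.append(operation)
        previous_time = timestamp
    if current:
        sessions.append(current)
    return sessions


# Hypothetical operation log for one user.
events = [
    (0, "login"), (10, "view"), (20, "edit"),
    (2000, "login"), (2005, "export"),
]
sessions = split_sessions(events, max_gap_seconds=600)
print(sessions)  # [['login', 'view', 'edit'], ['login', 'export']]
```

Each resulting session would then be one training or scoring sequence for the sequence model.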

Owner:SHANGHAI GUAN AN INFORMATION TECH

Character string generation method, article of manufacture and system

Inactive | US20120036149A1 | Substantial pruning of the search | Processing speed | Digital data processing details | Natural language data processing | Theoretical computer science | Dynamic programming

A method, article of manufacture, and system for enabling the context surrounding a search result to be displayed succinctly. The method includes searching a document set configured as a frequency-ordered suffix tree to obtain a frequency-ordered context tree, and applying dynamic programming to the frequency-ordered context tree to retrieve a set (C) of n1 context strings (c). The area covered by a character string (s) in the entire set of context strings C = {c1, . . . , cn1} is defined as the product of (1) the number (n2) of context strings (c) having s as a prefix and (2) the length of the character string (s). A set of character strings (S) that maximizes the sum of areas is then obtained. In addition, the dynamic programming can include a pruning process such that, if an upper limit does not reach a maximum value, the search in progress is abandoned.
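The area definition can be made concrete with a short sketch. It only evaluates the stated product for one candidate string. The function name `covered_area` and the sample context strings are hypothetical, and the dynamic programming search over sets of strings is not shown.

```python
def covered_area(s, contexts):
    """Area of s: (number of context strings with prefix s) * len(s)."""
    n2 = sum(1 for c in contexts if c.startswith(s))
    return n2 * len(s)


# Hypothetical context strings following a search hit.
contexts = ["suffix tree", "suffix array", "suffix trie", "prefix tree"]

print(covered_area("suffix ", contexts))      # 3 * 7 = 21
print(covered_area("suffix tr", contexts))    # 2 * 9 = 18
print(covered_area("suffix tree", contexts))  # 1 * 11 = 11
```

The example shows the trade-off the objective captures: a longer string covers more characters per context but matches fewer contexts.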

Owner:IBM CORP

Internet of Things information interoperation method based on sea-cloud computing architecture

Inactive | CN108471355A | Realize a reasonable distribution | Avoid double counting | Data switching networks | Intermediate state | Automaton

The invention discloses an Internet of Things information interoperation method based on a sea-cloud computing architecture, belonging to the fields of big data processing and the Internet of Things. The model is composed of a push-mode architecture based on sea-cloud computing and an unranked tree automata module. The push-mode architecture includes cloud nodes and sea nodes: at the sea end, real-time Internet of Things perception data flows are processed, and at the cloud end, key data is processed and stored, decision data is generated, and so on. The unranked tree automata module adopts a suffix tree automata filter matching method, which uses tree automata technology, introduces the idea of suffixes, and processes subscription requests with a bottom-up, push-mode unranked tree automaton. The method thereby overcomes the insufficient integration of data and computing in existing computing technology, organically integrates the cloud end and the sea end, effectively reduces the intermediate states produced by large numbers of identical transitions during data processing, and achieves reasonable optimization of network resources, high file processing speed and strong semantic recognition.

Owner:HARBIN ENG UNIV
