
44 results about "Sequence alignment algorithm" patented technology

Industrial control private agreement-based fuzzy test method

The invention discloses a fuzz testing method for private (proprietary) industrial control protocols. A protocol tree is built from private-protocol traffic captured in a normal industrial control network environment, using a private protocol tree construction algorithm, so that request and response messages are effectively classified. Basic protocol information is then learned, and protocol characteristics are learned by counting the data sequences of each class and applying probability statistics, a length-field recognition algorithm, the Apriori association rule algorithm and the Needleman-Wunsch pairwise sequence alignment algorithm. The learned protocol characteristics are mutated according to mutation rules to generate test cases. During testing, the connection to the device under test is monitored, and the device's response data is checked against the request and response characteristics. The method addresses the efficiency of fuzz testing private industrial control protocols and improves the effectiveness of the test cases. It comprises a data preprocessing module, a protocol learning module, a fuzz testing module and an exception alarm module.
Owner:BEIJING UNIV OF TECH
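The Needleman-Wunsch step named in the abstract above can be illustrated with a minimal Python sketch of global pairwise alignment. The function name, the `GAP` marker and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for illustration; the patent does not specify them.

```python
from typing import List, Sequence, Tuple

GAP = None  # marker placed in an aligned sequence where a gap was inserted


def needleman_wunsch(a: Sequence, b: Sequence, match: int = 1,
                     mismatch: int = -1, gap: int = -1) -> Tuple[int, List, List]:
    """Globally align a and b; return (score, aligned_a, aligned_b)."""
    def sub(x, y) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # align both symbols
                score[i - 1][j] + gap,                          # gap in b
                score[i][j - 1] + gap,                          # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append(GAP); i -= 1
        else:
            out_a.append(GAP); out_b.append(b[j - 1]); j -= 1
    return score[n][m], out_a[::-1], out_b[::-1]
```

When the inputs are two captured messages (for example `bytes` objects), aligned positions where both messages agree are candidates for constant fields, while gaps and mismatches hint at variable fields. That interpretation is a common use of the alignment in protocol analysis and is not a detail taken from the patent text.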

Determination of optimal local sequence alignment similarity score

Inactive | US20040024536A1 | Microbiological testing/measurement | Recombinant DNA-technology | Local sequence alignment | Protein function prediction
Sequence alignment and sequence database similarity searching are among the most important and challenging tasks in bioinformatics, and are used for several purposes, including protein function prediction. An efficient parallelisation of the Smith-Waterman sequence alignment algorithm using parallel processing in the form of SIMD (Single-Instruction, Multiple-Data) technology is presented. The method has been implemented using the MMX (MultiMedia eXtensions) and SSE (Streaming SIMD Extensions) technology that is embedded in Intel's latest microprocessors, but the method can also be implemented using similar technology existing in other modern microprocessors. A near eight-fold speed-up relative to the fastest previously known non-parallel Smith-Waterman implementation on the same hardware was achieved with an optimised eight-way parallel processing approach. A speed of about 200 million cell updates per second has been obtained on a single Intel Pentium III 500 MHz microprocessor.
Owner:SEEBERG ERLING CHRISTEN +1
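For orientation, the "cell update" counted in the abstract above is one evaluation of the Smith-Waterman recurrence. The following is a scalar Python sketch of that recurrence with a linear gap penalty; it is not the SIMD implementation described in the patent, and the function name and scoring values are assumptions.

```python
from typing import Sequence


def smith_waterman_score(a: Sequence, b: Sequence, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the optimal local alignment score of a and b."""
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming matrix
    best = 0
    for x in a:
        cur = [0]  # first column is always 0 for local alignment
        for j, y in enumerate(b, start=1):
            # One "cell update": the best of restarting (0), extending the
            # diagonal, or opening a gap in either sequence.
            cell = max(
                0,
                prev[j - 1] + (match if x == y else mismatch),
                prev[j] + gap,
                cur[j - 1] + gap,
            )
            cur.append(cell)
            best = max(best, cell)
        prev = cur
    return best
```

Only two rows are kept in memory because the score alone is needed; recovering the alignment itself would require storing the full matrix or traceback pointers.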

Method of application classification in Tor anonymous communication flow

Active | CN104135385A | Reduce load | Implement application classification | Data switching networks | Traffic capacity | Sequence alignment algorithm
The invention discloses a method of application classification in Tor anonymous communication flow, which mainly solves the problem of acquisition of upper-layer application type information in the Tor anonymous communication flow and relates to the correlation technique, such as feature selection, sampling preprocessing and flow modeling. The method comprises the following steps of: firstly, defining a concept of a flow burst section by utilizing a data packet scheduling mechanism of Tor, and serving a volume value and a direction of the flow burst section as classification features; secondly, preprocessing a data sample based on a K-means clustering algorithm and a multiple sequence alignment algorithm, and solving the problems of over-fitting and inconsistent length of the data sample through the manners of value symbolization and gap insertion; and lastly, respectively modeling uplink Tor anonymous communication flow and downlink Tor anonymous communication flow of different applications by utilizing a Profile hidden Markov model, providing a heuristic algorithm to establish the Profile hidden Markov model quickly, during specific classification, substituting features of network flow to be classified into the Profile hidden Markov models of different applications, respectively figuring up probabilities corresponding to an uplink flow model and a downlink flow model, and deciding the upper-layer application type included by the Tor anonymous communication flow to be classified through a maximum joint probability value.
Owner:南京市公安局

Ransomware variation detection method based on sequence alignment algorithm

The invention provides a ransomware variation detection method based on a sequence alignment algorithm. The method comprises the specific steps of inputting a ransomware sample, extracting a sample feature sequence, processing the sample feature sequence into a gene sequence, and detecting a ransomware variation. The step of variation detection specifically comprises the sub-steps of clustering each gene sequence in a sample set and extracting clustering result information to acquire the various ransomware families; using the Needleman-Wunsch sequence alignment algorithm to compute the similarity between a sample to be detected and the cluster center sample of each ransomware family, screening out clusters whose similarity exceeds a preset threshold, and using the screened clusters to form a new ransomware training sample set; and determining the ransomware family to which the sample to be detected belongs by using the newly screened training sample set in combination with the sequence alignment algorithm and a KNN classification algorithm, thereby achieving variation detection. The method achieves fast ransomware variation detection by combining the sequence alignment algorithm with an existing classification algorithm.
Owner:BEIJING INSTITUTE OF TECHNOLOGY +1

Test-case selection method based on user sessions and hierarchical clustering algorithm

The invention discloses a test-case selection method based on user sessions and a hierarchical clustering algorithm. The method includes the following steps: acquiring server access logs and sorting them by time; carrying out preprocessing and clustering to form a user session sequence set; calculating similarity distances among all user session sequences using an improved user-session-sequence comparison algorithm; employing an improved agglomerative hierarchical clustering algorithm to cluster the user session sequences, and outputting the final clustering results as test cases; and optimizing the selection of test cases by deleting redundant test cases. With this method, representative user operation sequences can be quickly mined from a large number of server access logs and used as test cases, test-case generation is automated and test-case selection is optimized, and subsequent work such as automated functional tests of a server, performance tests and user behavior analysis is facilitated.
Owner:SOUTH CHINA UNIV OF TECH
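The distance-then-cluster pipeline in the abstract above can be sketched in Python as follows. Plain Levenshtein edit distance and average-linkage agglomerative clustering are used here as stand-ins for the patent's "improved" comparison and clustering algorithms, whose details are not given; all names and the threshold value are assumptions.

```python
from typing import List, Sequence


def edit_distance(a: Sequence, b: Sequence) -> int:
    """Levenshtein distance between two sequences of page requests."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def normalized_distance(a: Sequence, b: Sequence) -> float:
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def cluster_sessions(sessions: List[Sequence], threshold: float = 0.5) -> List[List[int]]:
    """Merge the closest clusters until the best average distance exceeds threshold."""
    clusters = [[i] for i in range(len(sessions))]

    def linkage(c1: List[int], c2: List[int]) -> float:
        total = sum(normalized_distance(sessions[i], sessions[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        dist, p, q = min(
            (linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        if dist > threshold:
            break
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters
```

Each returned cluster is a list of indices into `sessions`; keeping one representative session per cluster would correspond to the redundancy-removal step described in the abstract.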

Genomic sequence alignment method and genomic sequence alignment device

The invention discloses a genomic sequence alignment method and a genomic sequence alignment device. The method includes: reading part of genomic sequences from to-be-aligned genomic sequence files; subjecting the part of the genomic sequences and a reference genomic sequence to alignment according to a two-way BWT alignment algorithm, a single-end dynamic programming alignment algorithm and a double-end dynamic programming alignment algorithm; after alignment is finished according to any of the alignment algorithms, if no sequence failed in alignment exists in the part of the genomic sequences, reading new part of genomic sequences from the to-be-aligned genomic sequence files, and performing alignment according to the steps; repeating the steps until alignment of all of the to-be-aligned genomic sequence files is finished, and outputting alignment results. By the genomic sequence alignment method and the genomic sequence alignment device, problems of high time consumption, low processing speed and high resource consumption of the genomic sequence alignment algorithms can be solved.
Owner:UNITED ELECTRONICS

Social network association searching method based on graphics processing unit (GPU) multiple sequence alignment algorithm

The invention discloses a social network association searching method based on a graphics processing unit (GPU) multiple sequence alignment algorithm. The method comprises the following steps: a central processing unit (CPU) crawls individual web pages to extract individual characteristic vectors from a social network; the CPU filters redundant characteristic information from the individual characteristic vectors to generate a uniform individual characteristic information vector base; a GPU calculates an individual distance matrix and a correction distance matrix of the social network according to the uniform individual characteristic information vector base; the GPU establishes a social network association route guidance tree according to the correction distance matrix; and the GPU traverses the social network association route guidance tree to perform the optimal association route search. By exploiting the GPU's suitability for processing large amounts of dense data, the association searching problems solved by the multiple sequence alignment algorithm are parallelized; complex and time-consuming operations, such as building and traversing the matrices and the association route guidance tree, are carried out by the GPU; and the long running time caused by the large volume of social network data and the complexity of the operations is reduced.
Owner:HUAZHONG UNIV OF SCI & TECH

Unknown protocol message format deduction method

The present invention provides an unknown protocol message format deduction method. The method comprises the steps of capturing original data packets in the network; establishing a sequence alignment binary tree according to the lengths of the data packets; and carrying out sequence alignment upward from the leaf nodes of the binary tree, wherein the sequence alignment adopts a sequence alignment algorithm based on dynamic programming. After the sequence alignment of all nodes has finished, a result in which all leaf nodes are aligned to the same length is obtained, and the identical parts are searched for in this result, thereby automatically deducing and outputting the unknown protocol message format. Compared with existing methods that deduce unknown data packet formats with manual participation, the automatic method based on data packet sequence alignment provided by the present invention reduces the manual workload to determining the number of data packets to acquire, and can effectively deduce an unknown protocol data packet format without any prior information about that format.
Owner:SOUTHWEST CHINA RES INST OF ELECTRONICS EQUIP

Large-scale ontology mapping method for Chinese languages

The invention provides a mapping method for large-scale Chinese ontology. The method comprises the following steps: initializing a correlation degree computing method on the basis of the concept integrating Chinese thesaurus and an edit distance similarity algorithm; compressing large-scale ontology mapping scale on the basis of a pseudo-nuclear-force field potential function integrating concept similarity and dissimilarity improved by initial correlation degree; performing similarity measurement on complex concepts in the Chinese ontology through introducing a global sequence alignment algorithm. Chinese words exhibit polysemy and sensitivity to word order, and the computing cost of large-scale ontology mapping is high. According to the method, firstly, the existing pseudo-nuclear-force field potential function is improved, so that the measurement of similarity among concepts and the scale compression of the ontology to be mapped are more reasonable. Secondly, a global sequence alignment technology is adopted to map complex Chinese concepts, so that further defects of traditional Chinese ontology mapping systems are overcome. Finally, the mapping efficiency of the system is improved, and the precision ratio and the recall ratio are increased.
Owner:CAPITAL UNIV OF ECONOMICS & BUSINESS

Prediction method for signal peptide and cleavage site thereof on the basis of layered mixture model

The invention discloses a prediction method for signal peptide and a cleavage site thereof on the basis of a layered mixture model. The prediction method comprises the following steps: firstly, in a first layer, applying an SVM (Support Vector Machine) classifier based on amino acid residue features to identify whether a protein sequence contains N-end hydrophobic fragments; then, in a second layer, applying a Naive Bayes and SVM classifier based on amino acid residue features and functional structural domain features to identify whether the hydrophobic fragments are signal peptides or N-end transmembrane helixes; and finally, in a third layer, screening candidate cleavage sites according to a statistical learning rule, calculating a statistical credit score, then calculating the similarity score of the signal peptide sequence through a Needleman-Wunsch sequence comparison algorithm, and determining the predicted signal peptide cleavage site by integrating the statistical credit score and the sequence similarity score.
Owner:SHANGHAI JIAO TONG UNIV

Multiple sequence alignment visualization method based on image processing

The invention relates to a multiple sequence alignment visualization method based on image processing. The method includes the following steps: S1, taking multiple amino acid sequences generated by a multiple sequence alignment algorithm as input; S2, defining a different color for each type of amino acid, and performing color conversion on the amino acid sequences; S3, applying image conversion so that each amino acid in the amino acid sequences corresponds to one pixel in an image and the color of each pixel corresponds to that of the corresponding amino acid, thereby converting multiple one-dimensional amino acid sequences into a two-dimensional colored image; S4, using an image segmentation method based on edge detection to segment the converted image, and presenting the segmented images to a user.
Owner:SUN YAT SEN UNIV
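Steps S2 and S3 above amount to mapping every residue of an aligned row to one colored pixel. A minimal Python sketch follows, assuming an arbitrary illustrative palette that groups residues by rough physicochemical class; the grouping, the colors and the function name are not taken from the patent, and the edge-detection segmentation of step S4 is not covered.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: one color per residue class, white for gaps.
_CLASSES: Dict[str, RGB] = {
    "AVLIMFWP": (230, 159, 0),   # hydrophobic
    "STNQCYG": (0, 158, 115),    # polar
    "KRH": (0, 114, 178),        # positively charged
    "DE": (213, 94, 0),          # negatively charged
    "-": (255, 255, 255),        # alignment gap
}
PALETTE: Dict[str, RGB] = {aa: color for group, color in _CLASSES.items() for aa in group}
UNKNOWN: RGB = (128, 128, 128)   # any symbol not in the palette


def msa_to_pixels(rows: List[str]) -> List[List[RGB]]:
    """Convert aligned sequences (equal length) into a 2-D grid of RGB pixels."""
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("all aligned sequences must have the same length")
    return [[PALETTE.get(residue.upper(), UNKNOWN) for residue in row] for row in rows]
```

The result is a plain nested list (rows by columns) that could be handed to any imaging library for display; conserved columns show up as vertical stripes of a single color.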

Method and system for sensing abnormal signs in daily activities

There are provided a method and system for sensing abnormal signs in daily activities, the method comprising, at the system, sensing the daily activities, reading previously stored daily activity information, generating a daily activity sequence based thereon, sensing the abnormal signs from the daily activity sequence by using a preset sequence alignment algorithm, and providing the sensed abnormal signs to a user. As described above, the abnormal signs, which should be checked to provide care services, are sensed via changes in a daily activity pattern and added to a care service system that will be installed in welfare facilities for the aged or a home of a solitary old person, thereby effectively sensing the abnormal signs in daily activities of the aged.
Owner:ELECTRONICS & TELECOMM RES INST

Gene sequence alignment method and system

The invention discloses a gene sequence alignment method and system. The method comprises the following steps: storing a reference genome sequence and a query genome sequence in a distributed storage system; under a Spark heterogeneous distributed computing platform framework, segmenting a reference genome sequence according to row offset, and preprocessing to obtain a plurality of preprocessed reference data sets; establishing an index for each preprocessing reference data set by adopting a suffix array algorithm, and combining all the preprocessing reference data sets after the index is established to obtain a reference sequence index file; carrying out CUDA fine-grained sequence comparison on each fragment in the query genome sequence and a reference sequence index file by adopting a seed extension algorithm, and determining position information of each fragment in the reference sequence index file; and combining the position information of all the fragments in the reference sequence index file to obtain a gene sequence comparison result. According to the invention, the calculation speed and precision of a large-scale sequence alignment algorithm are improved.
Owner:HUAZHONG AGRI UNIV
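The indexing and seed-location steps described above can be illustrated on a single machine with a small Python sketch: build a suffix array over a reference string, then binary-search it for exact occurrences of a seed. This deliberately ignores the Spark partitioning and CUDA parts of the patent, uses a naive construction that is only suitable for short strings, and all function names are assumptions.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return start positions of all suffixes of text in lexicographic order.

    Naive construction (sorting full suffixes); fine for a demonstration only.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_seed(text: str, sa: List[int], seed: str) -> List[int]:
    """Return sorted reference positions where seed occurs exactly."""
    k = len(seed)
    # Binary search for the first suffix whose k-character prefix is >= seed.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + k] < seed:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes starting with seed are contiguous in the suffix array.
    hits = []
    while lo < len(sa) and text.startswith(seed, sa[lo]):
        hits.append(sa[lo])
        lo += 1
    return sorted(hits)
```

In a seed-and-extend aligner, each hit returned by `find_seed` would then be extended with a dynamic-programming alignment around that reference position.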

Gesture identity authentication system and method based on sensor on mobile phone

The invention provides a gesture identity authentication system and method based on a sensor on a mobile phone, and relates to the field of identity authentication based on sensors on mobile phones. The gesture identity authentication system comprises an acceleration sensor used for recording real-time acceleration information of a user gesture in a moving process; a direction sensor used for recording azimuth angle information of the user gesture in the moving process; a preprocessing module used for carrying out filtering denoising and equal frequency sampling on the information recorded in the acceleration sensor and the direction sensor; a calculation module used for respectively calculating matching scores of the acceleration information and the azimuth angle information via a global sequence alignment algorithm, calculating a threshold through the matching scores and gesture information made by the user again, and then comparing the user gesture information input at each time with the threshold; and a template base module used for storing original samples of all user gestures and storing the matching scores and the threshold calculated by the calculation module. The gesture identity authentication system provided by the invention adopts no additional device to serve as support, is scarcely influenced by environmental factors and is safe and convenient to carry out identity authentication of the user on the mobile phone.
Owner:WUHAN UNIV OF TECH +1

Flexible distributed sequence alignment system and method based on Spark and SIMD

The invention discloses a flexible distributed sequence alignment system based on Spark and SIMD. The system includes a master node and multiple working nodes connected to the master node; the master node is used for management of metadata and clusters and includes a master node body based on the distributed type computational frame Spark, a master node body based on a distributed type memory file system and a master node body of a Hadoop distributed type file system; the working nodes are used for data storage and calculation and includes a storage layer and a calculation layer; the storage layer includes Alluxio and HDFS, the calculation layer includes the Spark and an SIMD instruction set, and according to the distributed type computational frame Spark, a sequence alignment algorithm based on the SIMD is called through a mediation module for sequence alignment. The Alluxio and the HDFS are used for distributed storage of data, the Spark is used for distributed type calculation, the SIMD technology is adopted at each node for sequence alignment, and performance is improved.
Owner:UNIV OF SCI & TECH OF CHINA

Industrial control protocol reverse analysis method based on active learning

The invention discloses an industrial control protocol reverse analysis method based on active learning. The method comprises the steps of importing, preliminary analysis, variation, matching and merging. According to the method, an industrial control protocol pcap message sample is subjected to preliminary analysis; a partial message format and a state machine of the industrial control protocol are mastered; and then interactive active learning is carried out with the industrial personal computer by utilizing the result to continuously obtain new messages, so that the protocol's individual lexical methods and grammars can be deduced more accurately and completely. A Needleman-Wunsch sequence alignment algorithm is adopted when reverse analysis is carried out on the protocol; according to the algorithm, the format and the state machine of the protocol are deduced through similarity scoring and optimal backtracking steps. The method is advantageous in that the accuracy of the analysis result is effectively guaranteed: through combination with the active learning process, the response message is matched against the protocol formats in the preliminary analysis result, whether the message matches the protocol formats is determined, repeated matching is carried out as required, and the reverse-analysis accuracy and coverage of the industrial control protocol are substantially improved.
Owner:NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT +1

Industrial control protocol reverse analysis method based on semantic pre-mining

The invention discloses an industrial control protocol reverse analysis method based on semantic pre-mining, which optimizes the protocol reverse analysis result for an industrial control data sample by pre-mining semantics such as timestamps, lengths and serial numbers and then carrying out field division before protocol format reverse analysis is carried out. The basic idea of the method comprises the following steps: when protocol format analysis is performed on a target industrial control data sample, clustering the sample set to be analyzed according to the length of the messages, analyzing whether fields such as timestamps, lengths and serial numbers exist in the different types of messages, and replacing the discovered semantic fields with wildcard characters; after semantic pre-analysis is completed, adopting a Needleman-Wunsch sequence alignment algorithm to analyze the data sample; and finally, substituting the semantic result obtained by pre-analysis back into the analysis result, so that the accuracy of the analysis result is improved. The method has the advantages of an accurate analysis result, a high semantic recognition rate and the like.
Owner:ZHEJIANG SHUREN COLLEGE ZHEJIANG SHUREN UNIV

Re-sequencing sequence alignment method based on Spark framework

The invention relates to the technical field of computer science and bioinformatics, and particularly to a re-sequencing sequence alignment method based on the Spark framework. The method comprises three steps: an RDD creation step, a Map step and a Reduce step. The corresponding RDDs are created from the FASTQ file and stored in HDFS. The BWA sequence alignment algorithm is then applied to each RDD, and the RDDs are mapped across multiple nodes. Finally, whether to execute a final combining step is determined according to the processing requirement. According to the method, the BWA sequence aligner used in the re-sequencing step is integrated into the Spark big data processing framework, and the re-sequencing procedure is optimized in a distributed computing manner, thereby effectively improving re-sequencing data analysis efficiency.
Owner:SHENZHEN INST OF ADVANCED TECH

Method for extracting information from error OCR result

The invention is applicable to the technical field of image text processing, and provides a method for extracting information from an error OCR result, which comprises the following steps of: obtaining a result of extracting an image text through OCR; carrying out post-processing on the OCR results, and merging the OCR results into rows; defining an extraction template according to an information extraction target; carrying out fuzzy matching on a template and all OCR lines by utilizing an optimized global sequence alignment algorithm; optimizing a matching alignment result by utilizing a character library with a similar shape; extracting target information according to a matching alignment result. Meanwhile, the invention further provides a method for generating the similar character library through the neural network recognition model, by means of the similar character library, information provided by wrong characters in OCR recognition can be more effectively utilized, and the information extraction precision is improved. Compared with the prior art, the information extraction method provided by the invention has the advantages that the problem of OCR result error can be effectively solved, and the information extraction effect under the conditions of missing characters, multiple characters and wrong characters is greatly improved.
Owner:上海兑观信息科技技术有限公司

High-concurrency sequence alignment calculation acceleration method based on CPU + GPU isomerism

The invention discloses a high-concurrency sequence alignment calculation acceleration method based on CPU + GPU isomerism. The method comprises the following steps: reconstructing BWA-MEM algorithm codes; performing task concurrent processing on the CPU: completing division of a sequence set, and forming a plurality of concurrent tasks for the first time; running the BWA-MEM algorithm after code reconstruction, and completing concurrent processing of data on the GPU; and task concurrent processing on the GPU: for seed sets and chains generated in the sequence data comparison process, dividing the seed sets with the same or adjacent length, position and quantity into the same data block and chain, and performing the same processing, thereby completing the division of the seed sets and the chains, and forming a plurality of concurrent tasks for the second time. According to the method, the characteristics of the BWA-MEM algorithm and the characteristics of GPU acceleration equipment are closely combined by designing a task parallel and data parallel mode, the strong concurrent operation capability of the GPU is fully utilized, excellent performance is provided for a sequence alignment algorithm, and the efficiency of high-concurrent processing is higher.
Owner:GUANGZHOU JIAJIAN MEDICAL TESTING CO LTD

Third Generation Sequencing Alignment Algorithm

Methods, software, and systems for aligning a read sequence to a reference sequence are disclosed. In certain embodiments, the methods, software, and systems involve determining similarity of distribution of k-mers between a region of the read sequence and a region of the reference sequence in order to determine whether the region of the read sequence maps to the region of the reference sequence.
Owner:THE BOARD OF TRUSTEES OF THE LELAND STANFORD JUNIOR UNIV
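One way to compare k-mer distributions between a read region and a reference region is to count k-mers in each and take the cosine similarity of the two count vectors. The Python sketch below assumes that measure and k = 3 purely for illustration; the abstract does not state which similarity measure or k-mer length the patent uses.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq: str, k: int) -> Counter:
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_cosine(a: str, b: str, k: int = 3) -> float:
    """Cosine similarity of the k-mer count vectors of a and b (0.0 to 1.0)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(ca[kmer] * cb[kmer] for kmer in ca.keys() & cb.keys())
    norm_a = sqrt(sum(v * v for v in ca.values()))
    norm_b = sqrt(sum(v * v for v in cb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than k has no k-mers
    return dot / (norm_a * norm_b)
```

A read region would be considered a candidate match for a reference region when this score exceeds some threshold; because k-mer counts tolerate scattered insertions and deletions, such a measure is a plausible fit for error-prone long reads, though the threshold and the exact statistic would need to come from the patent itself.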

Method and system for optimizing multiple sequence alignment algorithms, and storage medium

The invention relates to a method and a system for optimizing multiple sequence alignment algorithms, and a storage medium. The method comprises the steps of selecting a core sequence from multiple sequences; performing pairwise alignment on the core sequence and other sequences in the multiple sequences, and obtaining the number of common fragments of the sequences; constructing a first guiding tree according to the number of common fragments of the pairwise sequences; performing a progressive algorithm on the first guiding tree for obtaining a first result through alignment of multiple sequences; calculating the distance between the pairwise sequences according to the first result, and obtaining a distance matrix; constructing a second guiding tree according to the distance matrix, comparing the first guiding tree with the second guiding tree, performing re-alignment on the sequences which correspond with the changing part for obtaining a second result, and repeating processes of constructing the second guiding tree and comparing the first guiding tree with the second guiding tree until the number of comparison times exceeds a threshold, thereby shortening time consumption in sequence comparison, increasing processing process and reducing resource consumption.
Owner:INST OF SPECIAL ANIMAL & PLANT SCI OF CAAS

Sequence alignment Seed processing method, system and device and readable storage medium

The invention discloses a sequence alignment Seed processing method, system and device and a computer readable storage medium. The method comprises the steps: according to the to-be-compared sequence position of the Seeds on a to-be-compared sequence and the candidate comparison position of the Seeds on the reference sequence, determining the linear Seeds with the consistent relative relationship between the two positions of the Seeds; splicing the linear Seeds to obtain a new spliced Seed; screening out the longest Seed covering the longest base of the same base fragment of the to-be-compared sequence from a Seed set comprising the spliced Seeds and nonlinear Seeds; further screening out the Seed which covers the target base fragment in each target base fragment on the to-be-compared sequence and of which the termination position is greater than the invalid Seed from the Seed set; synthesizing the target Seed of each target base fragment to obtain a target Seed set, wherein the target Seed set does not include Seeds in the longest Seed set, and the number of Seeds used when a subsequent sequence alignment algorithm is expanded is comprehensively reduced, so the calculated amount of an alignment system is reduced, and the matching precision and the processing performance of gene sequence alignment are improved.
Owner:LANGCHAO ELECTRONIC INFORMATION IND CO LTD
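The splicing of "linear" Seeds can be read as merging seeds that lie on the same diagonal, that is, seeds whose offset between reference position and read position is identical. The Python sketch below follows that reading and merges only seeds that overlap or touch; both the interpretation and the `Seed` type are assumptions, and the later screening steps of the patent are not covered.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Seed(NamedTuple):
    read_pos: int  # start of the exact match on the to-be-compared sequence
    ref_pos: int   # start of the exact match on the reference sequence
    length: int    # number of matched bases


def splice_colinear(seeds: List[Seed]) -> List[Seed]:
    """Merge overlapping or adjacent seeds that share the same diagonal."""
    by_diagonal: Dict[int, List[Seed]] = defaultdict(list)
    for seed in seeds:
        by_diagonal[seed.ref_pos - seed.read_pos].append(seed)

    merged: List[Seed] = []
    for diagonal in sorted(by_diagonal):
        group = sorted(by_diagonal[diagonal])  # ordered by read_pos
        current = group[0]
        for seed in group[1:]:
            current_end = current.read_pos + current.length
            if seed.read_pos <= current_end:
                # Overlapping or touching: extend the current seed.
                new_end = max(current_end, seed.read_pos + seed.length)
                current = Seed(current.read_pos, current.ref_pos, new_end - current.read_pos)
            else:
                merged.append(current)
                current = seed
        merged.append(current)
    return merged
```

Fewer, longer seeds mean fewer extension calls in the subsequent alignment stage, which matches the reduction in computation that the abstract claims.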

Method for realizing trajectory data release k-anonymity based on point density segmentation trajectory

The invention discloses a method for realizing trajectory data release k- anonymity based on a point density segmentation trajectory. The method comprises the following steps: 1) acquiring basic trajectory data, and establishing a trajectory data set model; 2) establishing a trajectory loss model DGH tree; (3) adding virtual points into the trajectory data set model, and generating trajectory data set models containing the virtual points and a virtual point mark data set model; 4) clustering the trajectory data set models containing the virtual points, marking a clustering center to which each point belongs, and generating a mark data set model; 5) traversing the trajectory data set models, segmenting the trajectory through the mark data set model, and generating a segmented trajectory data set model; and 6) for the segmented data set model, using a dynamic sequence alignment algorithm to calculate loss, and then using an iterative trajectory k anonymous clustering algorithm to perform clustering based on information loss. According to the method, the trajectory is segmented based on the point density of the trajectory data set, and the information loss caused in the k-anonymity process is reduced.
Owner:SOUTH CHINA UNIV OF TECH

A Method for Classifying Tor Anonymous Communication Traffic Applications

The invention discloses a method of application classification in Tor anonymous communication flow, which mainly solves the problem of acquisition of upper-layer application type information in the Tor anonymous communication flow and relates to the correlation technique, such as feature selection, sampling preprocessing and flow modeling. The method comprises the following steps of: firstly, defining a concept of a flow burst section by utilizing a data packet scheduling mechanism of Tor, and serving a volume value and a direction of the flow burst section as classification features; secondly, preprocessing a data sample based on a K-means clustering algorithm and a multiple sequence alignment algorithm, and solving the problems of over-fitting and inconsistent length of the data sample through the manners of value symbolization and gap insertion; and lastly, respectively modeling uplink Tor anonymous communication flow and downlink Tor anonymous communication flow of different applications by utilizing a Profile hidden Markov model, providing a heuristic algorithm to establish the Profile hidden Markov model quickly, during specific classification, substituting features of network flow to be classified into the Profile hidden Markov models of different applications, respectively figuring up probabilities corresponding to an uplink flow model and a downlink flow model, and deciding the upper-layer application type included by the Tor anonymous communication flow to be classified through a maximum joint probability value.
Owner:南京市公安局