Method and system for generating summary data of biological gene sequence
A technique for generating summary data of gene sequences, applied in the field of biological data processing. It addresses problems such as limited performance and the lack of existing solutions, and achieves the effects of reducing dependencies, improving program performance, and avoiding prediction failures.
Examples
Embodiment 1
[0053] Embodiment 1 of the present disclosure provides a method for generating summary data of biological gene sequences. For the calculation of hash values, this embodiment provides several improved SIMD-based hash functions, including MurmurHash3, CityHash, xxHash, and wangHash. These hash functions are used to construct the hash value list of the gene sequence, and a different hash function can be chosen for different situations, which widens the applicability. A vectorized implementation is used to make the calculation faster.
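As an illustration of the kind of integer hash named above, the following is a minimal sketch only. The text shown here does not give the hash formulas, so the mixing steps are an assumption: they follow a widely circulated 64-bit integer mix attributed to Thomas Wang, which may differ from the "wangHash" variant the disclosure uses. The names `wang_hash64` and `hash_batch` are hypothetical, and the code is scalar Python; a SIMD version would apply the same steps to several lanes at once.

```python
MASK64 = (1 << 64) - 1  # keep every intermediate result within 64 bits


def wang_hash64(key: int) -> int:
    """64-bit integer mix (assumed variant, attributed to Thomas Wang)."""
    key &= MASK64
    key = (~key + (key << 21)) & MASK64
    key ^= key >> 24
    key = (key + (key << 3) + (key << 8)) & MASK64   # key * 265
    key ^= key >> 14
    key = (key + (key << 2) + (key << 4)) & MASK64   # key * 21
    key ^= key >> 28
    key = (key + (key << 31)) & MASK64
    return key


def hash_batch(keys):
    """Hash a batch of integers; each element stands in for one SIMD lane."""
    return [wang_hash64(k) for k in keys]


hashes = hash_batch(range(1000))
```

Every step of this mix is invertible modulo 2^64, so distinct 64-bit inputs always produce distinct outputs.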
[0054] The original hash function calculates the hash value as follows:
[0055] For the sequence data to be processed, use a sliding window to generate a K-mer, and then process the K-mer to obtain its reverse complementary strand (DNA generally has a double-stranded structure formed by two coiled single strands, and the two single strands are complementary, that is, base pairs are formed between every two bases. This ...
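The sliding-window step and the reverse complementary strand can be sketched as follows. This is a minimal illustration, assuming an uppercase sequence containing only A, C, G, and T and a window that advances one base at a time; the function names are hypothetical and do not come from the disclosure.

```python
# Complement table for the four DNA bases: A<->T, C<->G.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(kmer: str) -> str:
    """Complement every base, then reverse the order."""
    return kmer.translate(COMPLEMENT)[::-1]


def kmers_with_revcomp(sequence: str, k: int):
    """Slide a window of width k over the sequence, one base per step,
    yielding each K-mer together with its reverse complement."""
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        yield kmer, reverse_complement(kmer)


pairs = list(kmers_with_revcomp("ACGTTG", 3))
# pairs == [("ACG", "CGT"), ("CGT", "ACG"), ("GTT", "AAC"), ("TTG", "CAA")]
```

A sequence of length n yields n - k + 1 K-mers, so the six-base example above produces four pairs for k = 3.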
Embodiment 2
[0099] Embodiment 2 of the present disclosure provides a system for generating summary data of biological gene sequences, including the following modules:
[0100] The data acquisition module is configured to: acquire the gene sequence to be processed;
[0101] The K-mer decomposition module is configured to: perform K-mer decomposition on the gene sequence to be processed using a sliding window, cutting out one fixed-length K-mer at a time in sequence, and obtain the reverse complementary chain of the gene sequence; encapsulate M K-mers and the K-mers of their reverse complementary chains into separate vectors, and use a binary mask to compare the forward and reverse K-mers in a vectorized manner, that is, for each pair of forward and reverse K-mers, select the K-mer with the smaller character value, leaving M K-mers with smaller character values; and perform a vectorized setting operation on the remaining M K-mers;
[0102] The hash calculation module is c...
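The mask-based selection described for the K-mer decomposition module can be sketched as below. This is an assumption-laden illustration, not the disclosed implementation: K-mers are packed two bits per base (A=0, C=1, G=2, T=3) so that integer order matches character order for K-mers of equal length, and a Python list stands in for a SIMD vector of M lanes. All names are hypothetical.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding


def pack(kmer: str) -> int:
    """Pack a K-mer into an integer, two bits per base."""
    value = 0
    for base in kmer:
        value = (value << 2) | CODE[base]
    return value


def revcomp_packed(value: int, k: int) -> int:
    """Reverse complement of a packed K-mer: the complement of code c is 3 - c."""
    out = 0
    for _ in range(k):
        out = (out << 2) | (3 - (value & 3))
        value >>= 2
    return out


def select_smaller(forward, reverse):
    """Lane-wise blend: keep the smaller of each forward/reverse pair.
    The mask is all ones where the reverse K-mer is smaller, otherwise zero."""
    result = []
    for f, r in zip(forward, reverse):
        mask = -int(r < f)
        result.append((f & ~mask) | (r & mask))
    return result


forward = [pack(s) for s in ("ACG", "CGT", "GTT", "TTG")]
reverse = [revcomp_packed(v, 3) for v in forward]
canonical = select_smaller(forward, reverse)
# canonical == [6, 6, 1, 16], i.e. ACG, ACG, AAC, CAA
```

The blend avoids a per-pair branch, which is the property that lets the same comparison run across all lanes of a vector register at once.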
Embodiment 3
[0107] Embodiment 3 of the present disclosure provides a computer-readable storage medium on which a program is stored. When the program is executed by a processor, the steps in the method for generating summary data of biological gene sequences as described in Embodiment 1 of the present disclosure are implemented.