A RefGene gene library annotation method and device based on Spark SQL
By generating interval forests in the Spark SQL engine and using broadcast and inner-join operations, the method optimizes variant annotation against the RefGene database. This addresses the low efficiency of exon annotation in existing approaches and enables efficient variant annotation.
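One way to picture the interval forest is as one interval tree per chromosome strand. The sketch below is plain Python, not Spark code, and every name in it (`Region`, `IntervalTree`, `build_forest`, `query_forest`) is hypothetical. It assumes that "forest" means one static tree keyed by (chromosome, strand) and that coordinates are 0-based and half-open, as in UCSC tables such as refGene. The patent text does not name a tree variant, so this uses a tree over intervals sorted by start, with each node augmented by the maximum end coordinate in its subtree.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    chrom: str
    strand: str   # "+" or "-"
    start: int    # 0-based, inclusive (assumed, as in UCSC tables)
    end: int      # exclusive
    name: str


class IntervalTree:
    """Static interval tree over the regions of one (chromosome, strand).

    Regions are sorted by start and treated as an implicit balanced BST
    (the middle element of each sub-range is the node). Every node stores
    the maximum end coordinate in its subtree, so stabbing queries can
    skip whole subtrees that end before the query position.
    """

    def __init__(self, regions):
        self.items = sorted(regions, key=lambda r: (r.start, r.end))
        self.max_end = [0] * len(self.items)
        self._build(0, len(self.items) - 1)

    def _build(self, lo, hi):
        if lo > hi:
            return -1  # empty subtree: smaller than any real end coordinate
        mid = (lo + hi) // 2
        best = max(self.items[mid].end,
                   self._build(lo, mid - 1),
                   self._build(mid + 1, hi))
        self.max_end[mid] = best
        return best

    def stab(self, pos):
        """Return all regions with start <= pos < end, ordered by start."""
        out = []
        self._stab(0, len(self.items) - 1, pos, out)
        return out

    def _stab(self, lo, hi, pos, out):
        if lo > hi:
            return
        mid = (lo + hi) // 2
        if self.max_end[mid] <= pos:
            return  # nothing in this subtree ends after pos
        self._stab(lo, mid - 1, pos, out)
        node = self.items[mid]
        if node.start <= pos < node.end:
            out.append(node)
        if node.start <= pos:  # right subtree only holds starts >= node.start
            self._stab(mid + 1, hi, pos, out)


def build_forest(regions):
    """Build one IntervalTree per (chromosome, strand) key."""
    groups = defaultdict(list)
    for r in regions:
        groups[(r.chrom, r.strand)].append(r)
    return {key: IntervalTree(rs) for key, rs in groups.items()}


def query_forest(forest, chrom, strand, pos):
    """Look up the tree for (chrom, strand) and return regions covering pos."""
    tree = forest.get((chrom, strand))
    return tree.stab(pos) if tree is not None else []


regions = [
    Region("chr1", "+", 100, 200, "GENE_A"),
    Region("chr1", "+", 150, 300, "GENE_B"),
    Region("chr1", "-", 100, 200, "GENE_C"),
]
forest = build_forest(regions)
print([r.name for r in query_forest(forest, "chr1", "+", 160)])  # ['GENE_A', 'GENE_B']
```

Keeping the strands in separate trees means a query never has to filter out hits from the opposite strand, and each tree can be serialized and shipped independently.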
CN122201459A · Pending · Publication Date: 2026-06-12 · XIAN UNIV OF POSTS & TELECOMM
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Applications (China)
- Current Assignee / Owner: XIAN UNIV OF POSTS & TELECOMM
- Filing Date: 2024-12-12
- Publication Date: 2026-06-12
Figure CN122201459A_ABST
Abstract
Embodiments of the present application provide a Spark SQL-based annotation method for a RefGene gene library. The method comprises: generating an interval forest in which intervals are separated according to the positive and negative strands of the chromosomes; broadcasting the generated interval forest using the broadcast mechanism of a distributed SQL query engine; querying the interval forest using a specified field of each variation to be annotated; performing an inner join between the returned query results and the table of variations to be annotated; and outputting a table containing the annotation results. After the process has been performed once, if the returned annotation result is not empty, the process is performed a second time to annotate whether each variation lies within an exon. Embodiments of the present application further provide a Spark SQL-based annotation device for a RefGene gene library.
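The query, inner-join and conditional second pass described in the abstract can be sketched in plain Python. This is not the applicant's implementation. The record layouts, the use of (chromosome, strand, position) as the "specified field", the labelling of exon intervals with their gene name, and all function names (`build_lookup`, `query`, `annotate_pass`, `annotate`) are assumptions. A linear scan per (chromosome, strand) key stands in for the interval forest to keep the example short. In Spark, one way to share the lookup would be to ship it to executors once as a read-only broadcast variable (for example via `SparkContext.broadcast`) instead of passing it as an argument.

```python
from collections import defaultdict

# Hypothetical record layouts (assumptions, not specified by the patent):
#   variant row: (var_id, chrom, strand, pos)
#   region row:  (chrom, strand, start, end, gene), half-open [start, end)


def build_lookup(regions):
    """Naive stand-in for the broadcast interval forest: a list of
    (start, end, gene) per (chrom, strand) key, scanned linearly."""
    lookup = defaultdict(list)
    for chrom, strand, start, end, gene in regions:
        lookup[(chrom, strand)].append((start, end, gene))
    return dict(lookup)


def query(lookup, chrom, strand, pos):
    """Return the genes of all regions on (chrom, strand) covering pos."""
    return [gene for start, end, gene in lookup.get((chrom, strand), [])
            if start <= pos < end]


def annotate_pass(variants, lookup):
    """One pass: query the lookup with each variant's fields, then
    inner-join the hits back onto the variant table by var_id."""
    hits = [(v[0], gene) for v in variants
            for gene in query(lookup, v[1], v[2], v[3])]
    table = {v[0]: v for v in variants}
    # Inner join: only variants with at least one hit survive, one row per hit.
    return [table[var_id] + (gene,) for var_id, gene in hits if var_id in table]


def annotate(variants, gene_regions, exon_regions):
    """First pass against gene intervals; exon pass only if it found anything."""
    first = annotate_pass(variants, build_lookup(gene_regions))
    if not first:
        return []  # empty first-pass result: the exon pass is skipped
    # Second pass over the distinct variants that matched at least one gene.
    matched = list(dict.fromkeys(row[:4] for row in first))
    exon_hits = {(row[0], row[4])
                 for row in annotate_pass(matched, build_lookup(exon_regions))}
    # Flag each (variant, gene) row by whether the variant lies in an exon of that gene.
    return [row + ((row[0], row[4]) in exon_hits,) for row in first]
```

For example, with a variant at `chr1:+:150`, a gene `GENE_A` spanning `[100, 300)` and an exon of `GENE_A` spanning `[140, 160)`, the variant is returned with gene `GENE_A` and an exon flag of `True`. Variants that hit no gene drop out in the inner join of the first pass.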