A RefGene gene library annotation method and device based on Spark SQL

The method builds interval forests in the Spark SQL engine, broadcasts them, and uses inner-join operations to annotate variations against the RefGene database. This addresses the low efficiency of exon annotation in existing technologies and makes variation annotation more efficient.

CN122201459A · Pending · Publication Date: 2026-06-12 · XIAN UNIV OF POSTS & TELECOMM

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: XIAN UNIV OF POSTS & TELECOMM
Filing Date: 2024-12-12
Publication Date: 2026-06-12

Figure CN122201459A_ABST

Abstract

Embodiments of the present application provide a Spark SQL-based annotation method for a RefGene gene library. The method comprises: generating an interval forest distinguished by positive and negative strands of chromosomes; broadcasting the generated interval forest using a broadcast mechanism of a distributed SQL query engine; querying the interval forest using a specified field of a to-be-annotated variation; performing an inner-join operation on the returned query result and the to-be-annotated table; and outputting a table containing an annotation result. After performing the process once, if the returned annotation result is not empty, the process is performed a second time to annotate whether the variation belongs to an exon. In addition, embodiments of the present application provide a Spark SQL-based annotation device for a RefGene gene library.
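The flow described in the abstract can be sketched in plain Python, without Spark, to make the data movement concrete. This is a minimal illustration under stated assumptions, not the patented implementation: the "forest" here is a dictionary of sorted interval lists keyed by (chromosome, strand) rather than true interval trees, coordinates are treated as closed intervals, and all names (`build_forest`, `query`, `annotate_genes`, `annotate_exons`) and sample records are hypothetical. In Spark the forest would be shared with executors through a broadcast variable; here it is simply passed as an argument.

```python
from bisect import bisect_right
from collections import defaultdict


def build_forest(records):
    """Build one sorted interval index per (chromosome, strand) key."""
    groups = defaultdict(list)
    for chrom, strand, start, end, name in records:
        groups[(chrom, strand)].append((start, end, name))
    forest = {}
    for key, items in groups.items():
        items.sort()
        forest[key] = ([start for start, _, _ in items], items)
    return forest


def query(forest, chrom, strand, pos):
    """Return names of intervals on (chrom, strand) that contain pos."""
    starts, items = forest.get((chrom, strand), ([], []))
    upper = bisect_right(starts, pos)  # intervals starting at or before pos
    return [name for _, end, name in items[:upper] if end >= pos]


def annotate_genes(forest, variants):
    """First pass with inner-join semantics: variants with no hit are dropped."""
    rows = []
    for v in variants:
        for strand in ("+", "-"):
            for gene in query(forest, v["chrom"], strand, v["pos"]):
                rows.append({**v, "strand": strand, "gene": gene})
    return rows


def annotate_exons(exon_forest, gene_rows):
    """Second pass: flag whether each annotated variant lies in an exon."""
    return [
        {**r, "in_exon": r["gene"] in query(exon_forest, r["chrom"], r["strand"], r["pos"])}
        for r in gene_rows
    ]


# Hypothetical sample data: (chrom, strand, start, end, gene name).
genes = [("chr1", "+", 100, 500, "GENE_A"), ("chr1", "-", 400, 900, "GENE_B")]
exons = [("chr1", "+", 100, 200, "GENE_A"), ("chr1", "+", 300, 500, "GENE_A")]
variants = [
    {"chrom": "chr1", "pos": 150},
    {"chrom": "chr1", "pos": 250},
    {"chrom": "chr2", "pos": 150},
]

gene_forest = build_forest(genes)
exon_forest = build_forest(exons)

gene_rows = annotate_genes(gene_forest, variants)
# The second pass runs only if the first pass returned something.
result = annotate_exons(exon_forest, gene_rows) if gene_rows else []
```

Representing the exon result as a boolean `in_exon` column is an interpretation of "annotate whether the variation belongs to an exon"; the abstract does not specify the output schema.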