A method for screening differential horizontal gene transfer

The method screens for differential horizontal gene transfer (HGT). Using the Wilcoxon rank-sum test from the scipy module of Python, it identifies differential genus pairs and differential HGT breakpoints. This addresses the difficulty of reflecting dynamic relationships between species and is stated to give more accurate differential identification.

CN116312812B · Active · Publication Date: 2026-06-19 · SHENZHEN BAIREN TECH CO LTD

Patent Information

Authority / Receiving Office
CN · China
Patent Type
Patents(China)
Current Assignee / Owner
SHENZHEN BAIREN TECH CO LTD
Filing Date
2023-03-24
Publication Date
2026-06-19

AI Technical Summary

Technical Problem

Existing approaches do not adequately reflect the dynamic relationships and interactions among human microbial species. In metagenomics in particular, features such as differential species, differential genes, differential metabolic pathways, and differential SNVs cannot capture these relationships and interactions.

Method used

A method for screening differential horizontal gene transfer is adopted. HGT events in the samples are collected and filtered, genus pairs that have exchanged genes are collected, and the Wilcoxon rank-sum test is performed with the scipy module of Python to screen differential genus pairs. Differential taxonomic pairs sharing HGT breakpoints are identified, and differential HGT breakpoints are identified by sorting the p-values in ascending order.

Benefits of technology

It effectively reflects the dynamic relationships and interactions between species, improving the accuracy and efficiency of differential HGT identification.



Abstract

This invention discloses a method for screening differential horizontal gene transfer (HGT), in the field of horizontal gene transfer technology, comprising the following steps: S1: collecting and filtering all HGT events in the sample; S2: collecting genus pairs that have exchanged genes; S3: screening differential genus pairs. The beneficial effect of this invention is that the proposed method for finding differential HGT can reflect the dynamic relationships and interactions between species. The method includes selecting differential HGTs, identifying differential taxonomic pairs sharing at least one HGT breakpoint pair, comparing the two binary vectors that record each pair's presence in the sample and control groups with the Wilcoxon rank-sum test using the scipy module in Python, and sorting the taxonomic pairs in ascending order of p-value to determine the differential taxonomic pairs. Differential HGT breakpoints are found with a similar method.

Description

Technical Field

[0001] This invention relates to the field of horizontal gene transfer technology, specifically a method for screening differential horizontal gene transfer.

Background Technology

[0002] To explore the association between the metagenome and human phenotypes, researchers in case-control studies search for differential species, differential genes, differential metabolic pathways, and differential single-nucleotide variants (SNVs) within the metagenome. These differential biomarkers play a crucial role in identifying differences between study groups and in finding associations between phenotypes and the metagenome. This approach has been used to discover a variety of pathogenic bacteria or strains, which is significant for understanding disease mechanisms and even treatment methods.

[0003] The human microbiome is a complex and dynamic system with frequent gene exchange among bacteria, resulting in constant changes in their genetic composition. However, features such as differential species, differential genes, differential metabolic pathways, and differential SNVs are insufficient to reflect the dynamic relationships and interactions between species. Horizontal gene transfer (HGT) refers to the transfer of gene segments between microbial species; by transmitting genetic information, it links different species together.

[0004] Therefore, we propose a method for screening differential horizontal gene transfer.

Summary of the Invention

[0005] The purpose of this invention is to provide a method for screening differential horizontal gene transfer that solves the problems described in the background section.

[0006] To achieve the above objective, the present invention provides the following technical solution: a method for screening differential horizontal gene transfer, comprising the following steps:

[0007] S1: Collect and filter all HGT events in the sample;

[0008] S2: Collect genus pairs that have exchanged genes;

[0009] S3: Screen for differential genus pairs.

[0010] Preferably, in S1, the genome is divided into intervals of 100 base pairs in length, and the index of the interval containing a breakpoint is used to indicate the position of that breakpoint.

[0011] Preferably, in S2, differential taxon pairs sharing at least one HGT breakpoint pair are identified, and genomes with HGTs are annotated at different taxonomic levels. A taxon pair represents the presence of at least one HGT between the genomes of the two taxa. Two binary vectors are then constructed for each genus pair to record whether it is present in the samples of the sample group and the control group, respectively.

[0012] Preferably, in S3, the two vectors are compared with the Wilcoxon rank-sum test using the scipy module in Python, and the HGTs are sorted in ascending order of p-value to select the differential HGTs.

[0013] Preferably, the taxon pairs with p-values less than 0.05 are selected as differential taxon pairs, and the HGT breakpoint pairs with p-values less than 0.05 are designated as differential HGT breakpoints.
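The screening step in [0012] and [0013] can be sketched as follows. This is a minimal illustration, not the patented implementation: `screen_differential` and the `vectors` dictionary layout are hypothetical names, and the only library call used is `scipy.stats.ranksums`. The text applies the 0.05 cutoff directly to the p-values and describes no multiple-testing correction, so none is applied here.

```python
from scipy.stats import ranksums

ALPHA = 0.05  # p-value threshold stated in the text


def screen_differential(vectors, alpha=ALPHA):
    """Rank features by rank-sum p-value and keep those below alpha.

    vectors: dict mapping a feature (a taxon pair or an HGT breakpoint pair)
             to (sample_vector, control_vector), each a list of 0/1 values.
             This layout is an assumption made for the sketch.
    Returns a list of (feature, p_value) in ascending order of p-value.
    """
    results = []
    for feature, (sample_vec, control_vec) in vectors.items():
        # Two-sided Wilcoxon rank-sum test on the two presence vectors
        _, p_value = ranksums(sample_vec, control_vec)
        results.append((feature, p_value))
    results.sort(key=lambda item: item[1])  # ascending p-value
    return [(feature, p) for feature, p in results if p < alpha]
```

Because the same function accepts any feature key, it could in principle be reused for both taxon pairs and breakpoint pairs, which matches the statement that breakpoints are screened "using a similar method".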

[0014] Preferably, these taxa can be annotated at any taxonomic level.

[0015] Compared with existing technologies, the beneficial effects of this invention are as follows. The proposed method for screening differential horizontal gene transfer can reflect the dynamic relationships and interactions between species. The method includes selecting differential HGTs, identifying differential taxon pairs sharing at least one HGT breakpoint pair, comparing the two binary vectors of each pair with the Wilcoxon rank-sum test using the scipy module in Python, and sorting the taxon pairs in ascending order of p-value to determine the differential taxon pairs. Differential HGT breakpoints are found with a similar method.

Attached Figure Description

[0016] Figure 1 is a schematic diagram of the process of the present invention.

Detailed Implementation

[0017] Example 1

[0018] Referring to Figure 1, the present invention provides the following technical solution:

[0019] A method for screening differential horizontal gene transfer includes the following steps:

[0020] S1: Collect and filter all HGT events in the sample;

[0021] S2: Collect genus pairs that have exchanged genes;

[0022] S3: Screen for differential genus pairs.

[0023] wherein:

[0024] S1: Theoretically, each HGT (horizontal gene transfer) event generates two pairs of breakpoints; therefore, the HGT breakpoint pair count reflects the number of HGT events. Because small genomic variations or sequencing errors may bias the identified breakpoint locations, the same breakpoint may be detected at slightly different positions in different samples. We therefore divide the genome into 100-base-pair intervals and use the index of the interval containing a breakpoint to indicate its location. After collecting HGT breakpoint pairs from all samples, we remove the breakpoint pairs that do not meet the requirements from each sample. This gives us the HGT breakpoints that meet the requirements.
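As an illustration of the interval indexing in S1, the following sketch maps breakpoint coordinates to 100-base-pair interval indices. The names `Breakpoint`, `bin_index`, and `normalize_pair` are hypothetical, and 0-based coordinates are assumed; the text does not specify either.

```python
from typing import NamedTuple

BIN_SIZE = 100  # interval length in base pairs, as stated in the text


class Breakpoint(NamedTuple):
    genome: str    # genome identifier (hypothetical naming)
    position: int  # coordinate on that genome, assumed 0-based


def bin_index(position: int, bin_size: int = BIN_SIZE) -> int:
    """Index of the fixed-width interval that contains a coordinate."""
    return position // bin_size


def normalize_pair(a: Breakpoint, b: Breakpoint) -> tuple:
    """Map a breakpoint pair to an order-independent key of (genome, interval index)."""
    key_a = (a.genome, bin_index(a.position))
    key_b = (b.genome, bin_index(b.position))
    return tuple(sorted((key_a, key_b)))
```

With this sketch, a breakpoint reported at position 10432 in one sample and at 10467 in another both map to interval 104, so the two detections are treated as the same breakpoint. Positions that straddle an interval boundary (for example 10499 and 10501) would still land in different intervals, which is a limitation of any fixed-width binning.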

[0025] S2: Identify differential taxonomic pairs that share at least one HGT breakpoint pair. Genomes with HGTs are annotated at different taxonomic levels. A taxonomic pair represents the presence of at least one HGT between the genomes of two taxa. For example, if at least one HGT breakpoint pair is detected between two species in a sample, then the sample contains that species pair. In the same way, we can determine which genus pairs are present in each sample. Two binary vectors are then constructed for each genus pair to record whether it is present in the samples of the sample group and the control group, respectively.
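The construction of the two binary vectors in S2 might look like the sketch below. The input layout (a dictionary of per-sample HGT events, a genome-to-genus lookup, and group labels `"case"` and `"control"`) is assumed for illustration; `"case"` here stands for what the text calls the sample group.

```python
def genus_pairs_in_sample(hgt_events, genus_of):
    """Return the set of unordered genus pairs linked by at least one HGT."""
    pairs = set()
    for genome_a, genome_b in hgt_events:
        pair = tuple(sorted((genus_of[genome_a], genus_of[genome_b])))
        pairs.add(pair)
    return pairs


def presence_vectors(samples, groups, genus_of):
    """Build, for every genus pair, one 0/1 presence vector per group.

    samples:  dict sample_id -> list of (genome_a, genome_b) HGT events
    groups:   dict sample_id -> "case" or "control" (hypothetical labels)
    genus_of: dict genome_id -> genus name
    """
    per_sample = {sid: genus_pairs_in_sample(events, genus_of)
                  for sid, events in samples.items()}
    all_pairs = set().union(*per_sample.values())
    sample_ids = sorted(samples)  # fixed order so vector positions are reproducible
    vectors = {}
    for pair in sorted(all_pairs):
        case = [int(pair in per_sample[s]) for s in sample_ids if groups[s] == "case"]
        control = [int(pair in per_sample[s]) for s in sample_ids if groups[s] == "control"]
        vectors[pair] = (case, control)
    return vectors
```

Each entry of a vector is 1 if the genus pair was observed in that sample and 0 otherwise, so the two vectors for a pair can have different lengths when the groups differ in size.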

[0026] S3: Using the scipy module in Python, we compared the two vectors with the Wilcoxon rank-sum test and sorted the HGTs in ascending order of p-value to select the differential HGTs. The Wilcoxon rank-sum test is based on the rank sums of the sample data. It first pools the two samples into a single combined sample and then ranks all observations together from smallest to largest. If the null hypothesis that the two independent samples come from the same population is true, the ranks will be spread evenly between the two samples; that is, small, medium, and large rank values should be evenly distributed between them. If the alternative hypothesis that the two independent samples come from different populations is true, one sample will have more small rank values, resulting in a smaller rank sum, while the other will have more large rank values, resulting in a larger rank sum. We selected the taxonomic pairs with p-values less than 0.05 as differential taxonomic pairs. These taxa can be annotated at any taxonomic level. Since many genomes in the UHGG database are not annotated at the species level, our focus in this work is on identifying and analyzing differential genus pairs. The location of an HGT breakpoint can be represented by the index of the interval containing that breakpoint. Similar to finding differential taxon pairs, we used the Wilcoxon rank-sum test to compare the occurrence of HGT breakpoint pairs in the sample and control groups, and then identified HGT breakpoint pairs with a p-value less than 0.05 as differential HGT breakpoints.
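To make the rank-sum reasoning above concrete, the following sketch pools two samples, ranks them jointly, and compares the rank sum of the first sample with its expectation under the null hypothesis. It is a hand computation using a plain normal approximation, with tied values given their mean rank and no tie or continuity correction; it is meant to illustrate the mechanics and is not a substitute for the scipy routine named in the text. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from smallest to largest; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Return (rank sum of x, z statistic, two-sided p-value)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))  # pool both samples, then rank
    rank_sum_x = sum(ranks[:n1])
    expected = n1 * (n1 + n2 + 1) / 2         # rank sum of x under the null hypothesis
    std = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (rank_sum_x - expected) / std
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return rank_sum_x, z, p_value
```

For `x = [1, 2, 3]` and `y = [4, 5, 6]`, all small ranks fall in `x`, so its rank sum (6) lies well below the expected 10.5 and the z statistic is negative. For two identical vectors the rank sum equals its expectation and the p-value is 1.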

Claims

1. A method for screening differential horizontal gene transfer, characterized in that it comprises the following steps: S1: collecting and filtering all HGT events in the sample; S2: collecting genus pairs that have exchanged genes; S3: screening for differential pairs; specifically, the genome is divided into 100-base-pair intervals, and the index of the interval containing a breakpoint is used to indicate the location of that breakpoint. Differential taxonomic pairs sharing at least one HGT breakpoint pair are identified. Genomes with HGTs are annotated at different taxonomic levels. A taxonomic pair represents at least one HGT occurring between the genomes of two taxa. Then, two binary vectors are constructed for each genus pair to record whether it is present in the samples of the sample group and the control group, respectively. The two vectors are compared using the Wilcoxon rank-sum test in the scipy module of Python, and the HGTs are sorted in ascending order of p-value to select differential HGTs.

2. The method for screening differential horizontal gene transfer according to claim 1, characterized in that: taxon pairs with p-values less than 0.05 are selected as differential taxon pairs, and HGT breakpoint pairs with p-values less than 0.05 are designated as differential HGT breakpoints.

3. The method for screening differential horizontal gene transfer according to claim 2, characterized in that: the taxa can be annotated at any taxonomic level.