A method for screening differential horizontal gene transfer
The method screens for differential horizontal gene transfer (HGT). A Wilcoxon rank-sum test, run with the scipy module of Python, identifies differential genus pairs and differential HGT breakpoints. This addresses the difficulty of reflecting the dynamic relationships between species and improves the accuracy of differential identification.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- SHENZHEN BAIREN TECH CO LTD
- Filing Date
- 2023-03-24
- Publication Date
- 2026-06-19
AI Technical Summary
Existing techniques do not effectively reflect the dynamic relationships and interactions among human microbial species. In metagenomics in particular, features such as differential species, differential genes, differential metabolic pathways, and differential SNVs do not capture these relationships and interactions.
A method for screening differential horizontal gene transfer (HGT) is adopted. HGT events in the samples are collected and filtered, the genus pairs that have exchanged genes are collected, and a Wilcoxon rank-sum test is performed with the scipy module of Python to screen for differential pairs. Differential taxon pairs sharing HGT breakpoints are identified, and differential HGT breakpoints are identified by sorting the p-values in ascending order.
It effectively reflects the dynamic relationships and interactions between species, improving the accuracy and efficiency of differential HGT identification.
Figure CN116312812B_ABST
Description
Technical Field
[0001] This invention relates to the field of horizontal gene transfer technology, specifically a method for screening differential horizontal gene transfer.
Background Technology
[0002] To explore the association between the metagenome and human phenotypes, researchers in case-control studies search the metagenome for differential species, genes, metabolic pathways, and single-nucleotide variants (SNVs). These differential biomarkers play a crucial role in identifying differences between study groups and in finding associations between phenotypes and the metagenome. This approach has been used to discover a variety of pathogenic bacteria or strains, which is significant for understanding disease mechanisms and even treatment methods.
[0003] The human microbiome is a complex and dynamic system with frequent gene exchange among bacteria, resulting in constant changes in their genetic composition. However, characteristics such as species differences, gene differences, metabolic pathway differences, and SNV differences are insufficient to reflect the dynamic relationships and interactions between species. Horizontal gene transfer (HGT) refers to the transfer of gene segments between microbial species, linking different species together by transmitting genetic information.
[0004] Therefore, we propose a method for screening differential horizontal gene transfer.
Summary of the Invention
[0005] The purpose of this invention is to provide a method for screening differential horizontal gene transfer to solve the problems mentioned in the background art.
[0006] To achieve the above objective, the present invention provides the following technical solution: a method for screening differential horizontal gene transfer, comprising the following steps:
[0007] S1: Collect and filter all HGT events in the sample;
[0008] S2: Collect genus pairs that have exchanged genes;
[0009] S3: Screen for differential pairs.
[0010] Preferably, in S1, the genome is divided into intervals of 100 base pairs in length, and the index of the interval containing a breakpoint is used to indicate the position of that breakpoint.
[0011] Preferably, in S2, differential taxon pairs sharing at least one HGT breakpoint pair are identified, and genomes with HGTs are annotated at different taxonomic levels. A taxon pair represents the presence of at least one HGT between the genomes of the two taxa. Two binary vectors are then constructed for each genus pair to record whether it is present in the sample and control, respectively.
[0012] Preferably, in S3, the two vectors are compared with the Wilcoxon rank-sum test from the scipy module in Python, and the HGTs are sorted in ascending order of p-value to select the differential HGTs.
[0013] Preferably, the taxon pairs with p-values less than 0.05 are selected as differential taxon pairs, and the HGT breakpoint pairs with p-values less than 0.05 are designated as differential HGT breakpoints.
[0014] Preferably, these taxa can be annotated at any taxonomic level.
[0015] Compared with existing technologies, the beneficial effects of this invention are as follows. The proposed method for finding differential horizontal gene transfer can reflect the dynamic relationships and interactions between species. The invention includes selecting differential HGTs, identifying differential taxon pairs sharing at least one HGT breakpoint pair, comparing the two vectors with a Wilcoxon rank-sum test using the scipy module in Python, and sorting the taxon pairs in ascending order of p-value to determine the differential taxon pairs. Differential HGT breakpoints are found with a similar method.
Attached Figure Description
[0016] Figure 1 is a schematic flow diagram of the method of the present invention.
Detailed Implementation
[0017] Example 1
[0018] Referring to Figure 1, the present invention provides the following technical solution:
[0019] A method for screening differential horizontal gene transfer includes the following steps:
[0020] S1: Collect and filter all HGT events in the sample;
[0021] S2: Collect genus pairs that have exchanged genes;
[0022] S3: Screen for differential pairs.
[0023] wherein:
[0024] S1: Theoretically, each HGT (horizontal gene transfer) event generates two pairs of breakpoints; therefore, the HGT breakpoint pair count reflects the number of HGT events. Because small genomic variations or sequencing errors can bias the identified breakpoint locations, the same breakpoint may be detected at slightly different positions in different samples. We therefore divide the genome into 100-base-pair intervals and use the index of the interval containing a breakpoint to indicate its location. After collecting HGT breakpoint pairs from all samples, we remove from each sample the breakpoint pairs that do not meet the filtering requirements. This gives us the HGT breakpoints that meet the requirements.
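The interval indexing in S1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Breakpoint` record, the function names, and the 0-based coordinate convention are assumptions introduced here, and the filtering criteria are not shown because the text does not specify them.

```python
from typing import NamedTuple

BIN_SIZE = 100  # interval length in base pairs, as stated in S1


class Breakpoint(NamedTuple):
    """Hypothetical record for one end of an HGT breakpoint pair."""
    genome: str    # genome identifier (format is an assumption)
    position: int  # 0-based coordinate on that genome (assumption)


def bin_index(position: int, bin_size: int = BIN_SIZE) -> int:
    """Return the index of the fixed-width interval containing `position`."""
    return position // bin_size


def binned_pair(a: Breakpoint, b: Breakpoint, bin_size: int = BIN_SIZE):
    """Represent a breakpoint pair by (genome, interval index) for each end.

    The two ends are sorted so the same pair gets the same key regardless
    of the order in which its ends were reported.
    """
    ends = [
        (a.genome, bin_index(a.position, bin_size)),
        (b.genome, bin_index(b.position, bin_size)),
    ]
    return tuple(sorted(ends))
```

With this representation, breakpoints detected at positions 1234 and 1299 on the same genome fall into the same interval (index 12) and are treated as the same breakpoint across samples.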
[0025] S2: Identify differential taxon pairs that share at least one HGT breakpoint pair. Genomes with HGTs are annotated at different taxonomic levels. A taxon pair represents the presence of at least one HGT between the genomes of two taxa. For example, if at least one HGT breakpoint pair is detected between two species in a sample, then the sample contains that species pair. Using this method, we can count the number of genus pairs in each sample. Two binary vectors are then constructed for each genus pair to record whether it is present in the sample and control, respectively.
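One way to build the two binary vectors in S2 is sketched below. It assumes that "sample and control" refers to the samples of a case group and of a control group, that each sample has already been reduced to the set of genus pairs detected in it, and that a pair is stored as a sorted tuple of genus names. The function name and input layout are hypothetical.

```python
def presence_vectors(pairs_by_sample, case_ids, control_ids):
    """Build, for every genus pair, one 0/1 vector per group.

    pairs_by_sample: dict mapping sample id -> set of genus pairs seen in
                     that sample (each pair a sorted tuple of genus names).
    case_ids, control_ids: ordered lists of sample ids in each group.
    Returns a dict: genus pair -> (case_vector, control_vector).
    """
    all_pairs = set().union(*pairs_by_sample.values())
    vectors = {}
    for pair in sorted(all_pairs):
        # 1 if the pair was detected in the sample, otherwise 0
        case_vec = [int(pair in pairs_by_sample.get(s, set())) for s in case_ids]
        ctrl_vec = [int(pair in pairs_by_sample.get(s, set())) for s in control_ids]
        vectors[pair] = (case_vec, ctrl_vec)
    return vectors
```

A short usage example with made-up sample ids:

```python
pairs_by_sample = {
    "case1": {("Bacteroides", "Prevotella")},
    "case2": set(),
    "ctrl1": {("Bacteroides", "Prevotella"), ("Escherichia", "Klebsiella")},
}
vectors = presence_vectors(pairs_by_sample, ["case1", "case2"], ["ctrl1"])
```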
[0026] S3: Using the scipy module in Python, we compare the two vectors with the Wilcoxon rank-sum test and sort the HGTs in ascending order of p-value to select the differential HGTs. The Wilcoxon rank-sum test is based on the rank sums of the sample data. It first pools the two samples into a single combined sample and ranks all observations together from smallest to largest. If the null hypothesis that the two independent samples come from the same population is true, the ranks will be evenly distributed between the two samples; that is, small, medium, and large rank values should appear in both samples in similar proportions. If the alternative hypothesis that the two independent samples come from different populations is true, one sample will have more small rank values, resulting in a smaller rank sum, and the other will have more large rank values, resulting in a larger rank sum. We select the taxon pairs with p-values less than 0.05 as differential taxon pairs. These taxa can be annotated at any taxonomic level. Since many genomes are not annotated at the species level in the UHGG database, our focus in this work is on identifying and analyzing differential genus pairs. The location of an HGT breakpoint is represented by the index of the interval containing that breakpoint. As with differential taxon pairs, we use the Wilcoxon rank-sum test to compare the occurrence of HGT breakpoint pairs in the sample and control, and then identify HGT breakpoint pairs with a p-value less than 0.05 as differential HGT breakpoints.
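The ranking procedure described in S3 can be made concrete with a standard-library sketch. The text itself uses scipy (the relevant call is `scipy.stats.ranksums(x, y)`); the code below is not that implementation but a hand-written normal approximation without tie correction, which should agree with it (an assumption worth checking against the installed SciPy version). The function names, the `alpha` parameter, and the `vectors` layout (pair -> two 0/1 lists) are hypothetical.

```python
import math


def midranks(values):
    """Rank values from smallest to largest; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))        # rank the pooled sample
    s = sum(ranks[:n1])                        # rank sum of the first sample
    expected = n1 * (n1 + n2 + 1) / 2          # mean of s under the null hypothesis
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (s - expected) / sd
    p = math.erfc(abs(z) / math.sqrt(2))       # two-sided normal tail probability
    return z, p


def differential_pairs(vectors, alpha=0.05):
    """Return (pair, p) sorted by ascending p-value, keeping only p < alpha."""
    results = [(pair, rank_sum_test(a, b)[1]) for pair, (a, b) in vectors.items()]
    results.sort(key=lambda item: item[1])
    return [(pair, p) for pair, p in results if p < alpha]
```

Because the vectors are binary, the pooled sample contains many ties. A tie-corrected alternative such as `scipy.stats.mannwhitneyu` may give different p-values; the text does not say whether any such correction was applied.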
Claims
1. A method for screening differential horizontal gene transfer, characterized in that it includes the following steps: S1: Collect and filter all HGT events in the sample; S2: Collect genus pairs that have exchanged genes; S3: Screen for differential pairs. Specifically, the genome is divided into 100-base-pair intervals, and the index of the interval containing a breakpoint is used to indicate the location of that breakpoint. Differential taxon pairs sharing at least one HGT breakpoint pair are identified. Genomes with HGTs are annotated at different taxonomic levels. A taxon pair represents at least one HGT appearing between the genomes of two taxa. Two binary vectors are then constructed for each genus pair to record whether it is present in the sample and control, respectively. The two vectors are compared using the Wilcoxon rank-sum test in the scipy module of Python, and the HGTs are sorted in ascending order of p-value to select differential HGTs.
2. The method for screening differential horizontal gene transfer according to claim 1, wherein the taxon pairs with p-values less than 0.05 are selected as differential taxon pairs, and the HGT breakpoint pairs with p-values less than 0.05 are designated as differential HGT breakpoints.
3. The method for screening differential horizontal gene transfer according to claim 2, wherein the taxa can be annotated at any taxonomic level.