An Escherichia coli polysaccharide antigen structure database and online analysis platform thereof
By constructing the EcoSP-Db database and online analysis platform, the problem of difficulty in retrieving E. coli antigen structure and synthesis information in existing technologies has been solved, enabling rapid acquisition of polysaccharide synthesis information and genomic typing, supporting epidemiological surveys and vaccine development.
Patent Information
- Authority / Receiving Office
- CN · China
- Patent Type
- Patents(China)
- Current Assignee / Owner
- HANGZHOU WEISHU BIOTECHNOLOGY CO LTD
- Filing Date
- 2022-08-16
- Publication Date
- 2026-06-19
Figure CN115440301B_ABST
Abstract
Description
Technical Field
[0001] This invention relates to fields such as epidemiological investigation and vaccine development, primarily to a database of structural information on Escherichia coli O and K antigens and an online platform for browsing, searching, and analyzing them.
Background Technology
[0002] Escherichia coli is an opportunistic pathogen that can cause infectious diseases in humans and animals. For example, the E. coli strain O104:H4 caused a severe foodborne illness outbreak in Germany in 2011. Surface polysaccharide antigens are important virulence factors of E. coli and are also important targets for the development of drugs, vaccines, and diagnostic reagents.
[0003] Escherichia coli surface polysaccharide antigens, mainly including O antigen (cell wall lipopolysaccharide) and K antigen (capsular polysaccharide), possess pathogenicity and high immunogenicity. The variable O antigen, in particular, is the primary basis for serotyping, and O antigen typing serves as a fundamental tool for epidemiological surveys and surveillance. Both O and K antigens are composed of multiple oligosaccharide repeating units (O units or K units), each unit consisting of several glycosyl groups and glycosyl derivatives. Each E. coli O antigen or K antigen subgroup corresponds to a specific polysaccharide antigen structure and synthetic gene cluster, which includes monosaccharide synthesis genes, glycosyltransferases, and unit processing genes.
[0004] The polysaccharide structures, polysaccharide synthesis information, monosaccharide synthesis pathways, and genome-based antigen analysis of E. coli O and K antigens are crucial foundations for current research in vaccine development, glycobiology synthesis, and epidemiological surveys. The published E. coli O antigen database ECODAB (www.casper.organ.su.se/ECODAB/) compiles antigen structures and some glycosyltransferase information for E. coli O antigen typing. The EK3D database (www.iith.ac.in/EK3D/) compiles and publishes structural information on E. coli K antigens, including their three-dimensional structures. While these databases present polysaccharide antigen structure diagrams and synthetic gene clusters, they still have the following limitations:
[0005] 1) The lack of an integrated E. coli antigen structure database platform makes it impossible to quickly browse and retrieve the structures of O and K antigens and accurate synthetic gene cluster sequence information;
[0006] 2) There is a lack of database platforms that can quickly retrieve information on polysaccharide and monosaccharide synthesis pathways in E. coli antigen structures;
[0007] 3) There is a lack of database platforms that can be used to quickly retrieve published E. coli strain antigen typing and perform typing and functional annotation of unknown E. coli genomes.
Summary of the Invention
[0008] To overcome the shortcomings of existing database platforms, this invention provides an Escherichia coli polysaccharide antigen structure database and analysis platform.
[0009] A database of Escherichia coli polysaccharide antigen structures was constructed using the following steps:
[0010] 1) E. coli genome acquisition, quality control and analysis
[0011] 1.1) Download all published E. coli genome sequences from the public database (NCBI RefSeq);
[0012] 1.2) The quality of the downloaded genomes was assessed. The average nucleotide identity (ANI) between each downloaded genome and the genome of the E. coli model strain was calculated using FastANI (version 1.3.3). At the same time, genome completeness, contamination rate, and strain heterogeneity were calculated using CheckM, and the number of contigs or scaffolds in each genome was counted using a self-written Perl program.
[0013] 1.3) Based on calculation and statistical results, only high-quality strain genome sequences are retained, while misnamed and low-quality genomes are removed.
[0014] 1.3.1) Remove genomes whose ANI to the model strain's genome is less than 94%;
[0015] 1.3.2) Remove genomes with a contamination rate greater than 5%;
[0016] 1.3.3) Remove genomes with a completeness rate of less than 85%;
[0017] 1.3.4) Remove genomes with more than 500 contigs or scaffolds;
[0018] 1.4) A self-written Perl program was used to perform antigen typing analysis on all genomes:
[0019] 1.4.1) Download the E. coli O antigen and H antigen typing gene cluster sequences as typing DNA reference sequences;
[0020] 1.4.2) The E. coli genome sequence was aligned to the reference sequence using Blast+ (2.11.0) software;
[0021] 1.4.3) Compare the similarity and coverage of the results to generate antigen typing results;
[0022] 1.4.4) Organize the genome information of each antigen type and the corresponding strain.
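The quality-control criteria of steps 1.2 and 1.3 can be illustrated with a minimal Python sketch. It is not the program used in the invention (which the text describes as self-written in Perl); the names `GenomeStats`, `count_contigs`, `passes_qc`, and `filter_genomes` are hypothetical, and the ANI, completeness, and contamination values are assumed to have been computed beforehand by FastANI and CheckM.

```python
from dataclasses import dataclass


@dataclass
class GenomeStats:
    """Per-genome statistics (hypothetical container, not from the source)."""
    accession: str
    ani: float            # percent ANI to the model strain, e.g. from FastANI
    completeness: float   # percent, e.g. from CheckM
    contamination: float  # percent, e.g. from CheckM
    n_contigs: int        # number of contigs or scaffolds


def count_contigs(fasta_text: str) -> int:
    """Count sequence records; each FASTA record starts with a '>' header."""
    return sum(1 for line in fasta_text.splitlines() if line.startswith(">"))


def passes_qc(g: GenomeStats) -> bool:
    """Apply the four thresholds listed in steps 1.3.1 to 1.3.4."""
    return (
        g.ani >= 94.0              # 1.3.1: drop ANI < 94%
        and g.contamination <= 5.0  # 1.3.2: drop contamination > 5%
        and g.completeness >= 85.0  # 1.3.3: drop completeness < 85%
        and g.n_contigs <= 500      # 1.3.4: drop > 500 contigs/scaffolds
    )


def filter_genomes(genomes):
    """Keep only the genomes that satisfy every threshold."""
    return [g for g in genomes if passes_qc(g)]
```

A genome is retained only if it satisfies all four thresholds; failing any single criterion removes it.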
[0023] 2) Collection of Escherichia coli polysaccharide antigen information
[0024] 2.1) Based on published literature, download and collect the original published polysaccharide structure information and synthetic gene cluster information corresponding to 186 O antigen types (or subgroups) and 68 K antigen types of Escherichia coli, and correct the incorrectly cited typing structure information in existing literature.
[0025] 2.2) Based on published literature, collect and integrate information on polysaccharide synthesis related to Escherichia coli O antigen and K antigen, including key genes (glycosyltransferase and polymerase) in polysaccharide repeat units, donor sugar and acceptor sugar information;
[0026] Chemical representations of 254 antigenic polysaccharides were drawn using CSDB/SNFG and SVG tools, and the polysaccharide unit structures and key enzyme genes of antigens were visualized in a unified format.
[0027] Draw functional structure diagrams of each antigen synthesis gene cluster, and present the functional gene structures graphically according to functional classification.
[0028] 2.3) Based on published literature, information on monosaccharide synthesis pathways in Escherichia coli polysaccharide antigens was integrated, and information tables on monosaccharide synthesis reactions were compiled according to reaction steps, including information on key genes, reaction precursors, reaction products, and synthesis pathway types. SVG was used to draw diagrams of 39 monosaccharide synthesis pathways.
[0029] 3) Integrate the above E. coli genome, polysaccharide antigen structure, and synthesis pathway information to construct the EcoSP-Db database, which stores diverse E. coli antigen structure data, including antigen typing, polysaccharide antigen structures, synthesis pathways, and the corresponding E. coli genome data. In total it holds 39 monosaccharide synthesis pathways, 911 polysaccharide synthesis information entries, and 7741 high-quality genomes corresponding to 254 antigen types.
[0030] An online analysis platform for Escherichia coli polysaccharide antigen structure data is constructed and applied as follows:
[0031] 1) Antigen typing browsing module
[0032] The E. coli polysaccharide antigen structure data browsing module presents a list of 186 O antigens and 68 K antigens. Clicking on any antigen type name will provide the user with the corresponding polysaccharide antigen unit structure diagram, synthetic gene cluster structure diagram and sequence, as well as the corresponding published strain information.
[0033] 2) Antigen typing search module
[0034] This tool is used to search the EcoSP-Db database for E. coli strain numbers or antigen typing names entered by the user. If a matching strain name or genome sequence number is found, it will provide the corresponding antigen typing, polysaccharide antigen structure information, and functional structure and sequence information of the synthetic gene cluster. It will also provide functional annotation information of polysaccharide synthesis-related genes (glycosyltransferases and polymerases) in the strain's genome. If the search for the user-entered antigen typing information is successful and a database match is found, it will provide the corresponding polysaccharide antigen structure, synthetic gene cluster, and published strain information.
[0035] 3) Polysaccharide synthesis information retrieval module
[0036] This tool is used to retrieve polysaccharide synthesis-related information entered by the user in the EcoSP-Db database, such as monosaccharide name and glycosidic bond type. If the user-entered information is found, it can provide feedback on the known polysaccharide synthesis information in E. coli, including antigen typing, strain information, glycosidic bond, glycosidic bond type, polymerase, glycosyltransferase, donor sugar or acceptor sugar.
[0037] 4) Monosaccharide synthesis pathway retrieval module
[0038] This tool is used to search for user-inputted monosaccharide names in the EcoSP-Db database. If the user-inputted monosaccharide information is found, it can provide one or more monosaccharide synthesis pathways, including information on the reaction precursors, reaction products, key genes, antigen typing, published strain genomes, monosaccharide synthesis classification, and synthesis pathway diagrams for each step of the reaction.
[0039] 5) User data upload module
[0040] This tool allows users to submit E. coli strain genome sequences (Fasta format files) online, and also allows them to enter their email address to receive analysis results.
[0041] 6) Analysis Module
[0042] This tool is used to perform antigen typing analysis on E. coli genome sequences submitted and uploaded by users. After the background program (eco-TYPEtool) completes the analysis, it returns the typing results to the user in a table.
[0043] 6.1) Obtain the gene clusters and annotation information of E. coli O antigen and H antigen synthesis from the literature, convert them into recognizable E. coli typing DNA reference sequences and their corresponding protein sequence formats, and generate an E. coli typing database (eco-TypeDb).
[0044] 6.2) User-uploaded E. coli genome sequences were aligned to the eco-TypeDb database using the Blast+ (2.11.0) program, and the alignment results were initially filtered using a threshold (evalue=1e-5);
[0045] 6.3) Based on similarity and coverage screening thresholds, obtain the E. coli typing DNA reference sequence that best matches the genome, and divide the results into three categories: perfect match (coverage and similarity are both 100%), high match (coverage ≥99% and similarity ≥95%), and low match (coverage <99% or similarity <95%), and obtain the corresponding alignment fragment information of the genome sequence to be typed;
[0046] 6.4) Using the tBlastn program, the genome to be genotyped is aligned to the corresponding protein sequence in the genotyping DNA reference sequence to obtain the best matching gene information;
[0047] 6.5) Finally, based on the above matching information, the O:H antigen typing results of the Escherichia coli strain genome are obtained;
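The three match categories of step 6.3 and the assembly of an O:H result in step 6.5 can be sketched as follows. This is an illustrative sketch, not the eco-TYPEtool implementation: the function names, the ranking of candidates by (coverage, similarity), and the `ONT`/`HNT` placeholders for an antigen with no acceptable match are all assumptions. Restricting the call to perfect and high matches follows the description in Example 1.

```python
def classify_match(coverage: float, identity: float) -> str:
    """Classify an alignment by coverage and similarity (both in percent)."""
    if coverage == 100.0 and identity == 100.0:
        return "perfect"
    if coverage >= 99.0 and identity >= 95.0:
        return "high"
    return "low"  # coverage < 99% or similarity < 95%


def call_serotype(hits) -> str:
    """Build an O:H call from (antigen_name, coverage, identity) tuples.

    Antigen names are assumed to look like 'O6' or 'H1'. Low matches are
    ignored; among the remaining hits the one with the highest
    (coverage, identity) pair wins (an assumed tie-breaking rule).
    """
    best = {}
    for name, coverage, identity in hits:
        kind = name[:1]
        if kind not in ("O", "H"):
            continue
        if classify_match(coverage, identity) == "low":
            continue
        score = (coverage, identity)
        if kind not in best or score > best[kind][0]:
            best[kind] = (score, name)
    o_call = best["O"][1] if "O" in best else "ONT"  # hypothetical placeholder
    h_call = best["H"][1] if "H" in best else "HNT"  # hypothetical placeholder
    return f"{o_call}:{h_call}"
```

For example, `call_serotype([("O6", 100.0, 99.8), ("H1", 100.0, 100.0)])` combines a high match for O6 and a perfect match for H1 into the string `O6:H1`.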
[0048] 7) Online platform
[0049] The platform is built using the Django framework. The front-end pages are built using HTML5, JS, and CSS, while the back-end is programmed using Python, providing search and browsing entry points.
[0050] In summary, the present invention has the following beneficial effects:
[0051] (1) The structural data of Escherichia coli O antigen and K antigen integrated by EcoSP-Db of this invention presents the polysaccharide antigen structural information of 254 antigen types in a unified format, making it convenient for users to retrieve data information. At the same time, it corrects the published erroneous Escherichia coli polysaccharide antigen structural information.
[0052] (2) It can retrieve the typing information and polysaccharide antigen structure data of published Escherichia coli strains;
[0053] (3) Integrating Escherichia coli polysaccharide synthesis information and monosaccharide synthesis pathways, users can quickly search and obtain information, laying an important foundation for glycobiology synthesis and vaccine development;
[0054] (4) Integrating the E. coli antigen polysaccharide structure database and the online tool for E. coli genome O:H antigen typing facilitates antigen typing analysis of the genome by researchers without a bioinformatics background, providing a powerful technical tool for epidemiological surveys.
Attached Figure Description
[0055] Figure 1 is a flowchart of the process architecture of the Escherichia coli polysaccharide antigen structure database and online platform of this invention.
[0056] Figure 2 is the EcoSP browsing page for an Escherichia coli antigen type (O1).
[0057] Figure 3 is the EcoSP search page for the polysaccharide antigen structure of an Escherichia coli strain (ATCC 25922).
[0058] Figure 4 is the EcoSP retrieval page for E. coli polysaccharide synthesis information (α-D-Man).
[0059] Figure 5 is the EcoSP search page for monosaccharide synthesis pathways in Escherichia coli polysaccharide antigen structures.
[0060] Figure 6 is the EcoSP module for user upload and typing analysis of E. coli genomes.
Detailed Implementation
[0061] To understand and implement this invention, the invention will be further described in detail below with reference to the accompanying drawings and specific embodiments.
[0062] Example 1
[0063] As shown in Figure 1, the steps for constructing the Escherichia coli EcoSP-Db database of the present invention are as follows:
[0064] 1) Obtain the published genome sequences of E. coli strains (FASTA format) from the public database (NCBI). Use FastANI to calculate the average nucleotide identity (ANI) between each strain genome sequence and the E. coli model strain. Use CheckM to calculate the completeness and contamination rate of each genome. At the same time, count the number of genome fragments (contigs or scaffolds). Remove genomes with an ANI value lower than 94%, a contamination rate greater than 5%, a completeness less than 85%, or more than 500 genome fragments to obtain high-quality E. coli strain genomes.
[0065] 2) Download the E. coli antigen synthesis gene cluster sequence, convert it into a recognizable genotyping database, and use Blast software to align the high-quality genome to the genotyping database (eco-TypeDb) for O:H antigen genotyping. The genotyping procedure (eco-TYPEtool) steps are as follows:
[0066] First, the E. coli genome sequences uploaded by users were aligned to the eco-TypeDb database using Blast+ (2.11.0) software, and the alignment results were initially filtered using a threshold (evalue=1e-5).
[0067] Secondly, thresholds such as similarity and coverage were used to obtain the E. coli typing DNA reference sequence that best matches the genome. The results were divided into three categories: perfect match (100% coverage and similarity), high match (≥99% coverage and ≥95% similarity), and low match (<99% coverage or <95% similarity). The corresponding alignment fragment information of the genome sequence to be typed was also obtained.
[0068] Next, the tBlastn program is used to align the genome to be genotyped to the corresponding protein sequence in the genotyping DNA reference sequence to obtain the best matching gene information;
[0069] Finally, perfect and high-match results were selected to obtain O:H antigen typing information for E. coli strain genomes, and a table of strain names, genome sequence names, and antigen typing mapping relationships was compiled.
[0070] 3) Based on public databases (NCBI, MetaCYC, and GTDB) and literature, download and integrate the polysaccharide structures, synthetic gene clusters, polysaccharide synthesis information (polymerases, glycosyltransferases and their corresponding glycosidic bonds, donor sugars and acceptor sugars), and monosaccharide synthesis information (monosaccharide synthesis steps, reaction precursors, reaction products, key enzymes, and final products) corresponding to 186 O antigen types and 68 K antigen types of E. coli.
[0071] 4) Based on the downloaded data, the chemical structure diagrams of polysaccharide units corresponding to the O antigen and K antigen types of E. coli were drawn using CSDB/SNFG, and the information of key enzymes (glycosyltransferase and polymerase) corresponding to glycosidic bonds was added using SVG. At the same time, the structure diagrams of the synthetic gene clusters and monosaccharide synthesis pathways of each antigen type were drawn using SVG.
[0072] 5) Integrate the mapping relationships between E. coli antigen typing and strain name, genome sequence number, antigen typing, polysaccharide antigen structure, polysaccharide synthesis and monosaccharide synthesis pathways from steps 1), 2), 3), and 4) to construct the EcoSP-Db database.
[0073] The EcoSP-Db database stores 186 O antigen types and 68 K antigen types of E. coli, along with structural diagrams of 254 polysaccharide antigen units corresponding to each antigen type, 39 monosaccharide synthesis pathways (involving 156 monosaccharide synthesis genes, 91 monosaccharide reaction information, and 26 monosaccharides), 911 polysaccharide synthesis information entries (involving 184 O and K antigen polysaccharide unit synthesis information, 269 glycosyltransferase genes, 113 donor sugars, and 93 acceptor sugars), and genomic information of 7741 high-quality strains, allowing users to quickly browse and search.
[0074] Example 2
[0075] As shown in Figure 1 and Figure 2, the antigen typing browsing function of EcoSP in this invention is as follows:
[0076] 1) In this embodiment, the database and online platform are accessed via a browser. By entering the database website address, EcoSP can be accessed.
[0077] 2) On the “Antigen” page, select the O antigen type, and the system will display a list of all 186 O antigen genotypes for the user.
[0078] 3) Click the "O1" antigen type to return its polysaccharide antigen structure information to the user (Figure 2). The data includes polysaccharide unit structure diagrams (SNFG format, chemical structural formula, and CSDB text format), synthetic gene cluster structure diagrams, synthetic gene cluster sequences (GenBank format; click the O locus to go to the sequence page), and reference information.
[0079] Example 3
[0080] As shown in Figure 1 and Figure 3, the antigen typing search function of EcoSP in this invention is as follows:
[0081] 1) The genome sequence (GCA_007844355.1) of the known Escherichia coli strain ATCC 25922 (serotype O6) was obtained from NCBI.
[0082] 2) On the EcoSP platform, on the “Search” page, select “E. coli strain” and enter the strain number “ATCC25922” to search;
[0083] 3) Based on user input data, the EcoSP-Db database is searched. This invention uses exact matching, which requires inputting the strain name according to the public database (NCBI).
[0084] 4) Once matching is complete, the search results page is displayed to the user (Figure 3). The data includes the strain's genome sequence number (GCA_007844355.1), serotype (O6:H1), polysaccharide structure diagram, synthetic gene cluster structure diagram, and references.
[0085] 5) If you enter the strain genome sequence name "GCA_007844355.1" in the "E.coli strain" search mode, the results page returned to the user will be consistent with the search results for strain number (ATCC 25922);
[0086] 6) If you select the “E. coli serotype” search mode, you can enter the antigen type name, such as O1, to search for the antigen type and obtain information such as the polysaccharide structure, synthetic gene cluster structure and sequence, and strain genome of the O1 antigen type.
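The strain search in this example, where the strain number and the genome sequence number lead to the same result page, can be sketched as an index with two keys per record. This is only an illustration under stated assumptions: the record layout and the names `build_index`, `lookup`, and `normalize` are hypothetical, and removing whitespace before comparison is an added assumption (the text states only that matching is exact and shows "ATCC25922" being entered for strain ATCC 25922).

```python
# One record taken from the example above; the field names are hypothetical.
EXAMPLE_RECORDS = [
    {"strain": "ATCC 25922", "accession": "GCA_007844355.1", "serotype": "O6:H1"},
]


def normalize(key: str) -> str:
    """Remove all whitespace so 'ATCC 25922' and 'ATCC25922' compare equal.

    This normalization is an assumption made for the sketch.
    """
    return "".join(key.split())


def build_index(records):
    """Map both the strain name and the genome accession to the same record."""
    index = {}
    for record in records:
        index[normalize(record["strain"])] = record
        index[normalize(record["accession"])] = record
    return index


def lookup(index, query: str):
    """Return the matching record, or None when nothing matches exactly."""
    return index.get(normalize(query))
```

With this layout, `lookup(index, "ATCC25922")` and `lookup(index, "GCA_007844355.1")` return the same record, mirroring steps 2) and 5) of the example.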
[0087] Example 4
[0088] As shown in Figure 1 and Figure 4, the polysaccharide synthesis information retrieval function of EcoSP in this invention is as follows:
[0089] 1) On the EcoSP platform polysaccharide synthesis information retrieval page, enter the monosaccharide information to be retrieved (α-D-Man).
[0090] 2) A self-written search program was used to match α-D-Man against the EcoSP-Db database for precise matching;
[0091] 3) Based on the exact match results, provide the user with a search results page (Figure 4) listing information on the synthesis of α-D-Man-related antigenic polysaccharides. The list includes antigen type, key enzyme name, glycosidic link, donor sugar, acceptor sugar, glycosidic link type, and functional annotation information.
[0092] 4) Above the search results is a "Search" input box. Within the first search results, enter the glycosidic bond type (α1->3) to perform a second search (Figure 4).
[0093] 5) The secondary search results are automatically refreshed on the original page, and the results include antigen typing, key enzymes, glycosidic bonds, donor sugars, acceptor sugars and glycosidic bond types, and functional annotation information;
[0094] 6) If a user enters any field such as glycosidic bond type, antigen typing, or key enzyme for a search, the backend's self-written search program will also accurately match it to EcoSP-Db. When a match is successful, the results will be returned to the user on the automatically refreshed search page. The results include antigen typing, key enzyme name, donor sugar, acceptor sugar, glycosidic bond type, and functional annotation information, which can be used for a second search.
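The two-stage retrieval in this example (an exact match on any field, followed by a second search within the first results) can be sketched as below. The patent does not disclose its search program, so this is only a minimal illustration: the function name `exact_search` and the field names are hypothetical, and the three records are invented placeholders, not entries from EcoSP-Db.

```python
# Placeholder records for illustration only; they are NOT data from EcoSP-Db.
EXAMPLE_RECORDS = [
    {"antigen": "O_example_1", "enzyme": "GT_example_1",
     "donor": "α-D-Man", "acceptor": "β-D-Glc", "bond_type": "α1->3"},
    {"antigen": "O_example_2", "enzyme": "GT_example_2",
     "donor": "α-D-Man", "acceptor": "α-D-Man", "bond_type": "α1->2"},
    {"antigen": "O_example_3", "enzyme": "GT_example_3",
     "donor": "β-D-Gal", "acceptor": "β-D-Glc", "bond_type": "α1->3"},
]


def exact_search(records, term: str):
    """Return the records in which any field value equals the term exactly."""
    return [record for record in records if term in record.values()]


# First search: every record that mentions the monosaccharide.
first_results = exact_search(EXAMPLE_RECORDS, "α-D-Man")

# Second search: narrow the first results by glycosidic bond type.
second_results = exact_search(first_results, "α1->3")
```

Because the second search runs over `first_results` rather than the whole table, it narrows the earlier result set instead of starting a new query.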
[0095] Example 5
[0096] As shown in Figure 1 and Figure 5, the monosaccharide synthesis pathway retrieval function of EcoSP in this invention is as follows:
[0097] 1) Open the "Pathway" page on the EcoSP platform to directly browse information on the 39 monosaccharide synthesis pathways in the database (Figure 5). The listing is divided into 26 pages, and you can select a page to browse;
[0098] 2) Each monosaccharide synthesis pathway entry includes antigen typing, strain list, reaction number, key enzymes, reaction products, reaction steps, reaction precursors, monosaccharide synthesis classification information, and a pathway diagram drawn in SVG. For example, the dTDP-L-Rha synthesis pathway (SynPath0001) corresponds to a total of 74 antigen types and 392 high-quality genomes of the corresponding strains. It involves 4 synthesis reactions and requires 4 key monosaccharide synthesis genes, rmlA, rmlB, rmlC, and rmlD.
[0099] 3) The browsing page contains a search input box, which exactly matches any field, such as monosaccharide name, key enzyme, reaction intermediate product, strain number, or antigen typing, to find the corresponding pathway information. For example, enter the monosaccharide dTDP-6d-L-Rha.
[0100] 4) The search in the previous step returns one monosaccharide synthesis pathway (SynPath0002) with a total of 4 reactions, corresponding to 4 key synthetic genes, namely rmlA, rmlB, rmlC and tll, which is consistent with the monosaccharide synthesis pathway information in the published literature (DOI:10.1074/jbc.275.10.6806).
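The pathway lookup in this example can be sketched with the two pathways named in the text. The record layout, the field names, and the function `search_pathways` are hypothetical; in particular, storing the searched monosaccharide name as the pathway's `product` is an assumption, since the text does not say which field the name matches.

```python
# The two pathways mentioned in this example; the field layout is hypothetical.
PATHWAYS = [
    {"id": "SynPath0001", "product": "dTDP-L-Rha",
     "genes": ("rmlA", "rmlB", "rmlC", "rmlD")},
    {"id": "SynPath0002", "product": "dTDP-6d-L-Rha",
     "genes": ("rmlA", "rmlB", "rmlC", "tll")},
]


def search_pathways(pathways, term: str):
    """Exact match of the term against pathway id, product, or any key gene."""
    return [
        p for p in pathways
        if term == p["id"] or term == p["product"] or term in p["genes"]
    ]
```

Searching for `dTDP-6d-L-Rha` returns only SynPath0002, while searching for a shared gene such as `rmlA` returns both pathways.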
[0101] Example 6
[0102] As shown in Figure 1 and Figure 6, the typing analysis function of EcoSP in this invention is as follows:
[0103] 1) Obtain the publicly published genome sequence of Escherichia coli strain CFT073 (GCA_001675265.1), in Fasta format;
[0104] 2) Upload to the platform's "Analysis" page, enter the user's email address, and perform analysis using the self-written classification process:
[0105] First, the genome sequence was read, and the genome sequence was aligned to the E. coli typing database using Blast software (tBlastn).
[0106] The second step is to filter the results based on the expected value (1e-5) in the comparison results, and classify the results according to the comparison coverage and similarity values: 100% coverage and 100% similarity are considered perfect matches, 99% or more coverage and 95% or more similarity are considered high matches, and less than 99% coverage or less than 95% similarity are considered low matches.
[0107] The third step is to output the typing results table (Figure 6) to the redirected page or to the email address provided by the user. The table includes the task submission serial number, species name (E. coli), and typing result (O6:H1).
[0108] 3) The analysis can simultaneously perform gene prediction and glycosyltransferase annotation based on eco-GTdb, and the annotation results are returned along with the typing results. The analysis steps are as follows:
[0109] Firstly, the Prodigal software tool was used to predict the genes in the strain genome;
[0110] The second step is to use the Blastp software to align the predicted genes to the eco-GTdb database;
[0111] The third step is to filter the comparison results, removing those with a similarity (Identity) of less than 85% and an e-Value greater than 1e-7.
[0112] The fourth step is to provide the annotation results to the user. The table page contains the location and functional name of the annotated gene, as well as information such as the corresponding glycosidic bonds, donor sugars, and acceptor sugars.
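The filtering in the third step of the annotation workflow can be sketched as a parser for BLAST tabular output. This is a sketch under assumptions: it presumes Blastp was run with the default 12-column tabular format (`-outfmt 6`), in which column 3 is percent identity and column 11 is the e-value, and it reads the criterion as keeping hits with identity of at least 85% and an e-value of at most 1e-7. The function name `filter_blast_hits` and the sample identifiers in the usage note are hypothetical.

```python
def filter_blast_hits(tabular_text: str,
                      min_identity: float = 85.0,
                      max_evalue: float = 1e-7):
    """Keep hits from BLAST -outfmt 6 text that pass both thresholds.

    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore.
    """
    kept = []
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        identity = float(fields[2])
        evalue = float(fields[10])
        if identity >= min_identity and evalue <= max_evalue:
            kept.append({
                "query": fields[0],
                "subject": fields[1],
                "identity": identity,
                "evalue": evalue,
            })
    return kept
```

Given two tab-separated lines, one for `gene_1` with 92.50% identity and e-value 1e-150 and one for `gene_2` with 60.00% identity, only the `gene_1` hit is kept.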
[0113] The embodiments described above are merely illustrative of several implementations of the present invention, and while the descriptions are specific and detailed, they should not be construed as limiting the scope of the invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the concept of the present invention, and these modifications and improvements all fall within the scope of protection of the present invention. Therefore, the scope of protection of the present invention should be determined by the appended claims.
Claims
1. A method for constructing a database of E. coli polysaccharide antigen structures, characterized in that the construction steps are as follows:
S1: Acquisition and analysis of high-quality Escherichia coli strain genomes
S1-1: Download all published E. coli genome sequences from the public database NCBI;
S1-2: Evaluate and analyze the downloaded genomes, including genome-based species identification and assessment of genome completeness and contamination rate;
S1-3: Filter out low-quality and incorrectly named E. coli strain genomes and retain high-quality strain genomes;
S1-4: Perform typing and annotation analysis of the high-quality Escherichia coli strains;
S2: Construction of the Escherichia coli polysaccharide antigen structure database EcoSP-Db
S2-1: Based on published literature, collect and correct the polysaccharide structure, synthetic gene cluster sequence, and gene function information corresponding to 186 O antigen types and 68 K antigen types of Escherichia coli;
S2-2: Based on literature, collect and integrate polysaccharide synthesis information for Escherichia coli O antigens and K antigens, including donor sugar and acceptor sugar information and the key genes (glycosyltransferase and polymerase) in polysaccharide repeat units;
S2-3: Draw structural diagrams of the O unit, the K unit, and the functional structure of the synthetic gene cluster of each polysaccharide antigen, and present them in a standardized format;
S2-4: Based on published literature, collect and integrate 39 monosaccharide synthesis pathways in Escherichia coli polysaccharide antigens and draw monosaccharide synthesis pathway diagrams. The synthesis information is displayed step by step, including the monosaccharide synthesis reaction sequence number, key genes for synthesis, reaction precursors, reaction products, final products, synthesis pathway types, and pathway diagrams, involving 156 monosaccharide synthesis genes, 91 monosaccharide reaction information entries, and 26 monosaccharides;
S2-5: Summarize the 254 antigen types and their corresponding polysaccharide structures, functional structures of synthetic gene clusters, polysaccharide synthesis, monosaccharide synthesis pathways, and mapping relationships to the high-quality strain genomes of S1, and construct the E. coli polysaccharide antigen structure database EcoSP-Db.
In the aforementioned E. coli polysaccharide antigen structure database, for the downloaded E. coli genome sequences, the FastANI software was used to calculate the average nucleotide identity (ANI) between each downloaded genome and the genome of the E. coli model strain. At the same time, the CheckM software was used to calculate genome completeness, contamination rate, and heterogeneity. A self-written program in Perl language was used to calculate the number of genome contigs or scaffolds. Strains are excluded if their genomes have an ANI value of less than 94% with the Escherichia coli model strain, have more than 500 genome fragments, have a contamination rate of more than 5%, have a completeness of less than 85%, or do not contain polysaccharide antigen synthesis gene clusters.
In the aforementioned E. coli polysaccharide antigen structure database, the typing gene cluster sequence information published in the literature was obtained from the eco-TypeDb typing database. A self-written Python program was used for O antigen typing and H antigen typing. The steps for E. coli genome typing analysis are as follows:
First, download the O antigen and H antigen synthesis gene cluster sequences and annotation information of each E. coli subtype published in the literature, convert them into recognizable E. coli typing DNA reference sequences and corresponding protein sequences, and construct the eco-TypeDb database.
Secondly, the genomic sequence was aligned to the E. coli typing DNA reference sequence using the Blast program, and the best matching DNA reference sequence was selected based on sequence similarity and coverage thresholds.
Next, the tBlastn program was used to align the genome to be typed to the corresponding protein sequence in the typing reference gene sequence to obtain the best matching gene information.
Finally, based on the above matching information, the O:H antigen typing results of the E. coli strain genome are output in a table.
Escherichia coli glycosyltransferase data were obtained from published literature and combined with bacterial glycosyltransferase data from the public databases NCBI, MetaCYC, and GTDB to construct the annotation database eco-GTdb. Prodigal software was used to perform gene prediction on the strain genome. The predicted genes were aligned to the eco-GTdb database using Blastp software. The comparison results were filtered, removing those with a similarity (Identity) of less than 85% and an e-Value greater than 1e-7;
In the aforementioned E. coli polysaccharide antigen structure database, the polysaccharide structures of 254 antigen types were drawn using the CSDB/SNFG drawing tool. The drawing formats included SNFG format, chemical structure diagrams, and CSDB linear structure format. The SVG vector drawing method was used to add the key gene names corresponding to glycosidic bonds to the chemical structure diagrams, so that the polysaccharide structure and synthesis information are displayed in a unified format and image. Based on synthetic gene cluster sequences and annotation information downloaded from literature and public databases, the SVG method was used to draw the synthetic gene cluster structure diagrams and show the functional classification of each gene.
The original literature reports were traced, and the existing published polysaccharide structures and synthetic gene cluster information were checked one by one, including correcting the citation issues of the O63 antigen synthetic gene cluster information;
In the aforementioned E. coli polysaccharide antigen structure database, based on literature information, the information on Escherichia coli surface polysaccharide synthesis in the various antigen types was summarized, and the mapping relationships between the key enzyme genes of Escherichia coli polysaccharide synthesis (glycosyltransferase and polymerase), donor sugar, acceptor sugar, glycosidic bond type, antigen type, and strain genome were established.
2. A method for constructing an online analysis platform for E. coli polysaccharide antigen structures, characterized in that the construction steps are as follows:
S6-1: Construct an antigen typing browsing module, which allows users to select or search for serotype names and view polysaccharide antigen structural information and corresponding strain information;
S6-2: Construct an antigen typing retrieval module. Users can input the name of a published strain or the genome sequence number of a strain to search for the corresponding E. coli polysaccharide antigen typing and structural information. Alternatively, users can input the antigen type and receive the corresponding structural information;
S6-3: Construct an E. coli antigen polysaccharide synthesis information retrieval module, which allows users to retrieve polysaccharide antigen synthesis information by inputting typing, key gene names, glycosidic bonds, glycosyl donors, glycosyl acceptors, and glycosidic bond types;
S6-4: Construct a search module for monosaccharide synthesis pathways in Escherichia coli, which allows users to input keywords and retrieve the corresponding synthesis step information, antigen typing, and strain names;
S6-5: Construct an E. coli data upload and analysis module, based on the eco-TypeDb and eco-GTdb databases and a self-written workflow, to perform typing and functional annotation analysis on user-submitted E. coli genome sequences;
The module construction in steps S6-1, S6-2, S6-3 and S6-4 is based on the EcoSP-Db database of Escherichia coli polysaccharide antigen structures in claim 1.
3. The online platform of claim 2, wherein users can directly browse or search for the structural information of the O and K units in various E. coli subtypes through a browser, and can also search for polysaccharide synthesis information and monosaccharide synthesis pathways of E. coli O and K antigens.
4. The online platform of claim 2, wherein users submit E. coli genomes, and E. coli antigen typing results are returned within minutes.