A method for filtering out SNPs (single nucleotide polymorphisms) unrelated to complex diseases from a whole
genome is used in research on the pathogenic mechanisms of complex diseases, in early diagnosis, and in biological
medicine development. The method comprises the following steps: (1) preprocessing and initializing the SNP data,
converting it into data containing only the values 0, 1, 2, and 3, on the principle that a variation of any
gene is treated as having the same influence on disease whichever allele of the homologous chromosomes
carries it; (2) defining the
relevance measure, namely defining the relevance I(Y;X) between an SNP subset X and the disease Y as the
mutual information MI(Y;X) between X and Y; (3) searching the SNP set for candidate groups of SNPs suspected
of forming a pathogenic mechanism, using the FGSA (factor-based genetic search algorithm) method; (4) selecting,
from the set of candidate SNP groups, the SNP groups whose frequency of occurrence exceeds a threshold,
according to the frequency-relevance priority criterion; (5) outputting the SNPs whose frequency exceeds the threshold and
that rank highest. According to the invention, the method can retain SNPs corresponding to a
pathogenic mechanism that is masked by other pathogenic mechanisms, laying a foundation for the subsequent
discovery of pathogenic mechanisms.
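
The relevance measure and the frequency threshold described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function names `mutual_information` and `frequent_snps` are hypothetical, the mutual information is estimated from empirical counts in bits (the log base is an assumption), and counting how often each SNP appears across candidate groups is only one plausible reading of the frequency step. The FGSA search and the relevance tie-break are not shown.

```python
from collections import Counter
from math import log2


def mutual_information(snp_rows, labels):
    """Empirical MI(Y;X) in bits.

    snp_rows: one tuple per sample holding the encoded genotypes (0..3)
              of the SNPs in the subset X.
    labels:   one disease label Y per sample.
    """
    n = len(labels)
    joint = Counter(zip(snp_rows, labels))  # counts of (x, y) pairs
    px = Counter(snp_rows)                  # counts of x
    py = Counter(labels)                    # counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


def frequent_snps(candidate_groups, threshold):
    """Assumed reading of the frequency step: count how often each SNP
    appears across the candidate groups and keep those whose count
    exceeds the threshold, most frequent first."""
    counts = Counter(snp for group in candidate_groups for snp in group)
    kept = [(snp, c) for snp, c in counts.items() if c > threshold]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# A SNP that fully determines a balanced binary label carries 1 bit.
print(mutual_information([(0,), (0,), (1,), (1,)], [0, 0, 1, 1]))  # 1.0
# SNP indices 1 and 2 appear in more than one candidate group.
print(frequent_snps([(1, 2), (2, 3), (2, 1)], threshold=1))  # [(2, 3), (1, 2)]
```

Treating each sample's genotypes over the subset X as a single joint symbol lets the same function score single SNPs and multi-SNP groups, at the cost of needing many samples as the subset grows.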