The invention discloses a Chinese label extraction method for clustering the search results of a search engine. The method comprises the following steps: S1, a user inputs search words to form an input document; S2, candidate words are selected and every candidate word is scored; S3, it is judged whether any unmarked candidate word remains; if none remains, the method skips to step S8; otherwise, the unmarked candidate word with the highest score is selected and expanded into a set of ordered word sequences containing that word, and the method proceeds to step S4; S4, the frequency of each ordered word sequence is calculated and the high-frequency word sequences are extracted; S5, the high-frequency word sequences are scored and a candidate word sequence is selected; S6, it is judged whether the candidate word sequence is accepted as a label; if so, the method proceeds to step S7, otherwise it returns to step S3; S7, clustering is performed according to the generated label; and S8, the operation is completed. The method reduces noise labels, and the resulting labels are more representative, concise and complete.
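One possible reading of steps S2 to S8 is sketched below in Python. The abstract does not specify the scoring functions, the frequency threshold, the acceptance test, or how the Chinese text is segmented, so the sketch assumes pre-segmented documents (lists of words) and uses hypothetical stand-ins: document frequency as the candidate word score, frequency multiplied by sequence length as the word sequence score, and fixed thresholds `min_seq_freq` and `accept_threshold`. It also assumes that the search returns to step S3 after each accepted label and that clustering (S7) is done over all accepted labels at the end. All function and parameter names are illustrative and do not come from the invention.

```python
from collections import Counter


def contains(doc, seq):
    """Return True if seq occurs in doc as a contiguous, ordered run of words."""
    n = len(seq)
    return any(tuple(doc[i:i + n]) == seq for i in range(len(doc) - n + 1))


def sequences_with(doc, word, max_len):
    """Distinct ordered word sequences (1..max_len words) in doc that contain word."""
    found = set()
    for i in range(len(doc)):
        for n in range(1, max_len + 1):
            if i + n <= len(doc):
                seq = tuple(doc[i:i + n])
                if word in seq:
                    found.add(seq)
    return found


def extract_labels(docs, max_len=3, min_seq_freq=2, accept_threshold=4):
    # S2: select candidate words and score them.
    # Assumption: the score is the number of documents containing the word.
    word_score = Counter()
    for doc in docs:
        word_score.update(set(doc))

    marked = set()
    labels = []
    while True:
        # S3: stop (go to S8) when no unmarked candidate word remains.
        unmarked = [w for w in word_score if w not in marked]
        if not unmarked:
            break
        # S3: take the highest-scoring unmarked word (ties broken by the word itself).
        word = max(unmarked, key=lambda w: (word_score[w], w))
        marked.add(word)

        # S3/S4: expand the word into ordered sequences and count, per document,
        # how often each sequence occurs; keep the high-frequency ones.
        seq_freq = Counter()
        for doc in docs:
            seq_freq.update(sequences_with(doc, word, max_len))
        frequent = {s: f for s, f in seq_freq.items() if f >= min_seq_freq}
        if not frequent:
            continue  # nothing to score, back to S3

        # S5: score the high-frequency sequences and pick the best one.
        # Assumption: score = frequency * length, which favours longer phrases.
        best = max(frequent, key=lambda s: (frequent[s] * len(s), s))

        # S6: accept or reject the candidate sequence as a label.
        # Assumption: accept if the score reaches a threshold and the label is new.
        if frequent[best] * len(best) >= accept_threshold and best not in labels:
            labels.append(best)

    # S7: cluster documents by the labels they contain (returned as index lists).
    return {label: [i for i, doc in enumerate(docs) if contains(doc, label)]
            for label in labels}


# Toy, pre-segmented "search results"; real input would come from a Chinese
# word segmenter applied to result titles or snippets.
docs = [
    ["apple", "phone", "price"],
    ["apple", "phone", "review"],
    ["apple", "phone", "case"],
    ["apple", "pie", "recipe"],
    ["apple", "pie", "baking"],
]
clusters = extract_labels(docs)
print(clusters)
# {('apple', 'phone'): [0, 1, 2], ('apple', 'pie'): [3, 4]}
```

In this toy run the single word "apple" occurs in every document but is passed over in favour of the longer sequences "apple phone" and "apple pie", which illustrates how expanding a candidate word into word sequences can give more specific labels than the word alone. The thresholds and scores would need to be replaced by whatever the full specification defines.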