The invention discloses a Chinese word segmentation method based on a hash algorithm and relates to the field of
natural language processing. The method comprises the following steps: S1, configuring a word segmenter on a search engine and establishing a dictionary structure; S2, monitoring the user's Return key operation and obtaining the first character in the input box; S3, looking up the first character in the dictionary for preliminary searching and screening; S4, forming a tree from all words in the dictionary that share the same first character; S5, placing the second character of each word on the second layer of the tree and creating a hash index table; S6, carrying out a hash search on the remaining characters; S7, after IK reads the new lexicon, notifying the search engine to update; and S8, updating the dictionary information in memory by the
search engine. By creating a dictionary storage mechanism in which a hash search is carried out on the first character, by establishing the dictionary structure and the algorithm in which a hash search is carried out on the remaining characters via the tree structure, and by updating the search engine using IK word segmentation, the invention improves Chinese word segmentation efficiency, reduces system complexity, and reduces the degree of index redundancy.
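
The dictionary structure described in steps S3 to S6 and the in-memory update of step S8 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the class name `HashDictionary`, its methods, and the choice to store the remaining characters of each word as a suffix string in a set are all assumptions made for illustration, and the sketch does not use IK or any search engine API.

```python
class HashDictionary:
    """Illustrative two-layer dictionary (all names are hypothetical).

    Layer 1: hash lookup on the first character of a word.
    Layer 2: hash index table on the second character.
    Remaining characters: stored as a suffix string in a hash set.
    """

    def __init__(self, words=()):
        self._single = set()  # one-character words
        self._index = {}      # first char -> {second char -> set of suffixes}
        for word in words:
            self.add(word)

    def add(self, word):
        """Insert a word into the two-layer structure."""
        if not word:
            return
        if len(word) == 1:
            self._single.add(word)
            return
        first, second, rest = word[0], word[1], word[2:]
        second_layer = self._index.setdefault(first, {})
        second_layer.setdefault(second, set()).add(rest)

    def contains(self, word):
        """Return True if the word is in the dictionary."""
        if not word:
            return False
        if len(word) == 1:
            return word in self._single
        # Preliminary screening on the first character.
        second_layer = self._index.get(word[0])
        if second_layer is None:
            return False
        # Hash index table on the second character.
        suffixes = second_layer.get(word[1])
        if suffixes is None:
            return False
        # Hash search on the remaining characters.
        return word[2:] in suffixes

    def reload(self, words):
        """Replace the in-memory dictionary with a newly read lexicon."""
        fresh = HashDictionary(words)
        self._single, self._index = fresh._single, fresh._index


d = HashDictionary(["中国", "中国人", "中文", "分词"])
print(d.contains("中国人"))  # True
print(d.contains("中人"))    # False
```

In this sketch every lookup costs at most three hash probes regardless of dictionary size, which is one plausible reading of how the first-character and second-character indexes are meant to reduce search work.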