Unsupervised automatic extraction method of microblog new words based on repeated word strings

A technique for automatically extracting new words from repeated word strings, applied in the fields of electrical digital data processing, natural language data processing, and special data processing applications. It is intended to keep extraction fast while maintaining high accuracy.
CN103678656A · Inactive · Publication Date: 2014-03-26 · HEFEI UNIV OF TECH

Patent Information

Authority / Receiving Office: CN · China
Patent Type: Applications (China)
Current Assignee / Owner: HEFEI UNIV OF TECH
Publication Date: 2014-03-26
Estimated Expiration: Not applicable · inactive patent

Abstract

The invention discloses an unsupervised method for automatically extracting new microblog words based on repeated word strings. First, the microblog documents to be processed are segmented: the text is segmented with a dynamic programming word segmentation method, the word strings to be recognized are split out, and the word segmentation fragments within those strings are combined into the new words to be recognized. Candidate new words are then extracted from the word strings to be recognized according to a statistical word selection model, and the candidates are filtered through a rule filtering model to obtain the final new words. The method effectively maintains a high accuracy rate, does not depend heavily on a rule lexicon, and keeps new word extraction fast.
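The last two stages of the abstract (statistical selection of candidates from repeated word strings, then rule filtering) can be sketched roughly as follows. This is a minimal illustration, not the patented procedure: the text above does not specify the statistical model or the rules, so a frequency threshold plus a PMI-style cohesion score and a stop-fragment rule are used as stand-ins. The input is assumed to be already segmented into fragments, and every name and threshold here (`repeated_strings`, `cohesion`, `extract_new_words`, `min_freq`, `min_cohesion`, `stop_fragments`) is hypothetical.

```python
import math
from collections import Counter


def repeated_strings(docs, max_len=3):
    """Count every run of 2..max_len adjacent fragments across all documents."""
    counts = Counter()
    for tokens in docs:
        for n in range(2, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def cohesion(gram, gram_counts, unigram_counts, total):
    """PMI-style score (an assumed stand-in for the statistical model):
    log of the observed frequency of the string over the frequency
    expected if its fragments occurred independently."""
    p_gram = gram_counts[gram] / total
    p_independent = 1.0
    for tok in gram:
        p_independent *= unigram_counts[tok] / total
    return math.log(p_gram / p_independent)


def extract_new_words(docs, lexicon, stop_fragments,
                      min_freq=2, min_cohesion=1.0):
    """Return sorted candidate new words that pass both filters."""
    unigram_counts = Counter(tok for tokens in docs for tok in tokens)
    total = sum(unigram_counts.values())
    gram_counts = repeated_strings(docs)

    results = []
    for gram, freq in gram_counts.items():
        word = "".join(gram)
        # Statistical selection: keep strings that repeat and hold together.
        if freq < min_freq:
            continue
        if cohesion(gram, gram_counts, unigram_counts, total) < min_cohesion:
            continue
        # Rule filtering (assumed rules): drop known words and strings
        # that begin or end with a stop fragment.
        if word in lexicon:
            continue
        if gram[0] in stop_fragments or gram[-1] in stop_fragments:
            continue
        results.append(word)
    return sorted(results)


# Toy input: each document is a list of fragments from a prior segmenter.
docs = [
    ["今天", "给", "力", "的", "天气"],
    ["这", "电影", "真", "给", "力"],
    ["给", "力", "的", "表现"],
]
lexicon = {"今天", "天气", "电影", "表现"}
stop_fragments = {"的", "这", "真"}

print(extract_new_words(docs, lexicon, stop_fragments))  # ['给力']
```

In this toy input, "给力" repeats three times and survives both filters, while the repeated strings "力的" and "给力的" pass the statistical test but are dropped by the stop-fragment rule. This shows why the abstract pairs a statistical model with a rule filter.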

Description

Technical field

[0001] The invention belongs to the technical field of new word retrieval methods and relates to an unsupervised method for automatically extracting new microblog words based on repeated word strings.

Background art

[0002] New word recognition is one of the main problems in Chinese word segmentation, and the growth of Weibo has accelerated the rate at which new words emerge. Unsupervised automatic recognition of new words is crucial for other natural language processing tasks. Automatic segmentation of Chinese text is basic and important work in natural language processing, and the identification and handling of new words is one of the difficulties that limits further improvement in the accuracy of Chinese word segmentation systems. At present, research on new word extraction focuses mainly on entity nouns, especially the names of people, places, and institut...
