Universal special word recognition method and system based on mode expansion

A recognition method and system, applied in the field of special new word extraction, that addresses low matching coverage, matching objects that are difficult to extend, and inefficient new word extraction, and achieves high accuracy.

Active Publication Date: 2020-05-15
NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT +1

AI Technical Summary

Problems solved by technology

[0018] The content of the present invention is to solve the problems of low efficiency in extracting new words from a

Embodiment Construction

[0056] The methods above are inefficient at finding and extracting new words, make the matching modes hard to extend, are prone to false matches, have low matching coverage, and produce inaccurate extraction results. To address these problems, the present invention designs a multi-pattern prefix subtree matching method based on the phonetic-shape codes of Chinese characters, Chinese character syllables, and Chinese character structure. Its advantage is that fuzzy matching extends the coverage of new words, while only a small number of original words are needed as templates to build the dictionary tree, which reduces the tree construction overhead and lets users easily add new search and extraction objects. An original word is the base word from which the variation modes are derived: for example, "Tian*men", "Tian'anmen", and "tiananmen", each of which applies one mode or a mixture of several, are all based on the base word "Tiananmen".
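A minimal sketch of the idea in [0056], under stated assumptions: a dictionary tree is built from base words only, and the text is scanned with a per-character fuzzy test instead of exact equality. The lookup tables `SYLLABLE`, `SHAPE_ALIKE`, and `WILDCARDS`, and the function names, are hypothetical toy stand-ins and are not taken from the patent; the romanized variant ("tiananmen") is not handled here.

```python
from dataclasses import dataclass, field
from typing import Optional

# Toy lookup tables (hypothetical, for illustration only).
SYLLABLE = {"天": "tian", "添": "tian", "安": "an", "按": "an",
            "门": "men", "们": "men"}
SHAPE_ALIKE = {"天": {"夭"}}   # base character -> look-alike characters
WILDCARDS = {"*"}              # special characters that mask any character


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # base char -> Node
    word: Optional[str] = None                    # set where a base word ends


def build_trie(base_words):
    """Build a dictionary tree from the base words only."""
    root = Node()
    for w in base_words:
        node = root
        for ch in w:
            node = node.children.setdefault(ch, Node())
        node.word = w
    return root


def char_matches(base_ch, text_ch):
    """Fuzzy test: identical, masked, same syllable, or look-alike."""
    if text_ch == base_ch or text_ch in WILDCARDS:
        return True
    syl = SYLLABLE.get(base_ch)
    if syl is not None and SYLLABLE.get(text_ch) == syl:
        return True
    return text_ch in SHAPE_ALIKE.get(base_ch, set())


def find_variants(text, root):
    """Return (start, matched_text, base_word) for every fuzzy hit."""
    hits = []
    for start in range(len(text)):
        frontier = [root]
        for end in range(start, len(text)):
            # Follow every child whose base character fuzzily matches.
            frontier = [child
                        for node in frontier
                        for base_ch, child in node.children.items()
                        if char_matches(base_ch, text[end])]
            if not frontier:
                break
            for node in frontier:
                if node.word is not None:
                    hits.append((start, text[start:end + 1], node.word))
    return hits


root = build_trie(["天安门"])
print(find_variants("我去添按门", root))
```

Because the tree stores only base words, adding a new extraction object in this sketch means adding one base word rather than enumerating its variants. One limitation of the toy wildcard rule is that a run consisting only of `*` would also match.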

[0057] The technical key points...

Abstract

The invention provides a universal special word recognition method and system based on mode expansion. New words are extracted by constructing a prefix tree from the phonetic-shape codes, common Chinese character syllables, common Chinese character structures, and special-character mapping nodes of the base words, and by performing fuzzy matching that compares character code similarity. The method can be applied to scenarios such as discovering and extracting specific words from large volumes of text, extracting and generating data sets for certain tasks, and preprocessing a given text data set, for example screening and correcting short-message and microblog data. It provides data sources and basic annotations for subsequent text classification tasks and helps with the discovery and correction of new words in text data.
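The comparison of character code similarity mentioned in the abstract could look roughly like the sketch below. The code layout (initial, final, tone, structure, stroke count), the equal weighting of fields, the threshold of 0.6, and the table `CODES` are all assumptions made for illustration; this excerpt does not specify the actual phonetic-shape code format used by the invention.

```python
# Hypothetical per-character codes: (initial, final, tone, structure, strokes).
# The values are illustrative and are not taken from the patent.
CODES = {
    "天": ("t", "ian", 1, "single", 4),
    "添": ("t", "ian", 1, "left-right", 11),
    "夭": ("y", "ao", 1, "single", 4),
    "安": ("", "an", 1, "top-bottom", 6),
}


def code_similarity(a, b):
    """Fraction of code fields on which two characters agree."""
    if a == b:
        return 1.0
    ca, cb = CODES.get(a), CODES.get(b)
    if ca is None or cb is None:
        return 0.0  # unknown characters are treated as dissimilar
    same = sum(x == y for x, y in zip(ca, cb))
    return same / len(ca)


def is_fuzzy_match(a, b, threshold=0.6):
    """Accept a text character as a variant of a base character."""
    return code_similarity(a, b) >= threshold


print(is_fuzzy_match("天", "添"))  # sound-alike: initial, final, tone agree
print(is_fuzzy_match("天", "夭"))  # shape-alike: structure, strokes agree
print(is_fuzzy_match("天", "安"))  # unrelated characters
```

A single threshold over a combined code lets one test cover both sound-alike and shape-alike substitutions; a real system would likely weight the fields differently.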

Description

Technical field

[0001] The invention relates to the field of special new word extraction, and in particular to finding and extracting special new words by constructing a prefix tree from the phonetic-shape codes, syllables, and structures of Chinese characters and performing fuzzy matching on it.

Background technique

[0002] With the rapid development of network culture and the explosion of information, users have created a large number of words with new meanings and new forms of expression. These new words usually have the following characteristics:

[0003] 1) Sound-alike replacement. One or more characters in the original word are replaced with characters of similar pronunciation to form a new word, so that the pronunciation of the new word conveys the meaning of the original word.

[0004] 2) Shape-alike replacement. One or more characters in the original word are replaced with characters of similar structure to form a new word. Able to express the me...

Application Information

IPC(8): G06F40/186, G06F40/126, G06F40/284, G06F16/33, G06F16/31
CPC: G06F16/316, G06F16/3331
Inventor 段东圣, 任博雅, 孙旷怡, 井雅琪, 时磊, 佟玲玲, 李扬曦, 宋永浩, 卢杰
Owner NAT COMP NETWORK & INFORMATION SECURITY MANAGEMENT CENT