New word automatic searching system and new word automatic searching method based on query log

A technology for automatically discovering new words, applied in special data processing applications, instruments, electronic digital data processing, etc. It addresses problems such as the difficulty of extending rule-based methods and the difficulty of obtaining a corpus.

Inactive Publication Date: 2012-12-19
人民搜索网络股份公司
2 Cites, 24 Cited by

AI Technical Summary

Problems solved by technology

[0006] In view of this, the main purpose of the present invention is to provide a system and method for automatically discovering new words based on the query log, solving problems of existing approaches such as the difficulty of acquiring a corpus for statistical methods and the difficulty of extending rule-based methods. By using the co-occurrence rate of word strings, supplemented by filtering strategies, the system and method require neither a purpose-built corpus nor special rules, and can discover new words from the query log automatically and simply.



Examples


Embodiment Construction

[0030] The method of the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments of the present invention.

[0031] The basic idea of the present invention is as follows. To address problems such as the difficulty of acquiring a corpus for statistical methods and the difficulty of extending rule-based methods, the invention proposes a system and method for automatically discovering new words from the query log, using the co-occurrence rate of word strings supplemented by filtering strategies. Its purpose is to discover new words automatically from the query log without constructing a corpus or special rules. The invention also supports incremental new word discovery: once a new word is found, it can be added immediately to the word segmentation lexicon, which ensures that the same new word is not discovered repeatedly. It is suitable for applications such as word segmentation dictionary expansion and hot word mining.
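The incremental behaviour described in [0031] can be illustrated with a minimal sketch. The text shown here does not say which word segmenter is used, so forward maximum matching is swapped in purely as an assumption; the function name `segment`, the `max_len` parameter, and the sample strings are all hypothetical. The point is only that once a discovered word enters the lexicon, later segmentation emits it as a single token, so its parts no longer form a candidate n-gram.

```python
def segment(text, lexicon, max_len=4):
    """Forward maximum matching (an assumed stand-in segmenter).

    At each position, take the longest substring found in the lexicon;
    fall back to a single character when nothing matches.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


lexicon = {"视频"}
before = segment("给力视频", lexicon)   # "给力" is split into two characters

lexicon.add("给力")                     # the newly discovered word is added
after = segment("给力视频", lexicon)    # "给力" is now one token

print(before)
print(after)
```

Because the second pass yields "给力" as one token, the string cannot be rediscovered from the pair of its characters in a later run, which is the property the paragraph above describes.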

[0032] N...


Abstract

The invention discloses a system and a method for automatically discovering new words based on a query log. The system mainly comprises a query log preprocessing module, a new word discovery module and a new word generation module. The query log preprocessing module periodically acquires query strings, query frequencies and the like from the query log, at the interval set for scheduled new word discovery. The new word discovery module counts the frequency of identical n-gram strings according to the word segmentation results of the query strings, computes the co-occurrence rate of the n-gram strings, and merges parent and child strings of similar frequency in the candidate new word set. The new word generation module applies filtering and pruning strategies to the candidate new word set and removes garbage strings from it, yielding the final new word set. The system and method solve problems such as the difficulty of acquiring a corpus for existing statistical methods and the difficulty of extending rule-based methods. New words can be discovered from the query log automatically and simply by means of the co-occurrence rate of word strings and the supplementary filtering strategies, and the final new word set is imported into the word segmentation lexicon, so that incremental new word discovery is realized.
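The pipeline in the abstract can be sketched roughly as follows. The text shown here does not give the co-occurrence formula, the merge criterion, or the filtering rules, so everything specific in this sketch is an assumption: the co-occurrence rate is taken to be the n-gram frequency divided by the frequency of its most frequent token, a child string is merged into a parent when the parent's frequency is at least `similarity` times the child's, and filtering is reduced to dropping words already in the lexicon. All function names and thresholds are hypothetical.

```python
from collections import Counter


def count_ngrams(segmented_queries, max_n=3):
    """Count n-grams over (tokens, query_frequency) pairs, weighted by frequency."""
    counts = Counter()
    for tokens, freq in segmented_queries:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += freq
    return counts


def cooccurrence_rate(ngram, counts):
    """Assumed definition: freq(ngram) / freq(most frequent token in it)."""
    return counts[ngram] / max(counts[(token,)] for token in ngram)


def contains(parent, child):
    """True if child occurs as a contiguous run inside parent."""
    k = len(child)
    return any(parent[i:i + k] == child for i in range(len(parent) - k + 1))


def merge_parent_child(candidates, similarity=0.9):
    """Drop a child string when a longer parent has a similar frequency (assumed rule)."""
    kept = dict(candidates)
    for child, child_freq in candidates.items():
        for parent, parent_freq in candidates.items():
            if (len(parent) > len(child) and contains(parent, child)
                    and parent_freq >= similarity * child_freq):
                kept.pop(child, None)
                break
    return kept


def discover_new_words(segmented_queries, lexicon, min_freq=10, min_rate=0.8):
    counts = count_ngrams(segmented_queries)
    candidates = {
        gram: freq for gram, freq in counts.items()
        if len(gram) >= 2 and freq >= min_freq
        and cooccurrence_rate(gram, counts) >= min_rate
    }
    candidates = merge_parent_child(candidates)
    # Simplified filtering step: keep only strings not already in the lexicon.
    return {"".join(gram) for gram in candidates} - set(lexicon)


# Hypothetical preprocessed log: (segmented query, query frequency).
queries = [
    (["给", "力"], 50),
    (["给", "力", "视频"], 30),
    (["视频", "下载"], 40),
    (["给", "我"], 5),
]
print(discover_new_words(queries, lexicon=set()))
```

With these sample counts only the pair "给" + "力" passes both thresholds (frequency 80, rate 80/85), so the sketch returns the single candidate "给力". A real system would need richer pruning rules than the lexicon check used here.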

Description

Technical field

[0001] The invention relates to the field of Internet information processing, in particular to a system and method for automatically discovering new words based on query logs.

Background

[0002] With the rapid development of the Internet, network information is published and disseminated faster and faster, and new words keep emerging online. According to statistics compiled by experts from the Chinese Language and Characters Work Committee, more than 800 new words on average have appeared each year in the 20 years since the reform and opening up. In recent years, the development of the Internet has pushed the rate at which new words appear far beyond this figure. The emergence of new words greatly reduces the ability to segment, understand, and retrieve information when processing Internet information. Therefore, how to effectively discover new words is an important task in the field of Internet informa...


Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventors: 张爱琦, 崔世起, 杨青
Owner: 人民搜索网络股份公司