Fast fuzzy pinyin inquiry method of mass Chinese file names

A file name query method, applied in special-purpose data processing applications, instruments, electrical digital data processing, etc. It addresses slow query speed, overly simple file name indexes, and poor performance and accuracy, achieving fast queries and improved query performance.

Active Publication Date: 2011-11-09
ZHEJIANG UNIV
Cites: 5; Cited by: 12

AI Technical Summary

Problems solved by technology

For cloud computing storage centers or enterprise-level file storage servers, queries are slower still.
[0007] 2. The file name index is too simple
However, these index libraries simply store all file names without any preprocessing.
[0008] 3. Poor support for Chinese fuzzy pinyin queries
None of the well-known existing file query tools supports Chinese fuzzy pinyin queries. Although some information retrieval systems offer fuzzy pinyin matching, they rely on approximate string matching based on distance vectors.
For fuzzy pinyin matching, approximate string matching is inferior to factor-based multi-pattern string matching in both performance and accuracy.

Examples

Embodiment Construction

[0030] The present invention will be further described below in conjunction with the accompanying drawings and specific embodiments.

[0031] 1. First, develop a file name query system. Its core consists of a file name database index-building module, a Chinese-character fuzzy pinyin processing module, and a fast string matching module. A correspondence table from pinyin to words is built from the Chinese character library (thesaurus); this build is done at development time to reduce deployment and runtime overhead. In the correspondence table, each relationship's weight is determined by word frequency.
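
The pinyin-to-word correspondence table can be pictured as a dictionary keyed by toneless pinyin syllables. The following is a minimal sketch, not the patent's implementation: the lexicon entries and their frequencies are hypothetical sample data, the names `LEXICON`, `build_pinyin_table`, and `PINYIN_TABLE` are illustrative, and turning raw frequencies into per-key relative weights is only one assumed way of letting "word frequency determine the weight".

```python
from collections import defaultdict

# Hypothetical sample lexicon: (word, toneless pinyin syllables, corpus frequency).
LEXICON = [
    ("文件", ("wen", "jian"), 900),
    ("稳健", ("wen", "jian"), 100),
    ("报告", ("bao", "gao"), 500),
]


def build_pinyin_table(lexicon):
    """Map each pinyin key to its words, weighted by relative word frequency (assumed scheme)."""
    grouped = defaultdict(list)
    for word, syllables, freq in lexicon:
        grouped[tuple(syllables)].append((word, freq))
    table = {}
    for key, entries in grouped.items():
        total = sum(freq for _, freq in entries)
        # Most frequent words first, so a lookup can be truncated cheaply.
        table[key] = sorted(((w, f / total) for w, f in entries),
                            key=lambda entry: entry[1], reverse=True)
    return table


# Built once ahead of time (e.g. at development time) and shipped with the system.
PINYIN_TABLE = build_pinyin_table(LEXICON)
```

Here `PINYIN_TABLE[("wen", "jian")]` is `[("文件", 0.9), ("稳健", 0.1)]`. Building the table offline, as the paragraph describes, means the deployed system only loads a ready-made mapping instead of processing the thesaurus at install or query time.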

[0032] 2. When the system is installed on a client computer, the user is asked to enter their own fuzzy pinyin rules. The file system is scanned during installation to build a file name database; this scan can also be performed on the first boot after installation. When scanning the file...
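
To make the role of user-entered fuzzy pinyin rules concrete, here is a minimal sketch of how such rules could expand a pinyin query into its fuzzy variants (the conversion-and-extension step the abstract lists as step 1). Everything here is assumed for illustration: the rule lists `INITIAL_RULES` and `FINAL_RULES`, the hypothetical helpers `segment`, `split_initial`, `swap`, and `expand_query`, the dynamic-programming syllable segmentation, and the tiny syllable inventory used in the usage note; a real system would need a complete pinyin syllable list.

```python
from itertools import product

# Hypothetical rules as a user might enter them at install time;
# each pair is treated as interchangeable in both directions.
INITIAL_RULES = [("z", "zh"), ("c", "ch"), ("s", "sh"), ("n", "l")]
FINAL_RULES = [("an", "ang"), ("en", "eng"), ("in", "ing")]


def segment(query, syllables):
    """Split an all-letter query into pinyin syllables, or return None."""
    if not (query.isascii() and query.isalpha()):
        return None                          # e.g. Chinese characters: not pinyin
    query = query.lower()
    seg = [None] * (len(query) + 1)          # seg[i]: a segmentation of query[:i]
    seg[0] = []
    for i in range(1, len(query) + 1):
        for j in range(max(0, i - 6), i):    # pinyin syllables have at most 6 letters
            if seg[j] is not None and query[j:i] in syllables:
                seg[i] = seg[j] + [query[j:i]]
                break
    return seg[-1]


def split_initial(syllable):
    """Separate the initial consonant (possibly empty) from the final."""
    if syllable[:2] in ("zh", "ch", "sh"):
        return syllable[:2], syllable[2:]
    if syllable[0] in "bpmfdtnlgkhjqxrzcsyw":
        return syllable[0], syllable[1:]
    return "", syllable


def swap(part, rules, as_suffix):
    """Return part plus every form reachable through one rule, in either direction."""
    out = {part}
    for a, b in rules:
        for src, dst in ((a, b), (b, a)):
            if as_suffix and part.endswith(src):
                out.add(part[: len(part) - len(src)] + dst)   # e.g. ian -> iang
            elif not as_suffix and part == src:
                out.add(dst)
    return out


def expand_query(query, syllables):
    """None if the query is not pinyin, else every fuzzy syllable tuple."""
    sylls = segment(query, syllables)
    if sylls is None:
        return None
    options = []
    for s in sylls:
        initial, final = split_initial(s)
        options.append(sorted({i + f for i in swap(initial, INITIAL_RULES, False)
                               for f in swap(final, FINAL_RULES, True)}))
    return list(product(*options))
```

With `SYLLABLES = {"zhang", "zang", "san", "shan", "wen", "jian"}`, `expand_query("zhangsan", SYLLABLES)` returns 16 syllable tuples, from `("zan", "san")` to `("zhang", "shang")`, while a Chinese-character query such as `"报告"` returns `None` and would be searched unchanged. Variants that correspond to no word simply find nothing when looked up in the pinyin-to-word table.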

Abstract

The invention discloses a fast fuzzy pinyin query method for massive numbers of Chinese file names. The method comprises the following steps: 1) determine whether the query string is Chinese pinyin; if so, convert and extend it according to the fuzzy pinyin rules to form a new query string, otherwise keep it unchanged; 2) apply the SetBackwardOracleMatching (SBOM) algorithm to the query string to build an oracle finite automaton that recognizes the pattern strings; 3) traverse the file name database and pre-filter the stored file names; and 4) match the file names that passed the pre-filter in step 3) using the SBOM algorithm, then rank all results that satisfy the conditions and return them. The method's advantages include high query speed over massive numbers of files, fast Chinese queries, and accurate fuzzy pinyin queries.
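
Steps 2) to 4) can be sketched as follows. The factor-oracle construction (a trie of the patterns plus extra transitions added along supply links in breadth-first order) and the right-to-left window scan follow the textbook description of Set Backward Oracle Matching; this is an illustration, not the patent's code. The pre-filter (discard names shorter than the shortest pattern) and the ranking key (earliest match, then shorter name) are hypothetical placeholders because the abstract specifies neither, and the names `build_factor_oracle`, `SetBackwardOracleMatcher`, and `query_file_names` are illustrative. The patterns are assumed to be the Chinese strings obtained after pinyin conversion.

```python
from collections import deque


def build_factor_oracle(words):
    """Factor oracle of a set of words: trie + external transitions via supply links."""
    delta = [{}]        # delta[q][ch] -> target state
    terminal = [[]]     # terminal[q] -> indices of words whose trie path ends at q
    parent, label = [-1], [None]
    for idx, word in enumerate(words):            # 1) plain trie
        q = 0
        for ch in word:
            if ch not in delta[q]:
                delta[q][ch] = len(delta)
                delta.append({})
                terminal.append([])
                parent.append(q)
                label.append(ch)
            q = delta[q][ch]
        terminal[q].append(idx)
    order, queue = [], deque([0])                 # 2) BFS order over trie edges only
    while queue:
        q = queue.popleft()
        order.append(q)
        queue.extend(delta[q].values())
    supply = [-1] * len(delta)                    # 3) supply links + external edges
    for q in order[1:]:
        sigma, down = label[q], supply[parent[q]]
        while down != -1 and sigma not in delta[down]:
            delta[down][sigma] = q
            down = supply[down]
        supply[q] = delta[down][sigma] if down != -1 else 0
    return delta, terminal


class SetBackwardOracleMatcher:
    """SBOM over a fixed pattern set; the oracle is built once and reused."""

    def __init__(self, patterns):
        if not patterns or any(not p for p in patterns):
            raise ValueError("need at least one non-empty pattern")
        self.patterns = list(patterns)
        self.lmin = min(len(p) for p in self.patterns)
        # Oracle over the reversed length-lmin prefixes of all patterns.
        self.delta, self.terminal = build_factor_oracle(
            [p[: self.lmin][::-1] for p in self.patterns])

    def search(self, text):
        """Return (position, pattern) for every occurrence, in text order."""
        hits, pos, lmin = [], 0, self.lmin
        while pos <= len(text) - lmin:
            state, k = 0, lmin - 1
            while k >= 0:                         # read the window right to left
                state = self.delta[state].get(text[pos + k])
                if state is None:
                    break
                k -= 1
            if state is None:
                # text[pos+k:pos+lmin] is not a factor of any prefix: jump past pos+k.
                pos += k + 1
                continue
            for i in self.terminal[state]:        # whole window read: verify candidates
                if text.startswith(self.patterns[i], pos):
                    hits.append((pos, self.patterns[i]))
            pos += 1
        return hits


def query_file_names(patterns, file_names):
    """Hypothetical steps 3-4: pre-filter, SBOM-match, then rank the matching names."""
    matcher = SetBackwardOracleMatcher(patterns)
    ranked = []
    for name in file_names:
        if len(name) < matcher.lmin:              # hypothetical pre-filter: too short
            continue
        hits = matcher.search(name)
        if hits:                                  # hypothetical rank: earliest hit, shorter name
            ranked.append((hits[0][0], len(name), name))
    ranked.sort()
    return [name for _, _, name in ranked]
```

For example, `query_file_names(["张三", "章三"], ["张三的简历.doc", "会议记录.txt", "关于章三.pdf", "张"])` returns `["张三的简历.doc", "关于章三.pdf"]`. Building the oracle once per query and reusing it across the whole file name database is what lets the scan skip characters: whenever the text read so far in a window is not a factor of any pattern prefix, the window jumps past it instead of advancing by one position.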

Description

Technical field

[0001] The invention relates to the field of computer system application technology, and specifically to a method for quickly querying all files on a computer.

Background art

[0002] As the capacity of storage systems keeps increasing, computer systems hold more and more files. Nowadays, an ordinary personal laptop has more than 100 GB of storage holding more than one million documents. How to quickly find the files that meet a query's requirements among such massive numbers of documents has become an increasingly important issue.

[0003] Queries over massive amounts of information usually rely on full-text retrieval. However, full-text retrieval is not suitable for file names, and even less suitable for Chinese file names. A prerequisite for full-text retrieval is a suitable word segmentation system, after which the original documents are indexed. However, due to historical habits / proce...

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/30
Inventors: 袁新宇 (Yuan Xinyu), 李莹 (Li Ying)
Owner: ZHEJIANG UNIV