Method for processing variant-character conversion of East Asian ideographs containing Unicode four-byte codes in a search engine

A search engine and ideographic technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It solves the problem that relevant information written in other character variants cannot be retrieved, and achieves the effect of simplified-to-traditional conversion.

Status: Inactive; publication date: 2006-06-14
Owner: 王绯
Cites: 0; cited by: 7

AI Technical Summary

Problems solved by technology

In current search engines, a query written only in simplified characters, or only in traditional characters, cannot retrieve relevant information from other East Asian countries or from ancient documents.



Examples


Detailed description of the embodiments

[0018] The main purpose of the present invention is to provide a method for processing variant-character conversion of East Asian ideographs, including those with four-byte Unicode codes, in a search engine. Based on a list of variant Chinese characters, the method adopts layered matching and enables matching search among the various East Asian Chinese character forms, between currently common characters and ancient Chinese characters, and between different versions of ancient Chinese characters.
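The source does not define "four-byte code" further. A plausible reading, assumed here, is ideographs outside the Basic Multilingual Plane, such as CJK Unified Ideographs Extension B starting at U+20000, which need four bytes in both UTF-8 and UTF-16 (a surrogate pair in UTF-16). A variant matcher must treat each such character as one unit rather than splitting it. The Python sketch below, whose helper names are hypothetical, contrasts code points with encoded bytes.

```python
# Hypothetical helpers illustrating how four-byte ideographs differ from
# ordinary BMP ideographs. Not taken from the patent.

def code_points(text: str) -> list[str]:
    """Return one element per Unicode code point.

    Python str indexes by code point, so a supplementary-plane ideograph
    such as U+20000 stays a single element even though UTF-16 stores it
    as a surrogate pair.
    """
    return list(text)


def encoded_sizes(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


def is_supplementary(ch: str) -> bool:
    """True for code points above U+FFFF (four bytes in UTF-8 and UTF-16)."""
    return ord(ch) > 0xFFFF


if __name__ == "__main__":
    text = "\u4e9a\U00020000"  # 亚 (BMP) followed by U+20000 (Extension B)
    for ch in code_points(text):
        print(f"U+{ord(ch):04X}", encoded_sizes(ch), is_supplementary(ch))
```

Python strings index by code point, so iterating with `list(text)` is safe. In Java or JavaScript, where string indexes count UTF-16 units, the equivalent loop would need explicit code-point iteration (for example `String.codePoints()` in Java).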

[0019] The specific implementation is as follows:

[0020] A. Divide the list of variant characters into two sub-tables, one for common characters and one for ancient characters, and store them separately. For example, "Wei" (Simplified Chinese), "Wei" (Taiwan Traditional), 亚 (Simplified Chinese), 亜 (Japanese), 亞 (Taiwan Traditional), etc. belong to the table of characters commonly used in the various regions of East Asia; the characters used in larg...
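The visible text does not say how the two sub-tables are consulted during "layered matching". The sketch below assumes one reading: each variant group in the common sub-table and the ancient sub-table is indexed by character, and a lookup consults the common layer first, then the ancient layer, repeating for every newly found form until no new variants appear. The table entries (亚/亜/亞, 为/為, 為/爲) are illustrative data, and all function names are hypothetical.

```python
from collections import deque

# Illustrative sub-tables (hypothetical data): each inner list is one group
# of interchangeable variant forms.
COMMON_TABLE = [
    ["亚", "亜", "亞"],  # Simplified, Japanese, Traditional
    ["为", "為"],        # Simplified, Taiwan Traditional
]
ANCIENT_TABLE = [
    ["為", "爲"],        # links the modern traditional form to the older form
]


def build_index(groups: list[list[str]]) -> dict[str, set[str]]:
    """Map every character in a sub-table to the full set of its group."""
    index: dict[str, set[str]] = {}
    for group in groups:
        members = set(group)
        for ch in group:
            index.setdefault(ch, set()).update(members)
    return index


def layered_variants(ch: str, layers: list[dict[str, set[str]]]) -> set[str]:
    """Collect all variants of ch reachable through the layers in order.

    Each newly found form is looked up again in every layer, so a modern
    character can reach an ancient form through an intermediate variant.
    """
    found = {ch}
    queue = deque([ch])
    while queue:
        current = queue.popleft()
        for layer in layers:  # common layer first, then ancient layer
            for variant in layer.get(current, ()):
                if variant not in found:
                    found.add(variant)
                    queue.append(variant)
    return found


LAYERS = [build_index(COMMON_TABLE), build_index(ANCIENT_TABLE)]

if __name__ == "__main__":
    print(sorted(layered_variants("为", LAYERS)))
```

Keeping the sub-tables separate, as step A describes, lets each table be maintained or versioned independently; the closure loop is what connects a simplified character such as 为 to the older form 爲 through 為.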



Abstract

The invention discloses a method for processing variant-character conversion of East Asian ideographs, including those with four-byte Unicode codes, in search engines. Based on a table of variant forms of Chinese characters, the method applies layered matching to support variant-aware search among the various East Asian Chinese character forms, between currently common characters and ancient writings, and between different versions of ancient writings. When searching, entering any one variant form of a character also retrieves information containing the other variant forms. The invention thus lets search engines find the information users need more accurately, without users having to handle conversion between variant forms themselves.
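The retrieval behaviour described in the abstract, where entering any one variant form also finds text written with the others, can be sketched as query expansion: each query character is replaced by a character class containing all of its variants. This is a simplified illustration under stated assumptions: the variant table is a hypothetical in-memory dictionary, and documents are scanned linearly with a regular expression.

```python
import re

# Hypothetical variant groups; a real system would load the full
# variant-character table (both sub-tables) instead.
VARIANT_GROUPS = [
    {"亚", "亜", "亞"},
    {"为", "為", "爲"},
]
VARIANTS = {ch: group for group in VARIANT_GROUPS for ch in group}


def expand_query(query: str) -> re.Pattern:
    """Build a regex in which each query character matches any of its variants."""
    parts = []
    for ch in query:  # iterates by code point, so four-byte ideographs stay whole
        forms = sorted(VARIANTS.get(ch, {ch}))
        parts.append("[" + "".join(re.escape(f) for f in forms) + "]")
    return re.compile("".join(parts))


def search(query: str, documents: list[str]) -> list[int]:
    """Return indexes of documents containing the query in any variant form."""
    pattern = expand_query(query)
    return [i for i, doc in enumerate(documents) if pattern.search(doc)]


if __name__ == "__main__":
    docs = ["亞洲新聞", "亜細亜大学", "欧洲新闻"]
    print(search("亚洲", docs))
```

A production search engine would more likely apply the same expansion against an inverted index (for example, OR-ing the posting lists of each variant) rather than scanning documents with a regex; the expansion step itself is the same.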

Description

Technical field

[0001] The invention relates to a method for processing variant-character conversion of East Asian ideographs, including those with four-byte Unicode codes, in a search engine.

Background art

[0002] Search engines help users find useful information within massive amounts of data. As informatization advances, humanity has accumulated ever more information, especially on the Internet, where the volume grows exponentially every year, and search engines play a key role in locating what users need there. Because of five thousand years of accumulated Chinese culture and the particular characteristics of the Chinese language, foreign English-language search engines do not handle Chinese very well. Chinese search engines dedicated to the Chinese language, such as Baidu, have therefore emerged. Baidu search engine uses a unique Chinese languag...


Application Information

Patent type & authority: Application (China)
IPC (8th edition): G06F17/30
Inventors: 冯建康, 王宏源, 赵锋
Owner: 王绯