Chinese abbreviation processing method and device therefor

A Chinese abbreviation processing method and technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., which can address problems such as corpora not drawn from a real environment, manual intervention, and small scale.

Status: Inactive. Publication date: 2012-02-29
TSINGHUA UNIV +1

AI Technical Summary

Problems solved by technology

[0005] After analyzing the prior art, the inventors found that it has at least the following disadvantages: most of the corpora used to identify Chinese abbreviations do not come from a real usage environment, are small in scale, and have poor timeliness; some approaches require manual intervention; and the accuracy of the experimental results is low.

Examples

Embodiment 1

[0033] This embodiment of the present invention provides a Chinese abbreviation processing method, as shown in Figure 1, including:

[0034] 110: Perform preprocessing on all query words in the user query log.

[0035] Noisy query words are removed from the user query log. Noisy query words here mainly means query words containing foreign characters or garbled characters. The preprocessing also filters out numbers, full-width letters, punctuation marks, spaces, and similar characters within the query words.
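The preprocessing in step 110 could be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the function names are hypothetical, "Chinese character" is assumed to mean the CJK Unified Ideographs range U+4E00 to U+9FFF, and the order of operations (filter characters first, then drop any query that still contains a non-Chinese character) is one possible reading of the paragraph above.

```python
import unicodedata


def is_chinese(ch):
    # Assumption: "Chinese" = CJK Unified Ideographs block (U+4E00..U+9FFF).
    return "\u4e00" <= ch <= "\u9fff"


def is_filterable(ch):
    # Characters that are stripped from inside a query word:
    # whitespace, digits (ASCII or full-width), full-width Latin letters,
    # and anything in a Unicode punctuation category.
    if ch.isspace() or ch.isdigit():
        return True
    if "\uff21" <= ch <= "\uff3a" or "\uff41" <= ch <= "\uff5a":
        return True
    return unicodedata.category(ch).startswith("P")


def preprocess(queries):
    """Strip filterable characters, then drop noisy query words."""
    cleaned = []
    for query in queries:
        stripped = "".join(ch for ch in query if not is_filterable(ch))
        # A query that still holds a non-Chinese character (foreign letters,
        # garbled bytes such as U+FFFD) is treated as noise and discarded.
        if stripped and all(is_chinese(ch) for ch in stripped):
            cleaned.append(stripped)
    return cleaned
```

For example, `preprocess(["北京大学 2011", "hello世界"])` keeps the first query as `北京大学` and discards the second because it contains foreign letters.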

[0036] 120: Gather the query words pointing to the same directory of the same website in the preprocessed query log into one group to obtain multiple groups; for the query words in each group, execute steps 130, 140, and 150.

[0037] After the preprocessing in the previous step, most of the retained query terms are normal Chinese query terms. In the query log, a record generally includes the following content: a query word, a URL (Uniform Resource Locator, Uniform Reso...
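The grouping in step 120 might look like the sketch below. It assumes each log record has already been reduced to a `(query, url)` pair with a full URL including the scheme, and that "same directory of the same website" means the same host plus the same parent directory of the URL path. The function names and the sample URLs are illustrative only and do not come from the patent.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def directory_key(url):
    # Assumption: the URL carries a scheme ("http://..."), otherwise
    # urlsplit would not recognise the host part.
    parts = urlsplit(url)
    directory = posixpath.dirname(parts.path) or "/"
    return parts.netloc.lower(), directory


def group_queries(records):
    """Group query words by (host, directory) of the URL they point to."""
    groups = defaultdict(set)
    for query, url in records:
        groups[directory_key(url)].add(query)
    return dict(groups)


# Illustrative records (made-up URLs).
records = [
    ("北京大学", "http://www.pku.edu.cn/about/index.html"),
    ("北大", "http://www.pku.edu.cn/about/history.html"),
    ("清华大学", "http://www.tsinghua.edu.cn/"),
]
groups = group_queries(records)
```

Here the two queries that land in the `/about` directory of the same host end up in one group, which is the unit on which steps 130, 140, and 150 would then operate.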

Embodiment 2

[0085] This embodiment of the present invention provides a Chinese abbreviation processing device, as shown in Figure 4, including:

[0086] The preprocessing module 401 is configured to preprocess all query words in the user query log.

[0087] Noisy query words are removed from the user query log. Noisy query words here mainly means query words containing foreign characters or garbled characters. The preprocessing also filters out numbers, full-width letters, punctuation marks, spaces, and similar characters within the query words.

[0088] The related word aggregation module 402 is configured to group the query words pointing to the same directory of the same website in the preprocessed query log into one group to obtain multiple groups.

[0089] After preprocessing by the preprocessing module 401, most of the retained query terms are normal Chinese query terms. In the query log, a record generally includes the following content: a query word, a URL (Uniform Resource Locato...

Abstract

The invention discloses a Chinese abbreviation processing method and a device therefor, belonging to the field of text information processing. The method comprises: preprocessing all query terms in a user query log; aggregating the query terms that point to the same directory of the same website in the preprocessed query log into one group, to obtain a plurality of groups; and, for the query terms in each group: generating, according to a word alignment rule, a plurality of candidate pairs of source phrase and abbreviation within the group; filtering out the place name from the source phrase if the source phrase contains a place name and the abbreviation has no morpheme corresponding to that place name; and screening the filtered results in the group according to a preset rule to obtain the set of source phrase and abbreviation pairs in the group. The device comprises a preprocessing module, a candidate pair generating module, a filtering module, and a screening module. The invention uses a user query log to mine Chinese abbreviations, improving the timeliness and accuracy of the resulting source phrase and abbreviation pairs.
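The candidate generation and place-name filtering steps named in the abstract can be illustrated with a small sketch. The text shown here does not spell out the word alignment rule, so the code substitutes a simple stand-in: an abbreviation must be shorter than the source phrase and its characters must appear in the source phrase in the same order. The gazetteer of place names and all function names are likewise hypothetical, and the final screening by a preset rule is not shown because that rule is not given here.

```python
from itertools import permutations


def is_subsequence(short, long):
    # True if the characters of `short` occur in `long` in the same order.
    remaining = iter(long)
    return all(ch in remaining for ch in short)


def candidate_pairs(group):
    """Stand-in alignment rule: (source phrase, abbreviation) candidates."""
    return sorted(
        (src, abbr)
        for src, abbr in permutations(group, 2)
        if len(abbr) < len(src) and is_subsequence(abbr, src)
    )


def strip_place_names(pair, place_names):
    """Drop a place name from the source phrase when the abbreviation
    has no character corresponding to that place name."""
    src, abbr = pair
    for place in place_names:
        if place in src and not any(ch in abbr for ch in place):
            src = src.replace(place, "")
    return src, abbr


# Hypothetical gazetteer, for illustration only.
PLACES = {"北京", "上海"}
```

With this sketch, `candidate_pairs({"北京大学", "北大", "清华"})` yields only the pair `("北京大学", "北大")`, and `strip_place_names(("上海交通大学", "交大"), PLACES)` reduces the source phrase to `交通大学` because the abbreviation carries no character from `上海`.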

Description

Technical field

[0001] The invention relates to the field of text information processing, and in particular to a method and device for processing Chinese abbreviations.

Background technique

[0002] Abbreviations are words formed by condensing, omitting, or summarizing fixed expressions in a language. The principle of economy in natural language has led to the emergence of abbreviations. Abbreviating a word makes an expression more concise, for example using 北大 as the short form of "Peking University" (北京大学). Abbreviations are very common in natural language and account for a large proportion of new words.

[0003] Because abbreviations are used so widely, they have become the main source of unregistered (out-of-vocabulary) new words in natural language processing, which seriously hinders machine processing of Chinese information in word segmentation, part-of-speech tagging, word sense determination and disambiguation, named entity recognition, and entity coreference resolution ...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/27, G06F17/30
Inventors: 谢丽星, 孙茂松, 佟子健, 王灿辉
Owner: TSINGHUA UNIV