Chinese abbreviation processing method and device therefor
A Chinese abbreviation processing method, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc., that addresses problems of prior approaches such as corpora not drawn from real usage environments, small scale, and reliance on manual intervention.
Status: Inactive
Publication Date: 2012-02-29
TSINGHUA UNIV +1
AI Technical Summary
Problems solved by technology
[0005] After analyzing the prior art, the inventor found that it has at least the following disadvantages: most of the corpora used in the prior art to identify Chinese abbreviations are not drawn from real usage environments, are small in scale, and are poorly up to date; some approaches require manual intervention; and the accuracy of the experimental results is low.
Embodiment 1
[0033] This embodiment of the present invention provides a Chinese abbreviation processing method. As shown in Figure 1, the method includes:
[0034] 110: Perform preprocessing on all query words in the user query log.
[0035] Noisy query words are removed from the user query log. Here, noisy query words are mainly query words that contain foreign characters or garbled characters. The preprocessing also filters digits, full-width letters, punctuation marks, spaces, and similar characters out of the query words.
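The preprocessing in step 110 could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `preprocess_query`, the use of the CJK Unified Ideographs range U+4E00 to U+9FFF as the test for "normal Chinese", and the choice to treat any remaining non-Chinese character as noise are all hypothetical and are not taken from the patent text.

```python
import unicodedata


def _is_filtered_char(ch):
    """True for characters stripped from inside a query:
    whitespace, punctuation, digits, and full-width Latin letters."""
    category = unicodedata.category(ch)
    if ch.isspace() or category.startswith("P") or category.startswith("N"):
        return True
    # Full-width Latin letters: U+FF21..U+FF3A and U+FF41..U+FF5A.
    return "\uff21" <= ch <= "\uff3a" or "\uff41" <= ch <= "\uff5a"


def _is_chinese(ch):
    # Assumption: "Chinese character" means the basic CJK Unified Ideographs block.
    return "\u4e00" <= ch <= "\u9fff"


def preprocess_query(query):
    """Return the cleaned query, or None if the query is considered noisy."""
    cleaned = "".join(ch for ch in query if not _is_filtered_char(ch))
    # Assumption: anything left that is not Chinese (foreign letters,
    # garbled characters such as U+FFFD) marks the whole query as noise.
    if not cleaned or not all(_is_chinese(ch) for ch in cleaned):
        return None
    return cleaned
```

Under these assumptions, a query made only of digits is dropped entirely, while a query such as `"北京 大学123！"` keeps only its Chinese characters.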
[0036] 120: Group together the query words in the preprocessed query log that point to the same directory of the same website, thereby obtaining multiple groups; for the query words in each group, execute steps 130, 140, and 150.
[0037] After the preprocessing in the previous step, most of the retained query terms are normal Chinese query terms. In the query log, a record generally includes the following content: a query word, a URL (Uniform Resource Locator, Uniform Reso...
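As a rough illustration of step 120, the grouping by website and directory might look like the sketch below. It assumes each log record has already been reduced to a `(query, url)` pair, and that "same directory of the same website" can be approximated by the URL's host plus the directory part of its path; the names `directory_key` and `group_queries` are hypothetical.

```python
import posixpath
from collections import defaultdict
from urllib.parse import urlsplit


def directory_key(url):
    """Return (host, directory) for a URL, e.g.
    'http://example.com/a/b.html' -> ('example.com', '/a')."""
    parts = urlsplit(url)
    directory = posixpath.dirname(parts.path) or "/"
    return (parts.netloc.lower(), directory)


def group_queries(records):
    """Group query words whose clicked URLs share host and directory.

    records: iterable of (query, url) pairs (an assumed record layout).
    Returns a dict mapping (host, directory) to a set of query words.
    """
    groups = defaultdict(set)
    for query, url in records:
        groups[directory_key(url)].add(query)
    return dict(groups)
```

Each resulting group would then be passed on to steps 130, 140, and 150 independently.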
Embodiment 2
[0085] This embodiment of the present invention provides a Chinese abbreviation processing device. As shown in Figure 4, the device includes:
[0086] The preprocessing module 401 is configured to preprocess all query words in the user query log.
[0087] Noisy query words are removed from the user query log. Here, noisy query words are mainly query words that contain foreign characters or garbled characters. The preprocessing also filters digits, full-width letters, punctuation marks, spaces, and similar characters out of the query words.
[0088] The related word aggregation module 402 is configured to group together the query words in the preprocessed query log that point to the same directory of the same website, thereby obtaining multiple groups.
[0089] After preprocessing by the preprocessing module 401, most of the retained query terms are normal Chinese query terms. In the query log, a record generally includes the following content: a query word, a URL (Uniform Resource Locato...
Abstract
The invention discloses a Chinese abbreviation processing method and a device therefor, belonging to the field of text information processing. The method comprises: preprocessing all the query terms in a user query log; aggregating the query terms that point to the same directory of the same website in the preprocessed query log into one group, so as to obtain a plurality of groups; and, for the query terms in each group: generating a plurality of candidate pairs of source phrase and abbreviation in the group according to a word alignment rule; filtering out the place names in a source phrase if the source phrase contains place names and the abbreviation has no morpheme corresponding to the place name; and screening the filtered results in the group according to a preset rule to obtain a collection of source phrase and abbreviation pairs for the group. The device comprises a preprocessing module, a candidate pair generating module, a filtering module and a screening module. The invention uses a user query log to mine Chinese abbreviations, improving the timeliness and accuracy of the source phrase and abbreviation pairs.
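The place-name filtering step mentioned in the abstract could be sketched as follows. This is a hedged illustration only: it assumes that a "morpheme" can be approximated by a single Chinese character, that place names come from a small lookup list (`PLACE_NAMES` is a hypothetical two-entry gazetteer), and that "filtering out" means deleting the place name from the source phrase. The patent's actual rule may differ.

```python
# Hypothetical gazetteer; a real system would use a much larger list.
PLACE_NAMES = ("北京", "上海")


def filter_place_names(source, abbreviation, place_names=PLACE_NAMES):
    """Remove place names from the source phrase when the abbreviation
    contains no character (approximating a morpheme) of that place name."""
    for place in place_names:
        in_source = place in source
        reflected = any(ch in abbreviation for ch in place)
        if in_source and not reflected:
            source = source.replace(place, "")
    return source, abbreviation
```

For instance, with the candidate pair `("北京清华大学", "清华")` the place name is dropped from the source phrase, whereas for `("北京大学", "北大")` it is kept because the abbreviation retains a character of the place name.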
Description
Technical field
[0001] The invention relates to the field of text information processing, and in particular to a method and device for processing Chinese abbreviations.
Background
[0002] Abbreviations are words formed by condensing, omitting, or summarizing fixed expressions in a language. The economy principle of natural language has given rise to abbreviations. Abbreviating words makes expression more concise, as when a short form is used to stand for "Peking University". Abbreviations are very common in natural language and account for a large proportion of new words.
[0003] Because abbreviations are used so extensively, they have become the main source of unregistered new words in natural language processing. This creates serious obstacles for machine processing of Chinese information in word segmentation, part-of-speech tagging, word sense determination and disambiguation, named entity recognition, and entity coreference resolution ...