Server and Chinese character segmentation method and device

A Chinese word segmentation technology applied in the field of search engines. It addresses the weak new-word recognition, high error rate, and slow error correction of existing methods, and achieves high recognition of new words, simple operation, and improved segmentation accuracy.

Active Publication Date: 2015-03-25
TENCENT TECH (SHENZHEN) CO LTD

AI Technical Summary

Problems solved by technology

[0005] Statistics-based machine learning methods consume a great deal of manpower and time and depend heavily on the results of manual word segmentation. When the manual segmentation results contain errors, they cannot be corrected quickly. These methods are also weak at recognizing new words, and their error rate is high when segmenting text from specialized domains.



Embodiment Construction

[0042] The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by persons of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the protection scope of the present invention.

[0043] Figure 1 is a flowchart of a Chinese word segmentation method provided by an embodiment of the present invention. The method is executed by a server. Referring to Figure 1, the method includes:

[0044] 101: Receive a word segmentation instruction that carries the keywords to be segmented;

[0045] The embodiment of the present invention is applied to a scenario where a serv...


Abstract

The invention discloses a server and a Chinese character segmentation method and device, and belongs to the technical field of search engines. The Chinese character segmentation method includes: receiving a word segmentation instruction; acquiring a first Chinese character set; acquiring the retrieval information corresponding to each character in the first character set according to preset correspondences; acquiring multiple combined characters and their retrieval probabilities according to the first character set and the retrieval information corresponding to each character in it; combining the multiple combined characters into paths according to the characters they contain; acquiring the retrieval probability of each path; determining the path with the highest probability; and segmenting the keywords according to the combined characters included in that path. Manual segmentation is omitted, there is no dependence on tools such as dictionaries, operation is convenient, the data sources are dynamically updated, wrong segmentations can be corrected rapidly, new words are recognized well, and segmentation accuracy is improved.
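
The pipeline in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. The table RETRIEVAL_COUNTS and the function segment are hypothetical names, and the counts are illustrative stand-ins for the retrieval information that the preset correspondences would supply. A path's probability is assumed to be the product of the retrieval probabilities of its combined characters, because the source does not state how path probabilities are computed. Instead of enumerating every path, the sketch uses dynamic programming, which finds the same highest-probability path.

```python
import math
from typing import Dict, List, Optional, Tuple

# Hypothetical retrieval statistics: how often each candidate combined
# character (word) was retrieved. Values are illustrative only.
RETRIEVAL_COUNTS: Dict[str, int] = {
    "腾讯": 50,
    "视频": 40,
    "腾": 2,
    "讯": 2,
    "视": 3,
    "频": 3,
    "讯视": 1,
}


def segment(keyword: str, counts: Dict[str, int], max_len: int = 4) -> List[str]:
    """Return the segmentation of `keyword` whose path probability is highest.

    Assumption: path probability = product of per-word retrieval
    probabilities, where a word's probability is its count / total count.
    """
    total = sum(counts.values())
    n = len(keyword)
    # best[i] holds (log-probability of the best path covering keyword[:i],
    # start index of the last word on that path), or None if unreachable.
    best: List[Optional[Tuple[float, int]]] = [None] * (n + 1)
    best[0] = (0.0, 0)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            prev = best[start]
            word = keyword[start:end]
            if prev is None or word not in counts:
                continue
            # Sum of logs is equivalent to a product of probabilities.
            score = prev[0] + math.log(counts[word] / total)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, start)
    if best[n] is None:
        # No path covers the whole keyword: fall back to single characters.
        return list(keyword)
    # Walk the back-pointers from the end of the keyword to recover the path.
    words: List[str] = []
    end = n
    while end > 0:
        start = best[end][1]
        words.append(keyword[start:end])
        end = start
    return words[::-1]
```

With these illustrative counts, segment("腾讯视频", RETRIEVAL_COUNTS) returns ["腾讯", "视频"], since that path outscores the alternatives such as 腾 / 讯视 / 频. Because the statistics live in a plain table, updating a count is enough to change a wrong segmentation, which mirrors the quick-correction property the abstract claims.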

Description

Technical field

[0001] The invention relates to the technical field of search engines, in particular to a Chinese word segmentation method, device and server.

Background technique

[0002] With the development of search technology and the growth of users' search needs, users may input a long keyword when performing a data search. If the keyword is searched directly, the search success rate is generally very low. To improve the search success rate, the keyword can be segmented before searching, the resulting word segments can then be searched, and the content matching those segments can be used as the search results for the keyword.

[0003] When segmenting Chinese keywords, a statistics-based machine learning method can be used. Specifically, it includes the following steps: (1) collect text sets from data sources such as publicly issued media; (2) manually select part of the text sets for word segment...


Application Information

Patent Type & Authority: Applications (China)
IPC (8): G06F17/30
CPC: G06F16/3335, G06F16/3334, G06N20/00, G06F16/3344, G06N7/01
Inventor: 马超
Owner: TENCENT TECH (SHENZHEN) CO LTD