Frequency dictionary building method, word segmentation method, server and client device

A frequency dictionary technology, applied in the Internet field, that addresses the lack of an effective solution, the cumbersome manual labeling, and the low word segmentation accuracy of existing methods, and achieves simple and efficient word segmentation.

Pending Publication Date: 2019-03-01
ALIBABA GRP HLDG LTD

AI Technical Summary

Problems solved by technology

Word segmentation methods based on statistical machine learning require a large amount of manually labeled corpus, which is cumbersome to produce and costs staff a great deal of time and effort. In addition, the text to be segmented must be in a field related to that of the training corpus; otherwise, word segmentation accuracy is very low.
[0005] For the above problems, no effective solutions have been proposed so far

Embodiment Construction

[0048] To enable those skilled in the art to better understand the technical solutions in this application, the technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only a part of the embodiments of this application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application, without creative work, fall within the protection scope of this application.

[0049] Consider that, in the field of e-commerce, many user behavior records are generated on e-commerce platforms every day. For example, after a user enters a search term, a series of products is returned as search results. If the user clicks on a certain product or certain products in the search results to collect, purchase, add to the shoppin...

Abstract

The invention provides a frequency dictionary building method, a word segmentation method, a server and a client device. The frequency dictionary building method comprises the following steps: obtaining search behavior data, where the search behavior data comprises a plurality of search terms and, for each search term, the name of the object the user clicked among the data objects returned for that search term; counting the common character strings shared by each search term and its corresponding clicked object name, together with the frequency of each common character string; and generating a frequency dictionary from the counted frequencies of the common character strings, where the frequency dictionary is used to segment the text to be segmented. The technical solution provided by the embodiments of the application addresses the technical problems that existing word segmentation methods cannot effectively segment new words and that manual labeling is too costly, and achieves the technical effect of simple and efficient word segmentation.
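The counting step described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the source does not define "common character string" precisely, so the sketch assumes it means any contiguous substring of the search term, at least `min_len` characters long, that also appears in the clicked object name. The function names, the `min_len` parameter, and the sample records are all hypothetical.

```python
from collections import Counter


def common_substrings(query, title, min_len=2):
    """Return the set of contiguous substrings of `query` (length >= min_len)
    that also occur in `title`. Assumed definition of 'common character string'."""
    found = set()
    n = len(query)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            piece = query[i:j]
            if piece in title:
                found.add(piece)
            else:
                # If query[i:j] is absent, every longer extension is absent too.
                break
    return found


def build_frequency_dictionary(records, min_len=2):
    """Count, over all (search term, clicked object name) pairs,
    how many pairs share each common substring."""
    freq = Counter()
    for query, title in records:
        freq.update(common_substrings(query, title, min_len))
    return freq


# Hypothetical search behavior data: (search term, clicked object name).
records = [
    ("雪纺连衣裙", "高档雪纺连衣裙夏季"),
    ("雪纺裙", "雪纺半身裙"),
]
freq_dict = build_frequency_dictionary(records)
```

Under these assumptions, a string such as "雪纺" that is shared by the search term and the clicked name in both records accumulates a higher count than a string shared in only one. How the patent filters or weights these counts is not stated in the text above.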

Description

Technical field

[0001] This application belongs to the field of Internet technology, and in particular relates to a frequency dictionary building method, a word segmentation method, a server and a client device.

Background

[0002] With the rapid development of e-commerce, people increasingly shop through shopping websites. Word segmentation is often required when a shopping website classifies items into categories or matches target objects. For example, the text "domestic counter high-end chiffon skirt" is divided by a word segmentation method into: domestic counter / high-end / chiffon skirt.

[0003] At present, the commonly used word segmentation methods are mainly those based on dictionary matching and those based on statistical machine learning.

[0004] Among them, the word segmentation method based on dictionary matching relies heavily on the word segmentation dictionary. Therefore, if a new w...
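To make the dictionary-matching approach from the background concrete, here is a small sketch using forward maximum matching, a common dictionary-matching technique. The text above does not say which matching strategy is meant, so this choice is an assumption. For readability the sketch treats whitespace-separated tokens of the translated example as atomic units (in Chinese text the units would be characters); the function name, `max_len`, and the dictionary contents are hypothetical.

```python
def forward_max_match(units, dictionary, max_len=4):
    """Greedy left-to-right segmentation: at each position take the longest
    run of units (up to max_len) found in the dictionary; fall back to a
    single unit when nothing matches."""
    result = []
    i = 0
    while i < len(units):
        for size in range(min(max_len, len(units) - i), 0, -1):
            candidate = " ".join(units[i:i + size])
            if size == 1 or candidate in dictionary:
                result.append(candidate)
                i += size
                break
    return result


# Hypothetical segmentation dictionary for the example in [0002].
dictionary = {"domestic counter", "high-end", "chiffon skirt"}
text = "domestic counter high-end chiffon skirt"
segments = forward_max_match(text.split(), dictionary)

# With "chiffon skirt" missing from the dictionary, the phrase is split apart.
without_new_word = forward_max_match(text.split(), {"domestic counter", "high-end"})
```

The second call shows the dependence on the dictionary that [0004] points to: an entry the dictionary lacks cannot be recovered as a single segment.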

Application Information

Patent Type & Authority: Applications (China)
IPC(8): G06F17/27
CPC: G06F40/247, G06F40/289
Inventor: 马春平, 李林琳, 谢朋峻, 徐光伟, 郎君, 司罗
Owner: ALIBABA GRP HLDG LTD