Word segmentation method and device, electronic equipment and storage medium

A word segmentation method and storage medium technology, applied in the fields of electrical digital data processing, special data processing applications, instruments, etc. It addresses problems such as high and increasing labor cost, and achieves reduced labor investment, reduced time investment, and fewer manual labeling operations.

Active Publication Date: 2019-11-22
BEIJING SANKUAI ONLINE TECH CO LTD

AI Technical Summary

Problems solved by technology

[0002] Different search scenarios often involve proper nouns from different fields, and it is difficult for a general tokenizer to segment proper nouns in every field.
[0003] At present, a dedicated word segmenter is usually trained for each field. With this approach, the cost of collecting data and organizing manual annotation is extremely high, and if the proper nouns in the field change even slightly, the model needs to be retrained to adapt to the new domain terminology, further increasing labor costs.



Examples


Embodiment 1

[0047] Referring to Figure 1, which shows a flow chart of the steps of a word segmentation method provided by an embodiment of the present disclosure, the word segmentation method may specifically include the following steps:

[0048] Step 101: Segment the text to be segmented to obtain a plurality of first segmented texts.

[0049] In this disclosed embodiment, the text to be segmented refers to the text input by the user that is to undergo word segmentation. For example, if the user enters "I want to eat chicken chop rice", or enters "the oldest building in Beijing" in the search bar of the travel interface, then the input "I want to eat chicken chop rice", "the oldest building in Beijing", etc. can be used as the text to be segmented.

[0050] The first segmented texts are the multiple segments obtained after the text to be segmented is segmented. It can be understood that in this disclosure, the first word segmentation text is a text composed of a singl...

Embodiment 2

[0084] Referring to Figure 2, which shows a flow chart of the steps of a word segmentation method provided by an embodiment of the present disclosure, the word segmentation method may specifically include the following steps:

[0085] Step 201: Input the text to be segmented into a general word segmentation model.

[0086] In the embodiment of the present disclosure, the text to be segmented refers to the text input by the user that is to undergo word segmentation. For example, if the user enters "nearby Xiangwei homes" in the search bar of the food interface of the Meituan APP (Application), or enters "tourist attractions with water sports" in the search bar of the travel interface, then the input "nearby Xiangwei homes" and "tourist attractions with water sports" can be used as the text to be segmented.

[0087] It can be understood that the above examples are only examples for better understanding the technical solutions of the embodiments of the present dis...

Embodiment 3

[0124] Referring to Figure 3, which shows a schematic structural diagram of a word segmentation device provided by an embodiment of the present disclosure, the word segmentation device may specifically include the following modules:

[0125] The first word segmentation acquisition module 310 is used to perform word segmentation processing on the text to be segmented to obtain a plurality of first word segmentation texts;

[0126] The input scene acquisition module 320 is used to obtain the input scene corresponding to the text to be segmented;

[0127] A business field determination module 330, configured to determine the business field corresponding to the input scene according to the mapping relationship between the scene and the business field;

[0128] A noun list obtaining module 340, configured to obtain a business field noun list matching the business field;

[0129] The business word segmentation generating module 350 is configured to perform first merging proc...
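
The lookups performed by modules 320 to 340 can be pictured as two table lookups: scene to business field, then business field to noun list. The following is a minimal sketch under stated assumptions, not the patented implementation. The scene identifiers, field names, table contents, and the function `nouns_for_scene` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, FrozenSet

# Hypothetical mapping between input scenes and business fields.
SCENE_TO_FIELD: Dict[str, str] = {
    "food_search_bar": "food",
    "travel_search_bar": "travel",
}

# Hypothetical business-field noun lists (illustrative entries only).
FIELD_NOUNS: Dict[str, FrozenSet[str]] = {
    "food": frozenset({"chicken chop rice"}),
    "travel": frozenset({"water sports"}),
}


def nouns_for_scene(scene: str) -> FrozenSet[str]:
    """Return the noun list of the business field mapped to `scene`.

    An unknown scene or field yields an empty noun list, so downstream
    merging would leave the general segmentation unchanged.
    """
    field = SCENE_TO_FIELD.get(scene)
    if field is None:
        return frozenset()
    return FIELD_NOUNS.get(field, frozenset())
```

Keeping the noun lists as plain data is one way to reflect the stated goal: when domain terminology changes, only a table entry is edited and no model is retrained.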


Abstract

The embodiment of the invention provides a word segmentation method and device, electronic equipment and a storage medium. The method comprises the steps of: performing word segmentation processing on a text to be segmented to obtain a plurality of first word segmentation texts; obtaining an input scene corresponding to the text to be segmented; determining a business field corresponding to the input scene according to a mapping relationship between scenes and business fields; obtaining a business field noun list matching the business field; and, according to the business field noun list, performing first merging processing on at least two adjacent first word segmentation texts among the plurality of first word segmentation texts to generate the business field segmented words. According to the embodiment of the invention, labor cost and time investment can be reduced.
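
The merging step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the method as claimed: the source only says that at least two adjacent first word segmentation texts are merged according to the noun list, so the greedy longest-match strategy, the function name `merge_adjacent`, and the `sep` parameter are all assumptions introduced here.

```python
from typing import Iterable, List, Set


def merge_adjacent(tokens: List[str], nouns: Iterable[str], sep: str = " ") -> List[str]:
    """Merge runs of two or more adjacent tokens whose joined form is in `nouns`.

    Assumption: at each position the longest matching run wins (greedy).
    Use sep="" for text written without spaces between words.
    """
    noun_set: Set[str] = set(nouns)
    merged: List[str] = []
    i = 0
    while i < len(tokens):
        # Try the longest run starting at i first, down to a run of two tokens.
        for j in range(len(tokens), i + 1, -1):
            candidate = sep.join(tokens[i:j])
            if candidate in noun_set:
                merged.append(candidate)
                i = j
                break
        else:
            # No domain noun starts at this position; keep the token as is.
            merged.append(tokens[i])
            i += 1
    return merged
```

For example, with the first segmented texts `["I", "want", "to", "eat", "chicken", "chop", "rice"]` and a noun list containing `"chicken chop rice"`, the last three tokens are merged into one business field word while the others pass through unchanged.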

Description

Technical field

[0001] Embodiments of the present disclosure relate to the technical field of word segmentation processing in the business field, and in particular, to a word segmentation method, device, electronic equipment, and storage medium.

Background

[0002] Different search scenarios often involve proper nouns from different fields, and it is difficult for a general tokenizer to segment proper nouns in every field.

[0003] At present, a dedicated word segmenter is usually trained for each field. With this approach, the cost of collecting data and organizing manual annotation is extremely high, and if the proper nouns in the field change even slightly, the model needs to be retrained to adapt to the new domain terminology, further increasing labor costs.

Summary of the invention

[0004] Embodiments of the present disclosure provide a word segmentation method...


Application Information

Patent Type & Authority: Applications (China)
IPC: IPC(8): G06F17/27
Inventor: 刘凡
Owner: BEIJING SANKUAI ONLINE TECH CO LTD