A crowdsourcing construction method of Chinese-Mongolian corpus based on WeChat public platform

The method applies WeChat public platform technology to corpus resource construction. It addresses the large investment and high cost of collecting spoken-language corpora, and achieves simple interaction, mitigation of adverse effects, and high user participation.

Active Publication Date: 2022-02-08
BEIJING INSTITUTE OF TECHNOLOGY

AI Technical Summary

Problems solved by technology

[0003] The purpose of the present invention is to overcome the technical defects of existing Chinese-Mongolian corpus construction methods, namely the high cost and large investment required to collect spoken language in real scenes, by providing a Chinese-Mongolian corpus crowdsourcing construction method based on the WeChat public platform.


Examples

Embodiment Construction

[0034] The method in the embodiment of the present invention will be described in detail and completely below in conjunction with the accompanying drawings.

[0035] The Chinese-Mongolian corpus crowdsourcing construction method based on the WeChat public platform used in this embodiment is shown in Figure 1. The specific steps are:

[0036] The corpus in this embodiment comprises both text and speech corpora, including Chinese-Mongolian bilingual aligned corpora and monolingual text-aligned corpora for use in machine translation and natural language processing.

[0037] Step A. Preprocess the original text according to Definition 1 and Definition 2. The specific preprocessing of the original corpus varies with the translation direction; its purpose is to standardize the corpus. This embodiment does not limit the source of the corpus. The corpus is segmented and meaningless data is deleted. Table 1 is an example of the original ...
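
The segmentation and clean-up in Step A can be illustrated with a minimal Python sketch. It is not the patent's procedure: Definition 1 and Definition 2 are not reproduced in this excerpt, so the sketch assumes Chinese source text, assumes that a sentence ends at terminal punctuation, and assumes that "meaningless data" means a segment with fewer than two Han characters. The function names and the threshold are hypothetical.

```python
import re

# Assumption: a sentence ends at Chinese or Western terminal punctuation.
_SENTENCE = re.compile(r"[^。！？!?]+[。！？!?]?")
# Assumption: "meaningful" Chinese text contains Han characters.
_HAN = re.compile(r"[\u4e00-\u9fff]")


def split_sentences(text):
    """Segment raw text into sentences, keeping the closing punctuation."""
    return [s.strip() for s in _SENTENCE.findall(text) if s.strip()]


def is_meaningful(sentence, min_han_chars=2):
    """Hypothetical filter: keep segments with enough Han characters."""
    return len(_HAN.findall(sentence)) >= min_han_chars


def preprocess(raw_text):
    """Normalize whitespace, segment, drop noise, and remove duplicates."""
    normalized = re.sub(r"\s+", " ", raw_text).strip()
    seen = set()
    cleaned = []
    for sentence in split_sentences(normalized):
        if is_meaningful(sentence) and sentence not in seen:
            seen.add(sentence)
            cleaned.append(sentence)
    return cleaned


if __name__ == "__main__":
    sample = "你好，世界。  12345！今天天气怎么样？你好，世界。"
    for line in preprocess(sample):
        print(line)
```

For the other translation direction the filter and the sentence boundary rule would need to change, since the source states that preprocessing varies with the translation direction.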

Abstract

The invention discloses a crowdsourcing construction method for a Chinese-Mongolian corpus based on the WeChat public platform, belonging to the field of corpus resource construction. The specific operation steps are: 1) obtain an original multi-genre, open-domain corpus; 2) screen and filter the users who will participate in the translation task through a Mongolian proficiency test questionnaire; 3) crowdsource the translation tasks; 4) each WeChat client translates one or more source sentences into Mongolian and feeds them back in the form of speech; 5) evaluate corpus quality through a combination of back-end administrator review and crowdsourced quality assessment, thereby achieving quality control of the corpus. The method completes corpus collection online, with simple interaction, a good user experience, and high user participation. It effectively addresses the problem of collecting open-domain natural spoken-language corpora in a real Mongolian language environment, and shows strong practical prospects on mobile Internet platforms.
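
The user screening and quality control steps of the abstract can be sketched in Python as follows. The abstract does not say how the questionnaire is scored or how administrator review and crowd assessment are combined, so everything specific here is an assumption made for illustration: the pass mark, the 1-to-5 rating scale, the rule that a submission needs both administrator approval and a sufficient mean crowd rating, and all names (`PASS_SCORE`, `MIN_CROWD_RATING`, `Submission`, `screen_users`, `passes_quality_control`).

```python
from dataclasses import dataclass, field
from statistics import mean

PASS_SCORE = 0.8        # hypothetical pass mark for the proficiency questionnaire
MIN_CROWD_RATING = 3.5  # hypothetical threshold on an assumed 1-5 rating scale


def screen_users(questionnaire_scores):
    """Return the users whose questionnaire score reaches the pass mark."""
    return sorted(
        user for user, score in questionnaire_scores.items()
        if score >= PASS_SCORE
    )


@dataclass
class Submission:
    source_sentence: str          # Chinese source sentence
    audio_id: str                 # identifier of the spoken Mongolian reply
    admin_approved: bool          # result of the administrator review
    crowd_ratings: list = field(default_factory=list)  # ratings from other users


def passes_quality_control(submission):
    """Assumed rule: administrator approval AND a high enough mean crowd rating."""
    if not submission.admin_approved or not submission.crowd_ratings:
        return False
    return mean(submission.crowd_ratings) >= MIN_CROWD_RATING


if __name__ == "__main__":
    print(screen_users({"u1": 0.9, "u2": 0.5, "u3": 0.8}))
    print(passes_quality_control(Submission("你好", "a1", True, [4, 5, 3])))
```

Requiring both signals is the strictest way to combine them; a weighted score or an administrator override would be equally consistent with the wording of the abstract.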

Description

Technical field

[0001] The invention relates to a crowdsourcing construction method for a Chinese-Mongolian corpus based on the WeChat public platform, and belongs to the technical field of corpus resource construction.

Background art

[0002] Because Mongolian corpora are limited in type and small in size, the construction of Chinese-Mongolian spoken-language corpora has gradually become an important topic in natural language research. Research on resource construction methods in particular affects related research on large-scale corpus resources. In addition, because the complexity of the Mongolian language itself leads to non-uniform text encoding, using speech corpora as an entry point is a feasible way to build corpus resources. However, current Chinese-Mongolian spoken-language corpora are constructed using expert annotation methods that require a lot of manpower and material resources,...

Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06Q10/10, G06Q10/06
CPC: G06Q10/101, G06Q10/06395, G06Q10/06398
Inventor: 史树敏, 苏日海, 廖乐健, 黄河燕
Owner: BEIJING INSTITUTE OF TECHNOLOGY