Method and device for matching texts

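A text matching method and technique, applied in the field of data processing, that addresses problems such as degraded system performance, large data processing volume, and slow processing speed, and achieves improved system performance, a simple matching process, and strong generality and broad applicability.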

Inactive Publication Date: 2013-09-18
ALIBABA CLOUD COMPUTING LTD
4 Cites, 0 Cited by

AI Technical Summary

Problems solved by technology

[0009] The embodiments of the present application provide a text matching method and device, intended to solve problems in the prior art where the large volume of data processed during text matching slows processing, degrades system performance, and causes transmission congestion.



Examples


Embodiment 1

[0038] In the text matching method provided in Embodiment 1 of the present application, for the texts newly added in each cycle, the similarity is calculated between each new text and each original text, and between any two new texts. That is, only the similarity data involving newly added texts is determined. For example, when the method is used for product recommendation, the new texts are obtained from the product information released in the current cycle, and the new texts are then used to determine all products that match that information (the candidates include both products released earlier and products released in the current cycle).
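The pair selection described in [0038] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `incremental_pairs` and the text identifiers are hypothetical, and the similarity computation itself is left out.

```python
from itertools import combinations, product


def incremental_pairs(original_ids, new_ids):
    """Yield only the pairs that involve at least one newly added text."""
    # Each new text against each text stored before this cycle.
    yield from product(new_ids, original_ids)
    # Each unordered pair of new texts.
    yield from combinations(new_ids, 2)


original_ids = ["t1", "t2", "t3"]  # hypothetical texts from earlier cycles
new_ids = ["n1", "n2"]             # hypothetical texts added in this cycle
pairs = list(incremental_pairs(original_ids, new_ids))
print(len(pairs))  # 2 * 3 new-vs-original pairs + 1 new-vs-new pair = 7
```

Pairs made up of two original texts, such as ("t1", "t2"), are never produced, since their similarity would already have been computed in an earlier cycle.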

[0039] The flow of the text matching method provided in Embodiment 1 of the present application is shown in Figure 2, and the steps are as follows:

[0040] Step S11: Periodically collect content information released by users, and obtain new texts in the ...

Embodiment 2

[0079] In the text matching method provided in Embodiment 2 of the present application, after the newly added texts are input in each cycle, the similarity between any two texts is calculated across all texts stored in the database. The flow is shown in Figure 3, and the steps are as follows:

[0080] Step S21: Periodically collect content information released by users, and obtain new texts in the current cycle according to the content information released by users.

[0081] It is the same as step S11 and will not be repeated here.

[0082] Step S22: Segment the newly added text to extract keywords.

[0083] It is the same as step S12 and will not be repeated here.
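As a rough illustration of step S22, the sketch below splits a text into tokens and keeps the non-stopwords as keywords. The excerpt does not specify the segmentation algorithm, so the regular expression, the `STOPWORDS` set, and the function name `extract_keywords` are all assumptions; Chinese text in particular would need a dedicated word segmenter in place of the regex.

```python
import re
from collections import Counter

# Illustrative stopword list only; a real system would use a larger one.
STOPWORDS = {"the", "a", "an", "of", "and", "for", "in", "to", "is"}


def extract_keywords(text):
    """Split text into lowercase word tokens and count the non-stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


keywords = extract_keywords("Red cotton shirt for men, cotton blend")
print(keywords)
```

Returning counts rather than a plain list keeps the per-text term frequency available for a later weighting step.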

[0084] Step S23: Calculate the weight of each keyword extracted from the newly added text in each text currently stored in the database according to the pre-stored word frequency table.

[0085] It is the same as step S13 and will not be repeated here.
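The excerpt does not give the weighting formula used in step S23, so the sketch below assumes a TF-IDF style weight: the keyword's count in the text multiplied by the logarithm of the total number of texts divided by the number of texts containing the keyword. The table layout (`doc_freq` mapping a word to the number of texts containing it), the function name `keyword_weights`, and the sample numbers are hypothetical.

```python
import math


def keyword_weights(term_counts, doc_freq, total_docs):
    """Assumed weight: count in this text * log(total_docs / document frequency)."""
    weights = {}
    for word, count in term_counts.items():
        # Words missing from the table are treated as appearing in one text.
        df = doc_freq.get(word, 1)
        weights[word] = count * math.log(total_docs / df)
    return weights


doc_freq = {"cotton": 10, "shirt": 100}  # hypothetical pre-stored frequency table
weights = keyword_weights({"cotton": 2, "shirt": 1}, doc_freq, total_docs=1000)
print(weights)
```

Under this assumption a keyword that is frequent in one text but rare across the database ("cotton" here) receives a larger weight than a common one ("shirt").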

[0086] Step S24: Calculate the similarity between an...

Embodiment 3

[0095] The text matching method provided in Embodiment 3 of the present application improves on the solutions of Embodiment 1 and Embodiment 2 by adding an output filtering process. Specifically:

[0096] An output filtering step is added after the similarity is calculated in step S14 of Embodiment 1 and before the related texts are determined in step S15; likewise, it is added after the similarity is calculated in step S24 of Embodiment 2 and before the related texts are determined in step S25. The flow is shown in Figure 4, and the steps are as follows:

[0097] Step S31: Obtain the calculated similarity between each newly added text and each text currently stored in the database, or the calculated similarity between any two texts in the database.

[0098] When filtering the similarity between two texts, different filters can be applied depending on the requirements of the subsequent step that determines related texts. Therefore, for the first...
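The specific filtering rules are cut off in this excerpt, so the following is only a sketch of two common ways such output filtering could work: dropping pairs below a similarity threshold, and keeping the top-k most similar partners of one text. Both rules, the function names, and the sample scores are assumptions.

```python
import heapq


def filter_by_threshold(similarities, threshold):
    """Keep only pairs whose similarity is at least `threshold`."""
    return {pair: s for pair, s in similarities.items() if s >= threshold}


def top_k_for(text_id, similarities, k):
    """Return the k most similar partners of `text_id` as (partner, score) tuples."""
    partners = []
    for (a, b), s in similarities.items():
        if a == text_id:
            partners.append((b, s))
        elif b == text_id:
            partners.append((a, s))
    return heapq.nlargest(k, partners, key=lambda p: p[1])


# Hypothetical similarity output from the preceding step.
similarities = {("n1", "t1"): 0.82, ("n1", "t2"): 0.10, ("n1", "n2"): 0.45}
kept = filter_by_threshold(similarities, threshold=0.4)
best = top_k_for("n1", similarities, k=1)
print(kept, best)
```

Filtering before the related-text decision reduces the amount of similarity data that has to be stored and transmitted, which is the concern the application raises about large data volumes.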


Abstract

Matching text sets is disclosed, including: extracting a text set from data associated with a current period; storing the text set with a plurality of text sets; extracting a keyword from the text set; determining a weight value associated with the keyword associated with the text set; determining a degree of similarity between the text set and another text set based at least in part on a weight value associated with the keyword associated with the text set and a weight value associated with a keyword associated with the other text set; and determining whether the text set is related to the other text set based at least in part on the determined degree of similarity.
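The abstract says the degree of similarity is based on the keyword weight values of the two text sets but does not name a measure. The sketch below assumes cosine similarity over weighted keyword vectors and a fixed threshold for deciding relatedness; the function names, the threshold of 0.5, and the sample weights are hypothetical.

```python
import math


def cosine_similarity(w1, w2):
    """Cosine similarity between two {keyword: weight} dictionaries."""
    shared = set(w1) & set(w2)
    dot = sum(w1[k] * w2[k] for k in shared)
    norm1 = math.sqrt(sum(v * v for v in w1.values()))
    norm2 = math.sqrt(sum(v * v for v in w2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0  # an empty keyword set matches nothing
    return dot / (norm1 * norm2)


def is_related(w1, w2, threshold=0.5):
    """Assumed decision rule: related when similarity reaches the threshold."""
    return cosine_similarity(w1, w2) >= threshold


a = {"cotton": 2.0, "shirt": 1.0}
b = {"cotton": 1.0, "dress": 1.0}
c = {"phone": 1.0}
print(cosine_similarity(a, b), is_related(a, b), is_related(a, c))
```

Only keywords shared by both text sets contribute to the dot product, so two text sets with no keyword in common always score zero.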

Description

Technical field

[0001] This application relates to the field of data processing, and in particular to a method and device for matching texts when the amount of data is large.

Background

[0002] Existing text comparison generally uses full calculation and matching. Whenever the degree of correlation between texts is needed, all acquired texts are processed to obtain the similarity of every pair. Each similarity computation therefore covers all text data, so the amount of calculation is very large and the running time is on the order of O(N^2). As the number of texts N increases, the calculation time becomes very long.

[0003] This large volume of computation and comparison significantly affects the system performance of the equipment and puts great pressure on the system's I / O communication, data storage, and network transmission, resulting in sl...
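To make the O(N^2) cost in [0002] concrete, the worked count below compares full pairwise matching with computing only the pairs that involve new texts, as Embodiment 1 describes. The figures N = 10,000 stored texts and M = 100 new texts are illustrative assumptions, not values from the application.

```latex
% Full matching over N texts compares every unordered pair:
\[
  P_{\text{full}}(N) = \binom{N}{2} = \frac{N(N-1)}{2},
  \qquad
  P_{\text{full}}(10\,000) = \frac{10\,000 \cdot 9\,999}{2} = 49\,995\,000 .
\]
% After M new texts arrive, recomputing everything costs
\[
  P_{\text{full}}(N+M) = \frac{(N+M)(N+M-1)}{2}
  = \frac{10\,100 \cdot 10\,099}{2} = 50\,999\,950 ,
\]
% whereas comparing only pairs that involve a new text costs
\[
  P_{\text{incr}}(N, M) = MN + \frac{M(M-1)}{2}
  = 1\,000\,000 + 4\,950 = 1\,004\,950 .
\]
% The difference is exactly the old-versus-old pairs already computed:
\[
  P_{\text{full}}(N+M) - P_{\text{incr}}(N, M) = \frac{N(N-1)}{2} = 49\,995\,000 .
\]
```

With these numbers the incremental approach evaluates about 2% of the pairs that a full recomputation would.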


Application Information

Patent Type & Authority: Patents (China)
IPC(8): G06F17/30, G06F17/27
CPC: G06F17/3069, G06F16/3347
Inventor: 张旭, 苏宁军, 顾海杰, 祁建程
Owner: ALIBABA CLOUD COMPUTING LTD