Efficient short text similarity determination method and device
A short text similarity determination method and device, applied in character and pattern recognition, instruments, computing, etc., which addresses problems such as bumps in text vectors, large fluctuations in similarity, and excessively high IDF values for low-frequency words.
Embodiment Construction
[0018] Term Frequency-Inverse Document Frequency (TF-IDF) is a commonly used weighting technique for information retrieval and text mining. It evaluates the importance of a single word to a text in a text database or corpus. The importance of a word increases in direct proportion to the number of times it appears in the document, that is, its term frequency (TF), but decreases as the word appears in more documents across the corpus, which is captured by the inverse document frequency (IDF). If a word is relatively rare in the corpus but appears many times in a given article, it probably reflects the characteristics of that article and is a desired keyword.
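The weighting described above can be sketched as follows. This is a minimal illustration, not the formula used by the patent, which this excerpt does not give: it assumes the common variant in which TF is the word count divided by the document length and IDF is log(N / df), where N is the number of documents and df is the number of documents containing the word. The function name `tf_idf` and the toy corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {word: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights

# Hypothetical toy corpus, already segmented into words.
corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
    ["the", "cat", "chased", "the", "cat"],
]
weights = tf_idf(corpus)
```

In this toy corpus "the" occurs in every document, so its IDF is log(3/3) = 0 and its weight is zero despite its high count, while "mat", which occurs in only one document, receives a higher weight in the first document than "sat", which occurs in two.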
[0019] To identify the keywords of a text, the text can first be segmented into words, and the frequency of each word can then be counted. Word frequency refers to the number of times a given word appears in the text. The keywords of a text tend to appear more frequently in it. However, high-frequency words with no signif...
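The segment-then-count step can be sketched as below. Splitting on whitespace is an assumption that only suits space-delimited languages; text in other languages would need a dedicated word segmenter, which this excerpt does not name. The function name `word_frequencies` and the sample sentence are hypothetical.

```python
from collections import Counter

def word_frequencies(text):
    """Segment text on whitespace and count how often each word occurs."""
    # Assumed segmentation: lowercase, then split on whitespace.
    words = text.lower().split()
    return Counter(words)

sample = "similarity of short text depends on the words the text shares"
freq = word_frequencies(sample)

# The two most frequent words with their counts.
top = freq.most_common(2)
```

Here both "text" and "the" occur twice, so raw counts alone do not separate a content word from a function word.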


