Subject-based searching method and device
A topic-based search method and device, applied in the computer field, that address problems such as relevant documents being difficult to rank near the top, relevant documents failing to be recalled, and redundant query expressions.
Examples
Embodiment 1
[0068] Figure 1 is the main flowchart of the subject-based search method provided by Embodiment 1 of the present invention. As shown in Figure 1, the method may include the following steps:
[0069] Step 101: Use the topic analysis model to perform topic analysis on the query input by the user to determine the topic distribution corresponding to the query, and use the topic analysis model to perform topic analysis on each document in the document library to determine the topic distribution corresponding to each document.
[0070] The topic analysis model involved in this step is established in advance and includes the topic words contained in each topic and the weight of each topic word within the topic to which it belongs. Using the topic analysis model, the topic distribution corresponding to the query and the topic distribution corresponding to each document can be determined. The establishment process and content of the topic analysis model will be described i...
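As a minimal sketch of how a model of this shape (topic words with per-topic weights) could yield a topic distribution, the Python below sums the weights of matching words and normalizes the result. The scoring rule, the toy model `TOPIC_MODEL`, and the function name `topic_distribution` are assumptions made for illustration; the text above does not specify how the distribution is computed.

```python
from collections import defaultdict

# Hypothetical toy model: topic -> {topic word: weight within that topic}.
TOPIC_MODEL = {
    "sports": {"football": 0.6, "match": 0.3, "score": 0.1},
    "finance": {"stock": 0.5, "market": 0.4, "score": 0.1},
}


def topic_distribution(text, model=TOPIC_MODEL):
    """Return a normalized topic -> share map for whitespace-tokenized text.

    Assumed scoring rule (not from the source): each token adds its weight
    to every topic that lists it, then the scores are normalized to sum to 1.
    """
    scores = defaultdict(float)
    for token in text.lower().split():
        for topic, words in model.items():
            scores[topic] += words.get(token, 0.0)
    total = sum(scores.values())
    if total == 0:
        # No topic word matched; report an all-zero distribution.
        return {topic: 0.0 for topic in model}
    return {topic: scores[topic] / total for topic in model}


query_topics = topic_distribution("football match score")
doc_topics = topic_distribution("stock market score today")
```

The same function is applied to the query and to each document, so the two distributions are directly comparable.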
Embodiment 2
[0077] In the embodiment of the present invention, the topic analysis model may be a probabilistic model describing topics, including but not limited to the Probabilistic Latent Semantic Analysis (PLSA) model, the Latent Dirichlet Allocation (LDA) model, and so on.
[0078] Latent Semantic Analysis (LSA) uses mathematical and statistical methods to extract the terms in documents, infer the semantic relationships between them, and build a semantic index, organizing documents into a semantic space structure; that is, terms with high semantic relevance are mapped to the same topic. Building on the latent semantic indexing of LSA, PLSA uses a probabilistic model to describe the relationships between documents and latent semantics and between latent semantics and terms. The latent semantics are the topics referred to in the embodiments of the present invention.
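The two relationships PLSA models can be written in the standard PLSA form below. The notation is an assumption taken from the usual presentation of PLSA rather than from this document: d is a document, w a term, and z a latent topic.

```latex
P(w \mid d) = \sum_{z} P(w \mid z)\, P(z \mid d)
```

Here P(z | d) is the topic distribution of document d, and P(w | z) is the weight of term w within topic z.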
[0079] LDA is an unsupervised machine learning technique used to identify hidden topic information in large-scale document collections or corpora. It uses ...
Embodiment 3
[0091] Figure 2 is the detailed flowchart of the topic-based search method provided by Embodiment 3 of the present invention. As shown in Figure 2, the process specifically includes the following steps:
[0092] Step 201: Analyze the subject terms of each document in the document library to obtain a subject term set for each document.
[0093] The process of analyzing the keywords of a document first segments the document into words and then selects keywords based on term frequency (TF) or TF-IDF; that is, words that meet the TF or TF-IDF requirements are selected as keywords. This method usually performs well, but for documents whose wording is scattered, word-frequency statistics show no obvious characteristics. In addition, in some cheating documents the cheater piles up words that have nothing to do with the topic of the text, so keywords selected purely from word-frequency information obviously do not accurately reflect the topic. Therefore, the embodiment of the present inven...
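The word-frequency baseline described in this paragraph can be sketched as follows. This is an illustration of plain TF-IDF keyword selection only, not the embodiment's improved method, which is not given in the text above. The function name `tfidf_keywords`, the `top_k` cutoff, and the sample documents are hypothetical, and documents are assumed to be already segmented into words.

```python
import math
from collections import Counter


def tfidf_keywords(docs, top_k=2):
    """Select the top_k highest TF-IDF words for each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {
            word: (count / len(doc)) * math.log(n_docs / df[word])
            for word, count in tf.items()
        }
        # Sort by descending score; break ties alphabetically for stable output.
        ranked = sorted(scores, key=lambda w: (-scores[w], w))
        keywords.append(ranked[:top_k])
    return keywords


# Hypothetical, already-segmented documents.
docs = [
    "search engine ranks documents by query".split(),
    "topic model assigns topic weights to documents".split(),
    "query topic matches document topic".split(),
]
keywords_per_doc = tfidf_keywords(docs)
```

Because the scores depend only on counts, a document stuffed with off-topic words would have those words selected as keywords, which is the weakness the paragraph points out.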