Risk identification method, device and electronic equipment
By constructing a latent semantic space and using risk feedback topic recognition technology, the potential risks of long-term rental housing operators are automatically identified, solving the problems of delayed detection and low efficiency of manual screening in existing technologies, and enabling timely identification and prediction of risks.
Patent Information
- Authority / Receiving Office: CN · China
- Patent Type: Patents(China)
- Current Assignee / Owner: TENCENT TECHNOLOGY (SHENZHEN) CO LTD
- Filing Date: 2021-03-18
- Publication Date: 2026-06-12
AI Technical Summary
Existing technologies suffer from delays in detecting long-term rental property operators absconding, and the low efficiency of manual screening makes it difficult to detect and address risks in a timely manner.
The method acquires the abnormal feedback text of the object to be predicted, generates the abnormal feedback vector to be predicted, constructs a latent semantic space, uses latent semantic analysis to determine the risk feedback topic, and generates risk warning information, so that potential risks are identified automatically.
It enables timely identification and prediction of the risk of long-term rental property operators absconding, reducing the workload of manual screening and improving the efficiency and accuracy of risk detection.
Description
Technical Field
[0001] This disclosure relates to the field of computer and Internet technology, and in particular to a risk identification method and apparatus, electronic device and computer-readable storage medium.
Background Technology
[0002] With economic development, long-term rentals have become a relatively common rental model in the rental market.
[0003] However, due to poor management, long-term rental property operators often collapse or abscond. This refers to a platform that provides long-term rentals ceasing operations, becoming unreachable, or going bankrupt, or to the platform's legal representative absconding, with the result that property providers do not receive rent and tenants cannot protect their legal rights.
[0004] Currently, the detection of long-term rental property operators who abscond is mainly achieved through news reports, public opinion, and bulk user complaints. Once public opinion and bulk complaints provide concrete evidence that a particular operator has absconded, manual verification can link multiple merchant accounts of the same operator for penalties.
[0005] Currently, the difficulties in dealing with long-term rental property operators who abscond lie mainly in two aspects. First, there is a serious lag in discovering problematic merchants through news reports or concentrated loss reports. Second, information related to merchants in public opinion must be screened manually, which involves a large amount of manual work and wastes human resources.
[0006] It should be noted that the information disclosed in the background section above is only used to enhance the understanding of the background of this disclosure, and therefore may include information that does not constitute prior art known to those skilled in the art.
Summary of the Invention
[0007] The purpose of this disclosure is to provide a risk identification method, apparatus, and electronic device that can accurately predict the risk feedback topic of the abnormal feedback text to be predicted for the object to be predicted, so as to further confirm the risk situation of the object to be predicted.
[0008] Other features and advantages of this disclosure will become apparent from the following detailed description, or may be learned in part from practice of this disclosure.
[0009] This disclosure provides a risk identification method, comprising: acquiring abnormal feedback text to be predicted for an object to be predicted; generating an abnormal feedback vector to be predicted based on the abnormal feedback text to be predicted; acquiring a latent semantic space generated based on abnormal feedback text samples, wherein the abnormal feedback text samples are determined from the abnormal feedback text for a target risk object, and the latent semantic space includes abnormal feedback text sample vectors clustered according to N risk feedback topics, where N is a positive integer greater than or equal to 1; obtaining the similarity between the abnormal feedback text sample vectors clustered under each risk feedback topic and the abnormal feedback vector to be predicted; determining the risk feedback topic of the abnormal feedback vector to be predicted among the N risk feedback topics based on the similarity; and generating risk warning information for the object to be predicted based on the risk feedback topic.
[0010] This disclosure provides a risk identification device, including: a module for acquiring the anomaly feedback text to be predicted, a module for acquiring the anomaly feedback vector to be predicted, a module for acquiring the latent semantic space, a similarity determination module, a first risk feedback topic determination module, and a first risk warning information generation module.
[0011] The module for acquiring the anomaly feedback text to be predicted can be configured to acquire the anomaly feedback text to be predicted for the target object. The module for acquiring the anomaly feedback vector to be predicted can be configured to generate an anomaly feedback vector to be predicted based on the anomaly feedback text. The module for acquiring the latent semantic space can be configured to acquire a latent semantic space generated from anomaly feedback text samples, where the anomaly feedback text samples are determined from the anomaly feedback texts targeting the target risk object. The latent semantic space includes anomaly feedback text sample vectors clustered according to N risk feedback topics, where each anomaly feedback text sample vector corresponds to an anomaly feedback text sample, and N is a positive integer greater than or equal to 1. The similarity determination module can be configured to obtain the similarity between the anomaly feedback text sample vectors clustered under each risk feedback topic and the anomaly feedback vector to be predicted. The first risk feedback topic determination module can be configured to determine the risk feedback topic of the anomaly feedback vector to be predicted among the N risk feedback topics based on the similarity. The first risk warning information generation module can be configured to generate risk warning information for the target object based on the risk feedback topic.
[0012] In some embodiments, the latent semantic space further includes abnormal feedback word vectors clustered according to the N risk feedback topics, the N risk feedback topics including a first risk feedback topic, and the abnormal feedback word vectors including multiple target abnormal feedback word vectors clustered under the first risk feedback topic; wherein, the risk identification device further includes: a vector similarity determination module, a topic similarity determination module, a second risk feedback topic determination module, and a second risk warning information generation module.
[0013] The vector similarity determination module can be configured to determine the similarity between multiple target abnormal feedback word vectors clustered under the first risk feedback topic and the abnormal feedback vector to be predicted; the topic similarity determination module can be configured to determine the similarity between the abnormal feedback vector to be predicted and the first risk feedback topic based on the similarity between the multiple target abnormal feedback word vectors clustered under the first risk feedback topic and the abnormal feedback vector to be predicted; the second risk feedback topic determination module can be configured to determine that if the similarity between the abnormal feedback vector to be predicted and the first risk feedback topic is greater than a first threshold, then the first risk feedback topic is the risk feedback topic of the abnormal feedback vector to be predicted; the second risk warning information generation module can be configured to generate risk warning information for the object to be predicted based on the first risk feedback topic.
[0014] In some embodiments, the latent semantic space acquisition module may include: an anomaly feedback text sample acquisition unit, an original word segmentation text matrix generation unit, a matrix decomposition unit, and a reconstruction unit.
[0015] The abnormal feedback text sample acquisition unit can be configured to determine the abnormal feedback text sample from the abnormal feedback text targeting the target risk object, wherein the abnormal feedback text sample is composed of abnormal feedback words; the original word segmentation text matrix generation unit can be configured to generate an original word segmentation text matrix based on the abnormal feedback text sample; the matrix decomposition unit can be configured to perform matrix decomposition on the original word segmentation text matrix to obtain the word segmentation topic matrix, text topic matrix, and feature value matrix of the abnormal feedback text sample, wherein the word segmentation topic matrix clusters the abnormal feedback word vectors corresponding to the abnormal feedback words according to M risk feedback topics, and the text topic matrix clusters the abnormal feedback text sample vectors corresponding to the abnormal feedback text sample through the M risk feedback topics, where M is a positive integer greater than or equal to N; the reconstruction unit can be configured to construct the latent semantic space based on the feature value matrix, the word segmentation topic matrix of the abnormal feedback text sample, and the text topic matrix.
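The decomposition described above can be illustrated with a minimal sketch. It assumes that the matrix decomposition is a singular value decomposition (SVD), as is usual in latent semantic analysis; the text itself only says "matrix decomposition". The matrix values, the variable names, and the rule that assigns each text sample to the topic with the largest weighted absolute loading are hypothetical choices made for illustration only.

```python
import numpy as np

# Hypothetical 5-word x 4-text matrix (rows: words, columns: text samples).
X = np.array([
    [1.0, 0.0, 1.0, 0.0],
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 1.0, 1.0, 1.0],
    [1.0, 0.0, 1.0, 0.0],
])

# Thin SVD (assumed decomposition): X = U @ diag(s) @ Vt
U, s, Vt = np.linalg.svd(X, full_matrices=False)

word_topic = U               # one row per word, one column per topic
feature_values = np.diag(s)  # diagonal "feature value" matrix
text_topic = Vt.T            # one row per text sample, one column per topic

# Assumed clustering rule: each text sample goes to the topic on which
# its loading, weighted by the topic's singular value, is largest.
text_cluster = np.argmax(np.abs(text_topic * s), axis=1)
```

The product of the three matrices recovers the original matrix, which is what allows a later step to drop small feature values and rebuild a lower-order approximation.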
[0016] In some embodiments, the reconstruction unit may include: a low-order feature value matrix acquisition subunit, a low-order word segmentation text matrix generation subunit, and a latent semantic space construction subunit.
[0017] The low-order feature value matrix acquisition subunit can be configured to set the feature values less than the second threshold in the feature value matrix to 0 to obtain the low-order feature value matrix; the low-order word segmentation text matrix generation subunit can be configured to determine the low-order word segmentation text matrix based on the low-order feature value matrix, the word segmentation topic matrix, and the text topic matrix; the latent semantic space construction subunit can be configured to construct the latent semantic space generated based on the abnormal feedback text sample based on the low-order word segmentation text matrix.
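A minimal sketch of this thresholding step follows, again assuming an SVD-based decomposition. The function name `low_rank_matrix`, the random input matrix, and the choice of threshold are hypothetical; the only behavior taken from the text is that feature values below the threshold are set to 0 before the matrix is rebuilt.

```python
import numpy as np

def low_rank_matrix(U, s, Vt, threshold):
    """Zero the feature values below `threshold` and rebuild the matrix."""
    s_low = np.where(s < threshold, 0.0, s)
    # Scaling the columns of U by s_low is equivalent to U @ diag(s_low).
    return (U * s_low) @ Vt, s_low

# Hypothetical 6-word x 5-text matrix with reproducible random entries.
rng = np.random.default_rng(0)
X = rng.random((6, 5))
U, s, Vt = np.linalg.svd(X, full_matrices=False)

# Hypothetical threshold: keep only the two largest feature values.
threshold = s[1]
X_low, s_low = low_rank_matrix(U, s, Vt, threshold)
```

The Frobenius norm of `X - X_low` equals the square root of the sum of the squared feature values that were dropped, which gives a direct way to measure how much information the threshold discards.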
[0018] In some embodiments, the latent semantic space construction subunit includes: a loss value determination subunit, a second threshold adjustment subunit, and a low-order word segmentation text matrix update subunit.
[0019] The loss value determination subunit can be configured to determine the loss value between the original word segmentation text matrix and the low-order word segmentation text matrix; the second threshold adjustment subunit can be configured to adjust the second threshold based on the loss value between the original word segmentation text matrix and the low-order word segmentation text matrix; the low-order word segmentation text matrix update subunit can be configured to update the low-order word segmentation text matrix based on the adjusted second threshold, until the loss value between the original word segmentation text matrix and the low-order word segmentation text matrix meets a preset condition, and then the latent semantic space is constructed based on the low-order word segmentation text matrix that meets the preset condition.
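The iterative adjustment can be sketched as follows. The text does not specify the loss function, the direction of adjustment, or the preset condition, so this sketch assumes a relative Frobenius-norm loss, a threshold that is lowered by a fixed factor whenever the loss is too high, and a stop condition of the form loss <= `max_loss`. All names and default values are hypothetical.

```python
import numpy as np

def fit_threshold(X, threshold, max_loss=0.2, shrink=0.9, max_iter=100):
    """Lower the threshold until the low-order matrix is close enough to X."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    norm_X = np.linalg.norm(X)
    for _ in range(max_iter):
        # Rebuild the low-order matrix under the current threshold.
        s_low = np.where(s < threshold, 0.0, s)
        X_low = (U * s_low) @ Vt
        # Assumed loss: relative Frobenius distance to the original matrix.
        loss = np.linalg.norm(X - X_low) / norm_X
        if loss <= max_loss:  # assumed preset condition
            break
        threshold *= shrink   # keep more feature values on the next pass
    return X_low, threshold, loss

# Hypothetical input: start from a threshold that discards everything.
rng = np.random.default_rng(0)
X = rng.random((6, 5))
X_low, threshold, loss = fit_threshold(X, threshold=10.0)
```

A lower threshold keeps more feature values and therefore reduces the loss, so the loop trades compactness of the latent semantic space against fidelity to the original matrix.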
[0020] In some embodiments, the N risk feedback topics include a second risk feedback topic. The similarity determination module may include: a target anomaly feedback text sample vector determination unit and a vector similarity determination unit.
[0021] The target anomaly feedback text sample vector determination unit can be configured to determine multiple target anomaly feedback text sample vectors clustered under the second risk feedback topic in the latent semantic space based on the text topic matrix; the vector similarity determination unit can be configured to determine the similarity between each target anomaly feedback text sample vector and the anomaly feedback vector to be predicted.
[0022] In some embodiments, the first risk feedback topic determination module may include: a topic similarity determination unit and a risk feedback topic determination unit.
[0023] The topic similarity determination unit can be configured to determine the similarity between the anomaly feedback vector to be predicted and the second risk feedback topic based on the similarity between each target anomaly feedback text sample vector and the anomaly feedback vector to be predicted; the risk feedback topic determination unit can be configured to determine that if the similarity between the anomaly feedback vector to be predicted and the second risk feedback topic is greater than a third threshold, then the second risk feedback topic is the risk feedback topic of the anomaly feedback vector to be predicted.
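The topic decision described above can be sketched in a few lines. The text does not fix the similarity measure or how per-sample similarities are combined into a topic-level similarity, so this sketch assumes cosine similarity and a simple mean. The topic names, vectors, and threshold value are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def match_topics(query, topic_vectors, threshold):
    """Return the topics whose mean similarity to `query` exceeds `threshold`."""
    matched = {}
    for topic, vectors in topic_vectors.items():
        # Assumed aggregation: mean of the per-sample similarities.
        score = sum(cosine(query, v) for v in vectors) / len(vectors)
        if score > threshold:
            matched[topic] = score
    return matched

# Hypothetical sample vectors, already clustered under two topics.
topic_vectors = {
    "rent_not_paid": [[0.9, 0.1], [0.8, 0.3]],
    "platform_unreachable": [[0.1, 0.9], [0.2, 0.7]],
}
query = [0.85, 0.2]  # hypothetical vector to be predicted
matched = match_topics(query, topic_vectors, threshold=0.8)
```

In this example only the first topic exceeds the threshold, so only that topic would be used to generate risk warning information.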
[0024] In some embodiments, the original word segmentation text matrix is generated from the abnormal feedback text sample vectors corresponding to multiple abnormal feedback text samples, the multiple abnormal feedback text samples including target abnormal feedback text samples; wherein, the original word segmentation text matrix generation unit may include: a target abnormal feedback vocabulary acquisition subunit, an inverse document frequency determination subunit, a word frequency determination subunit, and a target abnormal feedback text sample vector determination subunit.
[0025] The target anomaly feedback vocabulary acquisition subunit can be configured to determine a target anomaly feedback vocabulary based on the anomaly feedback text targeting the target risk object; the inverse document frequency determination subunit can be configured to determine the inverse document frequency of each word in the target anomaly feedback vocabulary in the plurality of anomaly feedback text samples; the word frequency determination subunit can be configured to determine the word frequency of each word in the target anomaly feedback vocabulary in the target anomaly feedback text samples; and the target anomaly feedback text sample vector determination subunit can be configured to determine the target anomaly feedback text sample vector of the target anomaly feedback text sample based on the word frequency of each word in the target anomaly feedback text sample and the inverse document frequency in the plurality of anomaly feedback text samples, so as to determine the original word segmentation text matrix based on the target anomaly feedback text sample vector.
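The word frequency and inverse document frequency weighting described above can be sketched as follows. The smoothed inverse document frequency formula, the function name `tfidf_vectors`, and the sample tokens are assumptions for illustration; the text does not give an exact formula.

```python
import math
from collections import Counter

def tfidf_vectors(samples, vocabulary):
    """Build one TF-IDF vector per tokenized sample over a fixed vocabulary."""
    n_docs = len(samples)
    # Document frequency: number of samples in which each word occurs.
    doc_freq = {w: sum(1 for s in samples if w in s) for w in vocabulary}
    # Smoothed inverse document frequency (the smoothing is an assumption).
    idf = {w: math.log((1 + n_docs) / (1 + doc_freq[w])) + 1.0 for w in vocabulary}
    vectors = []
    for tokens in samples:
        counts = Counter(tokens)
        total = len(tokens) or 1
        # Word frequency within the sample, weighted by the word's IDF.
        vectors.append([counts[w] / total * idf[w] for w in vocabulary])
    return vectors

# Hypothetical segmented feedback texts and vocabulary.
samples = [
    ["landlord", "unpaid", "rent"],
    ["platform", "unreachable", "refund"],
    ["rent", "refund", "refund"],
]
vocabulary = ["landlord", "unpaid", "rent", "platform", "unreachable", "refund"]
vectors = tfidf_vectors(samples, vocabulary)

# Stack the sample vectors as columns: rows are words, columns are texts.
matrix = [list(row) for row in zip(*vectors)]
```

Words that occur in every sample receive the lowest weight, so the resulting matrix emphasizes words that distinguish one feedback text from another.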
[0026] This disclosure provides an electronic device comprising: one or more processors; and a storage device for storing one or more programs, wherein when the one or more programs are executed by the one or more processors, the one or more processors implement the risk identification method described above.
[0027] This disclosure provides a computer-readable storage medium having a computer program stored thereon, which, when executed by a processor, implements the risk identification method as described in any of the preceding embodiments.
[0028] This disclosure provides a computer program product or computer program that includes computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes the computer instructions, causing the computer device to perform the aforementioned risk identification method.
[0029] The risk identification method, apparatus, electronic device, and computer-readable storage medium provided in this disclosure, on the one hand, acquire a latent semantic space generated from the abnormal feedback text of the target risk object. This latent semantic space can describe the correlation between various abnormal feedback text samples and abnormal feedback words according to themes (i.e., it can cluster abnormal feedback texts and abnormal feedback words according to themes to better mine the potential correlation between different texts). Then, it determines whether the object to be predicted, which is targeted by the abnormal feedback vector to be predicted, is risky based on the similarity between the abnormal feedback vector to be predicted and the abnormal feedback text sample vector in the latent semantic space. On the other hand, while determining that the object to be predicted is risky, it also determines the abnormal feedback theme of the abnormal feedback vector to be predicted, effectively mining the text information of the abnormal feedback vector to be predicted, which helps to discover the potential problems of the object to be predicted in a timely manner.
[0030] It should be understood that the above general description and the following detailed description are merely exemplary and do not limit this disclosure.
Attached Figure Description
[0031] The accompanying drawings, which are incorporated in and form part of this specification, illustrate embodiments consistent with this disclosure and, together with the description, serve to explain the principles of this disclosure. It is obvious that the drawings described below are merely some embodiments of this disclosure, and those skilled in the art can obtain other drawings based on these drawings without any inventive effort.
[0032] Figure 1 is a schematic diagram of an exemplary system architecture to which a risk identification method or risk identification device according to embodiments of this disclosure can be applied.
[0033] Figure 2 is a schematic diagram of the structure of an electronic device suitable for implementing embodiments of the present disclosure.
[0034] Figure 3 is a flowchart illustrating a risk identification method according to an exemplary embodiment.
[0035] Figure 4 is a schematic diagram illustrating the generation of an anomaly feedback text vector according to an exemplary embodiment.
[0036] Figure 5 is a flowchart illustrating a risk identification method according to an exemplary embodiment.
[0037] Figure 6 is a flowchart of step S3 in Figure 3 in an exemplary embodiment.
[0038] Figure 7 is a schematic diagram illustrating the construction of a word segmentation text matrix according to an exemplary embodiment.
[0039] Figure 8 is a schematic diagram of matrix decomposition according to an exemplary embodiment.
[0040] Figure 9 is a schematic diagram illustrating a word segmentation or text clustering method according to an exemplary embodiment.
[0041] Figure 10 is a schematic diagram illustrating a product for predicting the absconding of long-term rental property merchants according to an exemplary embodiment.
[0042] Figure 11 is a schematic flowchart illustrating the prediction of long-term rental property merchants absconding, according to an exemplary embodiment.
[0043] Figure 12 is a block diagram illustrating a risk identification device according to an exemplary embodiment.
Detailed Implementation
[0044] Exemplary embodiments will now be described more fully with reference to the accompanying drawings. However, these exemplary embodiments can be implemented in many forms and should not be construed as limited to the embodiments set forth herein; rather, they are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the exemplary embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar parts, and therefore repeated descriptions of them will be omitted.
[0045] The features, structures, or characteristics described in this disclosure can be combined in any suitable manner in one or more embodiments. Numerous specific details are provided in the following description to give a thorough understanding of embodiments of this disclosure. However, those skilled in the art will recognize that the technical solutions of this disclosure can be practiced with one or more specific details omitted, or other methods, components, apparatuses, steps, etc., can be employed. In other instances, well-known methods, apparatuses, implementations, or operations are not shown or described in detail to avoid obscuring various aspects of this disclosure.
[0046] The accompanying drawings are merely illustrative of this disclosure, and the same reference numerals in the drawings denote the same or similar parts, thus omitting repeated descriptions of them. Some block diagrams shown in the drawings do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different network and / or processor devices and / or microcontroller devices.
[0047] The flowchart shown in the accompanying drawings is merely illustrative and does not necessarily include all content and steps, nor does it require execution in the described order. For example, some steps may be broken down, while others may be combined or partially combined; therefore, the actual execution order may change depending on the specific circumstances.
[0048] In this specification, the terms “a,” “an,” “the,” “said,” and “at least one” are used to indicate the presence of one or more elements / components / etc.; the terms “comprising,” “including,” and “having” are used to indicate an open-ended inclusion and to mean that there may be other elements / components / etc. in addition to the listed elements / components / etc.; the terms “first,” “second,” and “third,” etc., are used only as labels and do not limit the number of objects.
[0049] The exemplary embodiments of this disclosure will now be described in detail with reference to the accompanying drawings.
[0050] Figure 1 is a schematic diagram of an exemplary system architecture that can be applied to the risk identification method or risk identification device in the embodiments of this disclosure.
[0051] As shown in Figure 1, system architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. Network 104 serves as the medium for providing communication links between terminal devices 101, 102, and 103 and server 105. Network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables, etc.
[0052] Users can use terminal devices 101, 102, and 103 to interact with server 105 via network 104 to receive or send messages. For example, users can send text to be predicted for a target object or abnormal feedback text for a target risk object to server 105 via terminal devices 101, 102, or 103, and can receive risk warning information for the target object generated by server 105 based on the first risk feedback topic. Terminal devices 101, 102, and 103 can be various electronic devices with displays and web browsing capabilities, including but not limited to smartphones, tablets, laptops, desktop computers, wearable devices, virtual reality devices, smart home devices, etc.
[0053] Server 105 can be a server that provides various services, such as a backend management server that supports terminal devices 101, 102, and 103 operated by users. The backend management server can analyze and process received requests and other data, and feed the processing results back to the terminal devices.
[0054] A server can be a standalone physical server, a server cluster or a distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms. This disclosure does not impose any restrictions on this.
[0055] Server 105 may, for example, acquire the anomaly feedback text to be predicted for the target object; server 105 may, for example, generate an anomaly feedback vector to be predicted based on the anomaly feedback text; server 105 may, for example, acquire a latent semantic space generated based on anomaly feedback text samples, wherein the anomaly feedback text samples are determined in the anomaly feedback text for the target risk object, and the latent semantic space includes anomaly feedback text sample vectors clustered according to N risk feedback topics, wherein the anomaly feedback text sample vectors correspond to the anomaly feedback text samples, and N is a positive integer greater than or equal to 1; server 105 may, for example, obtain the similarity between the anomaly feedback text sample vectors clustered under each risk feedback topic and the anomaly feedback vector to be predicted; server 105 may, for example, determine a first risk feedback topic for the anomaly feedback vector to be predicted among the N risk feedback topics based on the similarity; server 105 may, for example, generate risk warning information for the target object based on the first risk feedback topic.
[0056] It should be understood that the numbers of terminal devices, networks, and servers shown in Figure 1 are merely illustrative. Server 105 can be a single physical server or a combination of multiple servers. Depending on actual needs, there can be any number of terminal devices, networks, and servers.
[0057] Figure 2 is a schematic diagram of the structure of an electronic device suitable for implementing embodiments of the present disclosure. It should be noted that the electronic device 200 shown in Figure 2 is merely an example and should not be construed as limiting the functionality and scope of use of the embodiments disclosed herein.
[0058] As shown in Figure 2, the computer system 200 includes a central processing unit (CPU) 201, which can perform various appropriate actions and processes based on programs stored in read-only memory (ROM) 202 or programs loaded from storage section 208 into random access memory (RAM) 203. The RAM 203 also stores various programs and data required for the operation of the computer system 200. The CPU 201, ROM 202, and RAM 203 are interconnected via a bus 204. An input / output (I / O) interface 205 is also connected to the bus 204.
[0059] The following components are connected to I / O interface 205: an input section 206 including a keyboard, mouse, etc.; an output section 207 including a cathode ray tube (CRT), liquid crystal display (LCD), etc., and speakers, etc.; a storage section 208 including a hard disk, etc.; and a communication section 209 including a network interface card such as a LAN card, modem, etc. The communication section 209 performs communication processing via a network such as the Internet. Drive 210 is also connected to I / O interface 205 as needed. Removable media 211, such as a disk, optical disk, magneto-optical disk, semiconductor memory, etc., are installed on drive 210 as needed so that computer programs read from them can be installed into storage section 208 as needed.
[0060] In particular, according to embodiments of this disclosure, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments of this disclosure include a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via communication section 209, and / or installed from removable medium 211. When the computer program is executed by central processing unit (CPU) 201, it performs the functions defined above in the system of this application.
[0061] It should be noted that the computer-readable medium shown in this application can be a computer-readable signal medium, a computer-readable storage medium, or any combination thereof. A computer-readable storage medium can be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of a computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination thereof. In this application, a computer-readable storage medium can be any tangible medium containing or storing a program that can be used by or in conjunction with an instruction execution system, apparatus, or device. In this application, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such propagated data signals can take various forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium, which can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium can be transmitted using any suitable medium, including but not limited to: wireless, wire, optical fiber, RF, etc., or any suitable combination thereof.
[0062] The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of this application. In this regard, each block in a flowchart or block diagram may represent a module, segment, or portion of code containing one or more executable instructions for implementing a specified logical function. It should also be noted that in some alternative implementations, the functions indicated in the blocks may occur in a different order than those indicated in the drawings. For example, two consecutively indicated blocks may actually be executed substantially in parallel, and they may sometimes be executed in reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and combinations of blocks in a block diagram or flowchart, may be implemented using a dedicated hardware-based system that performs the specified function or operation, or using a combination of dedicated hardware and computer instructions.
[0063] The modules and / or units and / or sub-units and / or grandchild units described in the embodiments of this application can be implemented in software or hardware. The described modules and / or units and / or sub-units and / or grandchild units can also be located in a processor; for example, a processor can be described as including a sending unit, an acquiring unit, a determining unit, and a first processing unit. The names of these modules and / or units and / or sub-units and / or grandchild units do not, in certain circumstances, constitute a limitation on the module and / or unit and / or sub-unit and / or grandchild unit itself.
[0064] In another aspect, this application also provides a computer-readable storage medium, which may be included in the device described in the above embodiments; or it may exist independently and not assembled into the device. The computer-readable storage medium carries one or more programs, which, when executed by the device, enable the device to perform the following functions: acquiring anomaly feedback text to be predicted for an object to be predicted; generating anomaly feedback vector to be predicted based on the anomaly feedback text; acquiring a latent semantic space generated based on anomaly feedback text samples, wherein the anomaly feedback text samples are determined in anomaly feedback text for a target risk object, and the latent semantic space includes anomaly feedback text sample vectors clustered according to N risk feedback topics, wherein the anomaly feedback text sample vectors correspond to the anomaly feedback text samples, and N is a positive integer greater than or equal to 1; obtaining the similarity between the anomaly feedback text sample vectors clustered under each risk feedback topic and the anomaly feedback vector to be predicted; determining a first risk feedback topic for the anomaly feedback vector to be predicted among the N risk feedback topics based on the similarity; and generating risk warning information for the object to be predicted based on the first risk feedback topic.
[0065] Figure 3 is a flowchart illustrating a risk identification method according to an exemplary embodiment. The method provided in this disclosure can be executed by any electronic device with computing power. For example, the method can be executed by the server or the terminal device in the embodiment of Figure 1 described above, or jointly by the server and the terminal device. In the following embodiments, the server is used as the execution subject for illustration, but this disclosure is not limited to this.
[0066] Referring to Figure 3, the risk identification method provided in this disclosure may include the following steps.
[0067] Step S1: Obtain the abnormal feedback text to be predicted for the object to be predicted.
[0068] In some embodiments, users can provide feedback on anomalies to the object to be predicted through certain channels. The object to be predicted can be an object that requires risk prediction, such as a merchant or platform providing long-term rental services. Therefore, the feedback information regarding anomalies to long-term rental service merchants or platforms can be user complaints against them, and this disclosure does not impose any restrictions on this.
[0069] In some embodiments, the risk status of the object to be predicted can be predicted based on the user's abnormal feedback text regarding the object. Therefore, the abnormal feedback text used to predict the risk status of the object can be used as the abnormal feedback text to be predicted.
[0070] During the anomaly feedback process, different users may provide different types of anomaly feedback regarding the object to be predicted. For example, when complaining about long-term rental merchants, users may complain that the merchant refuses to refund the deposit, refuses to perform maintenance, or charges excessive deposits. In other words, users can provide feedback from multiple anomaly perspectives regarding the same object to be predicted. These multiple anomaly perspectives can be reflected in a single anomaly feedback text or in different anomaly feedback texts; this disclosure does not impose any restrictions on this.
[0071] In this embodiment, there can be one or more abnormal feedback texts to be predicted for the object to be predicted, and this disclosure does not limit the number.
[0072] Step S2: Generate a predicted anomaly feedback vector based on the predicted anomaly feedback text.
[0073] In some embodiments, an anomaly feedback vocabulary (e.g., the vocabulary shown in the left column of Figure 4) can be predetermined based on the anomaly feedback texts for the target risk object. The vocabulary can cover as many different abnormal feedback terms as possible, and this disclosure does not impose any limitations on this.
[0074] In some embodiments, a predicted anomaly feedback vector can be constructed using individual words as features for the text to be predicted. The feature value of each word in the predicted anomaly feedback vector can be the presence or absence of that single word in the text to be predicted (for example, if the single word exists in the text to be predicted, the value at the corresponding position in the predicted anomaly feedback vector is 1; if the single word does not exist in the text to be predicted, the value at the corresponding position in the predicted anomaly feedback vector is 0). The feature value of each word in the predicted anomaly feedback vector can also be the TF-IDF (term frequency–inverse document frequency) value of the single word in the text to be predicted, and this disclosure does not limit this.
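The presence/absence variant of the feature vector described above can be sketched as follows. This is only a minimal illustration: the function name `presence_vector`, the example vocabulary, and the assumption that the text has already been segmented into words are all hypothetical and are not specified by this disclosure.

```python
def presence_vector(text_tokens, vocabulary):
    """Return a 0/1 vector: 1 if the vocabulary word occurs in the text, else 0."""
    present = set(text_tokens)
    return [1 if word in present else 0 for word in vocabulary]


# Hypothetical vocabulary and an already-segmented complaint text.
vocabulary = ["deposit", "refund", "repair", "overcharge"]
tokens = ["merchant", "refuses", "refund", "deposit"]

vector = presence_vector(tokens, vocabulary)
print(vector)  # [1, 1, 0, 0]
```

Each position of the vector corresponds to one vocabulary word, so texts of different lengths map to vectors of the same dimension.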
[0075] Step S3: Obtain the latent semantic space generated based on the abnormal feedback text samples. The abnormal feedback text samples are determined in the abnormal feedback texts targeting the target risk object. The latent semantic space includes abnormal feedback text sample vectors clustered according to N risk feedback topics. The abnormal feedback text sample vectors correspond to the abnormal feedback text samples, and N is a positive integer greater than or equal to 1.
[0076] Understandably, to determine whether the object to be predicted carries the target risk, it is necessary to use known objects with the target risk as comparison samples. Therefore, the target risk object can be a known object with the target risk. For example, in the process of predicting the risk of long-term rental merchants absconding, the target risk object can be a long-term rental merchant or platform that is known to have absconded.
[0077] The abnormal feedback text samples for a target risk object can include abnormal feedback texts in which users provide feedback on the target risk object from different perspectives. For example, in the process of predicting the risk of a long-term rental merchant absconding, the abnormal feedback texts for the target risk object could include texts complaining that the target risk object does not return the deposit, texts complaining that it does not repair the property, texts complaining that it collects excessive deposits, etc. It is understood that users can provide abnormal feedback from multiple perspectives on the same target risk object, and these multiple perspectives can be reflected in one abnormal feedback text or in different abnormal feedback texts; this disclosure does not impose any restrictions on this.
[0078] In some embodiments, the latent semantic space generated from the anomaly feedback text samples can be composed of multiple anomaly feedback text sample vectors, each of which has a corresponding anomaly feedback text sample. The latent semantic space can display both shallow relationships (i.e., surface semantic relationships) and deep relationships (including but not limited to deep synonym relationships) between the various text sample vectors. For example, the latent semantic space can cluster the text sample vectors according to topic similarity to uncover deep relationships between different semantic text sample vectors.
[0079] In this context, the risk feedback topic can refer to the clustering topic generated after clustering abnormal feedback text sample vectors. For example, when some abnormal feedback text sample vectors are clustered in the latent semantic space, the clustering topic can be determined based on the content of the abnormal feedback text samples corresponding to the abnormal feedback text sample vectors. The aforementioned clustering topic can include the feedback angle when providing feedback to a target risk object. For example, in the process of filing a complaint against a long-term rental property owner, the feedback angle of not refunding rent can be considered a risk feedback topic.
[0080] Risk feedback topics are a type of topic information that reflects the topical content of the clustered text. For example, when words or sentences reflecting merchants refusing to refund deposits are clustered together, the risk feedback topic of that cluster can be "merchants refusing to refund deposits". Users can define a topic for each cluster according to actual needs, and this disclosure does not impose any restrictions on this. In some embodiments, the latent semantic space can be constructed through topic models such as Latent Semantic Indexing (LSI), Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), or Latent Dirichlet Allocation (LDA), and this disclosure does not impose any restrictions on this.
[0081] Step S4: Obtain the similarity between the abnormal feedback text sample vectors clustered under each risk feedback topic and the abnormal feedback vector to be predicted.
[0082] In some embodiments, because the abnormal feedback text sample vectors in the latent semantic space are clustered by risk feedback topic, the sample vectors clustered under each risk feedback topic can be obtained separately, and the similarity between the abnormal feedback vector to be predicted and the sample vectors under each topic can be determined. From these similarities, the similarity between the abnormal feedback vector to be predicted and each risk feedback topic is determined, and in turn the risk feedback topic of the abnormal feedback vector to be predicted.
[0083] Step S5: Determine the risk feedback topic of the abnormal feedback vector to be predicted from the N risk feedback topics based on the similarity, so as to generate risk warning information for the object to be predicted based on the risk feedback topic.
[0084] In some embodiments, the similarity between the anomaly feedback vector to be predicted and each risk feedback topic can be determined based on the similarity between the anomaly feedback text sample vector clustered under each risk feedback topic and the anomaly feedback vector to be predicted, and the risk feedback topic of the anomaly feedback vector to be predicted can be determined based on the similarity between the anomaly feedback vector to be predicted and each risk feedback topic.
[0085] For example, assuming the aforementioned risk feedback topics include a second risk feedback topic, the multiple target abnormal feedback text sample vectors clustered under the second risk feedback topic can be obtained in the latent semantic space. The similarity between each of these target sample vectors and the abnormal feedback vector to be predicted can then be determined, and the similarity between the abnormal feedback vector to be predicted and the second risk feedback topic can be derived from those individual similarities. For example, the average (or the median, maximum, minimum, etc.) of the similarities between the abnormal feedback vector to be predicted and each target sample vector clustered under the second risk feedback topic can be used as the similarity between the abnormal feedback vector to be predicted and the second risk feedback topic.
[0086] In some embodiments, if the similarity between the anomaly feedback vector to be predicted and a certain risk feedback topic is greater than a certain threshold, then the anomaly feedback text corresponding to the anomaly feedback vector to be predicted can be considered to belong to the risk feedback topic. For example, if the similarity between the anomaly feedback vector to be predicted and the second risk feedback topic is greater than 0.8, then the anomaly feedback text corresponding to the anomaly feedback vector to be predicted can be considered to belong to the anomaly feedback text of the second risk feedback topic.
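The topic matching in steps S4 and S5 can be sketched as below. This is a hedged illustration only: the disclosure does not fix a similarity measure, so cosine similarity is an assumption here, and the function names, topic names, and two-dimensional vectors are all hypothetical. The aggregate defaults to the average and the threshold to the 0.8 used in the example above.

```python
import math
from statistics import mean


def cosine(a, b):
    """Cosine similarity (assumed measure; the disclosure does not prescribe one)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def topic_similarity(query, sample_vectors, aggregate=mean):
    """Aggregate the per-sample similarities into one score for the topic."""
    return aggregate([cosine(query, s) for s in sample_vectors])


def matching_topics(query, topics, threshold=0.8, aggregate=mean):
    """Return the topics whose aggregated similarity exceeds the threshold."""
    scores = {name: topic_similarity(query, vecs, aggregate)
              for name, vecs in topics.items()}
    return {name: score for name, score in scores.items() if score > threshold}


# Hypothetical sample vectors already clustered by risk feedback topic.
topics = {
    "deposit_not_refunded": [[1.0, 0.0], [0.9, 0.1]],
    "repair_refused": [[0.0, 1.0], [0.1, 0.9]],
}
query = [1.0, 0.1]  # abnormal feedback vector to be predicted

matched = matching_topics(query, topics)
print(sorted(matched))  # ['deposit_not_refunded']
```

Passing `aggregate=max` or `aggregate=min` instead of `mean` corresponds to the other aggregation choices mentioned in the text.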
[0087] In some embodiments, when a predicted anomaly feedback vector corresponds to a risk feedback topic in the latent semantic space, it can be considered that the predicted object has a certain risk, and risk warning information for the predicted object can be generated based on the risk feedback topic. This disclosure does not limit this.
[0088] For example, the latent semantic space can be constructed based on abnormal feedback texts of long-term rental businesses that have absconded. The N risk feedback topics in this latent semantic space can be multiple risk feedback topics related to the absconding of long-term rental businesses (such as deposit arrears, refusal to maintain the property, etc.). Once the abnormal feedback vector to be predicted belongs to any of the above risk feedback topics, it can be considered that the object to be predicted may be at risk of absconding. Then, risk warning information for the object to be predicted can be generated based on the risk feedback topic to which the abnormal feedback vector to be predicted belongs, so as to remind users that the object to be predicted is at risk of absconding, or directly punish the object to be predicted.
[0089] The risk identification method provided in this disclosure has two aspects. First, it acquires a latent semantic space generated from the abnormal feedback text of the target risk object. This latent semantic space can describe the correlation between various abnormal feedback text samples and abnormal feedback words according to themes (i.e., it can cluster abnormal feedback texts and abnormal feedback words according to themes to better mine the potential correlation between different texts). Then, it determines whether the object to be predicted is risky based on the similarity between the abnormal feedback vector to be predicted and the abnormal feedback text sample vector in the latent semantic space. Second, while determining that the object to be predicted is risky, it also determines the abnormal feedback theme of the abnormal feedback vector to be predicted, effectively mining the text information of the abnormal feedback vector to be predicted, which helps to discover the potential problems of the object to be predicted in a timely manner.
[0090] Figure 5 is a flowchart illustrating a risk identification method according to an exemplary embodiment.
[0091] In some embodiments, the latent semantic space generated from the abnormal feedback text samples further includes abnormal feedback word vectors clustered according to N risk feedback topics, where the N risk feedback topics include a first risk feedback topic, and the abnormal feedback word vectors include multiple target abnormal feedback word vectors clustered under the first risk feedback topic.
[0092] Referring to Figure 5, the risk identification method provided in this disclosure may include the following steps.
[0093] In step S6, the similarity between the clustered target abnormal feedback word vectors under the first risk feedback topic and the abnormal feedback vector to be predicted is determined.
[0094] In step S7, the similarity between the anomaly feedback vector to be predicted and the first risk feedback topic is determined based on the similarity between the multiple target anomaly feedback word vectors clustered under the first risk feedback topic and the anomaly feedback vector to be predicted.
[0095] In some embodiments, the similarity between the predicted abnormal feedback vector and each target abnormal feedback word vector can be averaged (or median, maximum, or minimum) to be used as the similarity between the predicted abnormal feedback vector and the first risk feedback topic.
[0096] In step S8, if the similarity between the anomaly feedback vector to be predicted and the first risk feedback topic is greater than a first threshold, then the first risk feedback topic is the risk feedback topic of the anomaly feedback vector to be predicted.
[0097] In step S9, risk warning information for the object to be predicted is generated based on the first risk feedback topic.
[0098] The technical solution provided in this embodiment, after determining the risk feedback topic of the abnormal feedback vector to be predicted based on the abnormal feedback text sample vectors clustered according to the risk feedback topic in the latent semantic space, further mines the risk feedback topic of the abnormal feedback vector to be predicted based on the abnormal feedback word vectors clustered according to the risk feedback topic in the latent semantic space, so as to accurately determine the risk information carried in the abnormal feedback vector to be predicted, and thus determine the risk status of the object to be predicted.
[0099] Figure 6 is a flowchart of step S3 in Figure 3 in an exemplary embodiment. Referring to Figure 6, step S3 above may include the following steps.
[0100] In text mining, the common practice is to cluster data based on the similarity of sample features, such as clustering based on the Euclidean distance or Manhattan distance between data samples. However, this clustering method cannot effectively uncover the latent relationships between different texts. For example, it is easy to see that texts containing the phrases "In the Name of the People" (a television drama) and "Secretary Dakang" (a character in that drama) are highly related in topic. However, if the texts are clustered using word features alone, it is difficult to associate the two, because clustering on word features cannot take implicit topics into account.
[0101] In order to accurately find the implicit relationships between different anomaly feedback texts and construct the latent semantic space between different anomaly feedback text samples, the present disclosure adopts the following method.
[0102] In step S31, the abnormal feedback text sample is determined from the abnormal feedback text for the target risk object, and the abnormal feedback text sample is composed of abnormal feedback words.
[0103] In step S32, an original word segmentation text matrix is generated based on the abnormal feedback text sample.
[0104] In some embodiments, the original word segmentation text matrix described above may be generated from the abnormal feedback text sample vectors corresponding to multiple abnormal feedback text samples, and the multiple abnormal feedback text samples may include the target abnormal feedback text sample.
[0105] Generating the original word segmentation text matrix based on the abnormal feedback text sample can include the following steps.
[0106] Determine the target abnormal feedback vocabulary based on the abnormal feedback texts for the target risk object (e.g., the vocabulary listed on the left of Figure 7). Determine the inverse document frequency (IDF) of each word in the abnormal feedback vocabulary across the multiple abnormal feedback text samples. Determine the term frequency (TF) of each word in the abnormal feedback vocabulary in the target abnormal feedback text sample. Based on the TF of each word in the target abnormal feedback text sample and its IDF across the multiple abnormal feedback text samples, determine the target abnormal feedback text sample vector, so that the original word segmentation text matrix (e.g., as shown in Figure 7) can be determined based on the target abnormal feedback text sample vector.
[0107] The TF-IDF values of words in the target anomaly feedback text samples can be obtained through the following steps.
[0108] 1. Determine the inverse document frequency of each word in the anomaly feedback vocabulary in the plurality of anomaly feedback text samples.
[0109] The more frequently a term appears across all texts (i.e., the multiple anomaly feedback text samples), the less meaningful that term is to any given text. Inverse document frequency (IDF) can be used to represent the importance of a term across the entire text set; its calculation expression is as follows:
[0110] IDF = log(total number of texts ÷ total number of texts containing a given term) (1)
[0111] 2. Determine the word frequency of each word in the target anomaly feedback vocabulary in the target anomaly feedback text sample.
[0112] The more times a term appears in a text, the greater its importance to the text. This is expressed using the formula for calculating term frequency (TF).
[0113] TF = Number of times a term appears in the text ÷ Total number of terms in the text (2)
[0114] 3. Determine the target anomaly feedback text sample vector of the target anomaly feedback text sample based on the word frequency of each word in the target anomaly feedback text sample and the inverse document frequency in the multiple anomaly feedback text samples, so as to determine the original word segmentation text matrix based on the target anomaly feedback text sample vector.
[0115] Taking both factors into account, the product of the two factors is used as the importance of each term to each text, thereby constructing the anomaly feedback vector to be predicted.
[0116] In some embodiments, the word segmentation text matrix can use each word as a row (or column) and each text as a column (or row), with each element representing the weight of the corresponding word in the corresponding text, that is, the importance of that word to the text.
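The construction of the word segmentation text matrix from formulas (1) and (2) can be sketched as follows. This is an illustrative sketch under stated assumptions: the natural logarithm is assumed because formula (1) does not state a base, words are placed on rows and texts on columns, and the function name `tf_idf_matrix` and the sample texts are hypothetical.

```python
import math


def tf_idf_matrix(docs, vocabulary):
    """Build a words-by-texts matrix whose entries are TF * IDF.

    docs: list of texts, each already segmented into a list of words.
    """
    n_docs = len(docs)

    # Formula (1): IDF = log(total texts / texts containing the term).
    idf = {}
    for word in vocabulary:
        containing = sum(1 for doc in docs if word in doc)
        idf[word] = math.log(n_docs / containing) if containing else 0.0

    # Formula (2): TF = occurrences of the term / total terms in the text.
    matrix = []
    for word in vocabulary:
        row = []
        for doc in docs:
            tf = doc.count(word) / len(doc) if doc else 0.0
            row.append(tf * idf[word])
        matrix.append(row)
    return matrix


# Hypothetical segmented complaint texts.
docs = [
    ["deposit", "refund", "deposit"],
    ["repair", "refused"],
    ["deposit", "overcharge"],
]
vocabulary = ["deposit", "repair"]

matrix = tf_idf_matrix(docs, vocabulary)
for word, row in zip(vocabulary, matrix):
    print(word, [round(value, 3) for value in row])
```

A word that appears in every text gets an IDF of zero under formula (1), so it contributes nothing to any column of the matrix.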
[0117] In step S33, the original word segmentation text matrix is decomposed to obtain the word segmentation topic matrix, text topic matrix, and feature value matrix of the abnormal feedback text sample. The word segmentation topic matrix clusters the abnormal feedback word vectors corresponding to the abnormal feedback words according to M risk feedback topics. The text topic matrix clusters the abnormal feedback text sample vectors corresponding to the abnormal feedback text samples according to the M risk feedback topics, where M is a positive integer greater than or equal to N.
[0118] In some embodiments, singular value decomposition (SVD), singular value decomposition with bias terms, etc., can be used to decompose the segmented text matrix to obtain the segmented topic matrix, text topic matrix, and feature value matrix of the abnormal feedback text sample, etc., and this disclosure does not limit this.
[0119] This embodiment will use singular value decomposition as an example to explain how to decompose the word segmentation text matrix, but this disclosure is not limited thereto.
[0120] Singular value decomposition (SVD) is in essence a matrix factorization. Unlike conventional eigenvalue decomposition, however, SVD does not require the matrix being decomposed to be square. For example, assuming the word segmentation text matrix A is an (m×n) matrix, its singular value decomposition has the form:
[0121] $A = U\Sigma V^{T}$ (3)
[0122] At this point, the correlation matrix between words and text can be represented as:
[0123] $AA^{T} = (U\Sigma V^{T})(U\Sigma V^{T})^{T} = U\Sigma V^{T}V\Sigma^{T}U^{T} = U\Sigma\Sigma^{T}U^{T}$ (4)
[0124] $A^{T}A = (U\Sigma V^{T})^{T}(U\Sigma V^{T}) = V\Sigma^{T}U^{T}U\Sigma V^{T} = V\Sigma^{T}\Sigma V^{T}$ (5)
[0125] Since $\Sigma\Sigma^{T}$ and $\Sigma^{T}\Sigma$ are both diagonal matrices, the eigenvectors of $AA^{T}$ and $A^{T}A$ form the matrices $U$ and $V$, respectively. $AA^{T}$ and $A^{T}A$ are square matrices of size (m×m) and (n×n), respectively, so both can be decomposed to obtain eigenvalues and eigenvectors:
[0126] $(AA^{T})u_i = \lambda_i u_i, \quad (A^{T}A)v_i = \lambda_i v_i$ (6)
[0127] Here $u_i$ represents a left singular vector of matrix A and $v_i$ represents a right singular vector of matrix A. The nonzero diagonal entries of $\Sigma\Sigma^{T}$ and $\Sigma^{T}\Sigma$ coincide and are written $\Sigma^{2}$. Therefore, the eigenvalue matrix is equal to the square of the singular value matrix, i.e. $\lambda_i = \sigma_i^{2}$, where $\sigma_i$ represents a singular value.
[0128] Using the method described above, the word segmentation text matrix A can be decomposed, as shown in Figure 8, into the word segmentation topic matrix U, the eigenvalue matrix Σ, and the text topic matrix V. Both the word segmentation topic matrix U and the text topic matrix V include K risk feedback topics, where K is the number of non-zero eigenvalues in the eigenvalue matrix.
[0129] As shown in Figure 8, the original word segmentation text matrix A, after singular value decomposition (SVD), yields three matrices. The left singular matrix U can be considered the word segmentation topic matrix, representing the correlation between words and the various risk feedback topics. Each column (a left singular vector) represents the terms related to the corresponding risk feedback topic, with each element's value indicating the degree of relevance of a term within that topic; a larger value indicates a stronger correlation. The right singular matrix V represents the correlation between the texts and the risk feedback topics. Each row (a right singular vector) represents the abnormal feedback texts within the corresponding risk feedback topic, with each element's value indicating the degree of relevance of an abnormal feedback text within that topic. The singular values of the central singular value matrix (eigenvalue matrix) represent the semantic correlation between a class of words and an abnormal feedback topic. Therefore, with just one SVD, we can simultaneously obtain the correlation between abnormal feedback text samples and the various risk feedback topics, the correlation between words and the different risk feedback topics, and the correlation between different risk feedback topics.
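The decomposition described above can be checked numerically with a short sketch. It assumes NumPy is available and uses a small hypothetical words-by-texts matrix; it is not the implementation used in this disclosure. It reconstructs A from its three factors and confirms that the eigenvalues of $A^{T}A$ equal the squared singular values.

```python
import numpy as np

# Hypothetical word segmentation text matrix: 4 words (rows) x 3 texts (columns).
A = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

# Thin SVD: U is 4x3, s holds the singular values (descending), Vt is V transposed.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Multiplying the three factors back together recovers A.
reconstructed = U @ np.diag(s) @ Vt

# Eigenvalues of the symmetric matrix A^T A, reordered to descending.
eigenvalues = np.linalg.eigvalsh(A.T @ A)[::-1]

print("reconstruction matches A:", np.allclose(reconstructed, A))
print("eigenvalues equal squared singular values:", np.allclose(eigenvalues, s ** 2))
```

Here the columns of `U` play the role of the word segmentation topic matrix and the rows of `Vt` the role of the text topic matrix.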
[0130] With the aforementioned word segmentation topic matrix, text topic matrix, and feature value matrix, the latent semantic space can be constructed based on the feature value matrix, the word segmentation topic matrix of the abnormal feedback text sample, and the text topic matrix.
[0131] However, since the original word segmentation text matrix is too large for the computer to process, too sparse, and contains a lot of noise, it is necessary to reduce the dimensionality of the original word segmentation text matrix to construct the latent semantic space.
[0132] In some embodiments, dimensionality reduction can be performed using the following process:
[0133] Through the process of singular value decomposition, we split the target matrix A and solve for the orthogonal matrices U, V and singular value matrix Σ. We select the largest t singular values and multiply them by their corresponding left and right singular vectors, respectively, to obtain a t-order approximation of the original matrix A, thus achieving dimensionality reduction of the original matrix A. Through the t-order approximation of the original matrix A, we transform the original matrix A from a high-dimensional space to a low-dimensional space, and also map word vectors and text vectors to the semantic space, thereby realizing the process of constructing a latent semantic space for the abnormal feedback text set.
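The t-order approximation described above can be sketched as follows, assuming NumPy and a small hypothetical matrix. The function name `truncate_svd` is illustrative, and scaling the word and text coordinates by the retained singular values is one common convention that this disclosure does not prescribe.

```python
import numpy as np


def truncate_svd(A, t):
    """Keep the t largest singular values and their singular vectors."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_t, s_t, Vt_t = U[:, :t], s[:t], Vt[:t, :]
    A_t = U_t @ np.diag(s_t) @ Vt_t  # t-order approximation of A
    return U_t, s_t, Vt_t, A_t


# Hypothetical word segmentation text matrix: 4 words x 3 texts.
A = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

U_t, s_t, Vt_t, A_t = truncate_svd(A, t=2)

# Coordinates in the t-dimensional semantic space (assumed scaling convention).
word_coords = U_t * s_t    # one row per word
text_coords = Vt_t.T * s_t  # one row per text

print("rank of approximation:", np.linalg.matrix_rank(A_t))
print("approximation error:", np.linalg.norm(A - A_t))
```

The Frobenius-norm error of the approximation equals the root of the sum of the squared discarded singular values, which gives a direct way to judge how much information a given t loses.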
[0134] In some other embodiments, dimensionality reduction can also be performed through the following procedure:
[0135] In step S34, the eigenvalues in the eigenvalue matrix that are less than the second threshold are set to 0 to obtain a low-order eigenvalue matrix.
[0136] In step S35, a low-order word segmentation text matrix is determined based on the low-order feature value matrix, the word segmentation topic matrix, and the text topic matrix.
[0137] In some embodiments, the low-order word segmentation text matrix can be reconstructed by left-multiplying the low-order eigenvalue matrix by the word segmentation topic matrix and right-multiplying it by the text topic matrix.
[0138] It is understandable that once an eigenvalue in the eigenvalue matrix is set to 0, the corresponding topic component no longer contributes to the reconstructed low-order word segmentation text matrix. Therefore, the order (rank) of the low-order word segmentation text matrix is lower than that of the original word segmentation text matrix.
[0139] In some embodiments, a latent semantic space can be constructed based on the low-order word segmentation text matrix. For example, the space constructed by the low-order word segmentation matrix can be directly used as the latent semantic space, or the latent semantic space can be constructed through the following steps.
[0140] In step S36, the loss value between the original word segmentation text matrix and the low-order word segmentation text matrix is determined.
[0141] In step S37, the second threshold is adjusted based on the loss value between the original segmented text matrix and the low-order segmented text matrix.
[0142] In some embodiments, if the loss value between the original segmented text matrix and the low-order segmented text matrix is too large, the second threshold can be reduced; if the loss value between the original segmented text matrix and the low-order segmented text matrix is too small, the second threshold can be increased.
[0143] In step S38, the low-order segmented text matrix is updated according to the adjusted second threshold until the loss value between the original segmented text matrix and the low-order segmented text matrix meets a preset condition. Then, the latent semantic space is constructed based on the low-order segmented text matrix that meets the preset condition.
[0144] In some embodiments, the aforementioned preset conditions may refer to pre-set loss value thresholds, etc., and this disclosure does not limit them.
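Steps S34 to S38 can be sketched as the loop below, assuming NumPy. Several details are assumptions rather than part of this disclosure: the loss is taken to be the relative Frobenius-norm error, the preset condition is a maximum loss, the threshold shrinks by a fixed factor each round, and only the "loss too large" branch is shown. The names `low_order_matrix` and `tune_threshold` are hypothetical.

```python
import numpy as np


def low_order_matrix(U, s, Vt, threshold):
    """S34/S35: zero the values below the threshold, then reconstruct."""
    s_low = np.where(s < threshold, 0.0, s)
    return U @ np.diag(s_low) @ Vt


def tune_threshold(A, threshold, max_loss, shrink=0.9, max_iter=100):
    """Lower the second threshold until the loss meets the preset condition."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    for _ in range(max_iter):
        A_low = low_order_matrix(U, s, Vt, threshold)
        # S36: loss between the original and the low-order matrix (assumed metric).
        loss = np.linalg.norm(A - A_low) / np.linalg.norm(A)
        if loss <= max_loss:  # S38: preset condition met
            break
        threshold *= shrink   # S37: loss too large, so reduce the threshold
    return threshold, A_low, loss


# Hypothetical word segmentation text matrix: 4 words x 3 texts.
A = np.array([
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])

threshold, A_low, loss = tune_threshold(A, threshold=10.0, max_loss=0.5)
print("final threshold:", round(threshold, 4))
print("rank kept:", np.linalg.matrix_rank(A_low))
print("loss:", round(float(loss), 4))
```

Starting from a deliberately high threshold zeroes every value at first, so the loop shows the threshold being lowered step by step until enough components are retained.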
[0145] After dimensionality reduction using the above method, we can obtain a low-order word segmentation topic matrix, a low-order text topic matrix, and a low-order word segmentation text matrix.
[0146] Assuming that the low-order word segmentation topic matrix and the low-order text topic matrix have only two dimensions, i.e., t equals 2, the clustering relationships between each word and the topics, and between each text and the topics, can be displayed in a planar diagram such as the one shown in Figure 9. In Figure 9, T_N (where N is a positive integer greater than or equal to 1 and less than or equal to the total number of texts) represents a text.
[0147] From a planar diagram such as the one shown in Figure 9, users can easily see which words and texts cluster together; that is, the diagram reveals the latent correlation between different words or texts, so that the texts and words can be classified based on the clustering results.
[0148] In some embodiments, the text and terms in the low-order word segmentation text matrix can be clustered based on the aforementioned low-order word segmentation topic matrix and low-order text topic matrix (i.e., determining which words and texts in the low-order word segmentation text matrix are clustered under the same risk feedback topic based on the low-order word segmentation topic matrix and low-order text topic matrix).
[0149] In some embodiments, the semantic space constructed from the aforementioned low-order segmented text matrix can be a latent semantic space. The latent semantic space constructed from the low-order segmented text matrix removes noise from the original segmented text matrix; it also reduces the size of the original segmented text matrix, improving data processing efficiency; most importantly, it uncovers the potential relevance between different texts and segments (e.g., the relevance between synonyms, the relevance between synonymous sentences, etc.).
[0150] It should be noted that the risk feedback topics in the word segmentation text matrix referred to in this disclosure can be either risk feedback topics defined after word segmentation and text clustering, or topics defined according to the matrix dimensions. This disclosure does not impose any restrictions on this.
[0151] Currently, the detection of long-term rental property operators who abscond is mainly achieved through news reports, public opinion, and bulk user complaints. Once public opinion and bulk complaints provide concrete evidence that a particular operator has absconded, manual verification can link multiple merchant accounts of the same operator for penalties.
[0152] Therefore, the technical solution provided in this embodiment can be used to predict the risk of long-term rental merchants absconding. For example, it can be applied to scenarios where users file a complaint through a one-click fault reporting function after making a transaction with a merchant using WeChat Pay. The specific product implementation process is shown in Figure 10.
[0153] The product implementation process shown in Figure 10 can be realized through the flow shown in Figure 11. As shown in Figure 11, the system first collects complaint text data from existing merchant complaints against long-term rental operators that have absconded. This data can include complaint texts of different types, such as complaints about unpaid deposits and refusal to repair properties. Then, the text is segmented in the background to construct an original segmented text matrix. Next, singular value decomposition and dimensionality reduction are performed on the original segmented text matrix to construct a latent semantic space. Finally, when a new complaint text enters the risk control system, its risk complaint topic can be obtained through similarity calculation. Based on the text classification and similarity scores, corresponding risk control responses are made to achieve early warning of long-term rental operators at risk of absconding.
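The flow described above can be sketched end to end as follows. This is a minimal illustration under stated assumptions, not the implementation of this disclosure: the sample complaints, their hand-assigned topic labels, whitespace-based segmentation, raw counts as matrix entries, k equal to 2, and the mean cosine similarity per topic are all hypothetical choices.

```python
import numpy as np

# Hypothetical, already-segmented complaint samples with hand-assigned topics.
samples = [
    ("deposit refund overdue", "deposit"),
    ("deposit refund refused", "deposit"),
    ("deposit unpaid", "deposit"),
    ("repair leak ignored", "repair"),
    ("repair heater ignored", "repair"),
]

vocab = sorted({w for text, _ in samples for w in text.split()})
index = {w: i for i, w in enumerate(vocab)}

def to_vector(text):
    """Count vector over the vocabulary; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1.0
    return v

# Original segmented text matrix: one row per word, one column per sample text.
X = np.column_stack([to_vector(text) for text, _ in samples])

# Singular value decomposition followed by truncation to k dimensions.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
U_k, s_k, V_k = U[:, :k], s[:k], Vt[:k, :].T   # V_k holds one row per sample text

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return float(a @ b / (na * nb)) if na > 0 and nb > 0 else 0.0

def classify(text):
    """Fold a new complaint into the reduced space and score each topic."""
    q = to_vector(text) @ U_k / s_k            # same coordinates as the rows of V_k
    scores = {}
    for (_, topic), doc in zip(samples, V_k):
        scores.setdefault(topic, []).append(cosine(q, doc))
    means = {topic: sum(v) / len(v) for topic, v in scores.items()}
    best = max(means, key=means.get)
    return best, means[best]
```

Calling `classify("deposit refund unpaid")` returns the best-matching topic together with its similarity score; a risk control response could then be selected from both values.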
[0154] Applying the technical solution provided in this embodiment to predict the risk of long-term rental property merchants running away can achieve the following beneficial effects.
[0155] 1) Conduct text mining on early complaints against merchants to provide early warnings of merchants at risk of absconding.
[0156] Long-term rental businesses have long operating cycles, and there is a potential incubation period before they go bankrupt or abscond. Therefore, effective analysis of complaint text information can help identify potential problems in a timely manner before a large number of complaints or bankruptcies occur.
[0157] 2) Establish a complaint text analysis model to push for manual review or automatic penalties, reducing the workload of batch screening of merchants.
[0158] By using offline text mining models, a model with high accuracy in risk identification is trained, and merchants at risk of absconding are pushed to the investigation system or automatically punished.
[0159] This disclosure collects complaint information about long-term rental companies that have gone bankrupt and absconded as training samples, and then performs complaint text mining on these samples. It can be understood that the topics of the text can be obtained based on singular value decomposition (SVD). SVD decomposes the segmented text matrix into three matrices, which achieves matrix dimensionality reduction and helps uncover the implicit topics of the text. At the same time, it yields the relevance of each complaint text to the various sub-topics of the absconding incident, the relevance between word segments and different semantics, and the relevance between different semantics and those sub-topics. This helps to fully mine the implicit information in the complaint text and to construct a latent semantic space. Therefore, when new customer complaints occur, risks can be identified in a timely manner and early warnings can be issued for merchants at risk of absconding.
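For reference, the three-matrix decomposition can be written in standard notation. The symbols below are assumptions introduced for this note: X is the m-by-n segmented text matrix (m words, n texts), r is min(m, n), U plays the role of the word segmentation topic matrix, Σ the feature value matrix, V the text topic matrix, and q the vector of a new complaint text. The last line is the usual fold-in projection, which requires the t retained singular values to be non-zero.

```latex
\[
X = U \Sigma V^{\mathsf{T}}, \qquad
U \in \mathbb{R}^{m \times r},\quad
\Sigma = \operatorname{diag}(\sigma_1, \dots, \sigma_r),\quad
V \in \mathbb{R}^{n \times r},\quad
\sigma_1 \ge \dots \ge \sigma_r \ge 0
\]
\[
X_t = U_t \Sigma_t V_t^{\mathsf{T}}, \qquad t \le r
\]
\[
\hat{q} = \Sigma_t^{-1} U_t^{\mathsf{T}} q
\]
```

Here the subscript t denotes keeping only the first t columns of U and V and the first t singular values, so that X_t is the low-order matrix and the rows of V_t are the coordinates against which the projected vector of a new complaint can be compared.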
[0160] The technical solution proposed in this disclosure constructs a latent semantic space by mining the text of complaints against long-term rental property owners who have absconded. When new user complaints occur, the complaint text can be promptly matched with suspicious topics, improving the information mining accuracy of the complaint text. This also reduces manpower costs for current manual review when searching for long-term rental property owners at risk of absconding, and improves the timeliness of risk control. It provides a direction for early warning of long-term rental property owners who have absconded.
[0161] It is understood that the data involved in the methods provided in the above embodiments can all be stored on the blockchain, and this disclosure does not impose any restrictions on this.
[0162] Figure 12 is a block diagram illustrating a risk identification device according to an exemplary embodiment. Referring to Figure 12, the risk identification device 1200 provided in this embodiment may include: a module 1201 for obtaining the abnormal feedback text to be predicted, a module 1202 for obtaining the abnormal feedback vector to be predicted, a module 1203 for obtaining the latent semantic space, a similarity determination module 1204, a first risk feedback topic determination module 1205, and a first risk warning information generation module 1206.
[0163] The module 1201 for acquiring the anomaly feedback text to be predicted can be configured to acquire the anomaly feedback text to be predicted for the target object. The module 1202 for acquiring the anomaly feedback vector to be predicted can be configured to generate an anomaly feedback vector to be predicted based on the anomaly feedback text to be predicted. The module 1203 for acquiring the latent semantic space can be configured to acquire the latent semantic space generated based on anomaly feedback text samples, wherein the anomaly feedback text samples are determined from the anomaly feedback texts targeting the target risk object, and the latent semantic space includes anomaly feedback text sample vectors clustered according to N risk feedback topics, wherein the anomaly feedback text sample vectors correspond to the anomaly feedback text samples, and N is a positive integer greater than or equal to 1. The module 1204 for determining the similarity can be configured to obtain the similarity between the anomaly feedback text sample vectors clustered under each risk feedback topic and the anomaly feedback vector to be predicted. The module 1205 for determining the first risk feedback topic can be configured to determine the risk feedback topic of the anomaly feedback vector to be predicted among the N risk feedback topics based on the similarity. The module 1206 for generating the first risk warning information can be configured to generate risk warning information for the target object based on the risk feedback topic.
[0164] In some embodiments, the latent semantic space further includes abnormal feedback word vectors clustered according to the N risk feedback topics, the N risk feedback topics including a first risk feedback topic, and the abnormal feedback word vectors including multiple target abnormal feedback word vectors clustered under the first risk feedback topic; wherein, the risk identification device 1200 further includes: a vector similarity determination module, a topic similarity determination module, a second risk feedback topic determination module, and a second risk warning information generation module.
[0165] The vector similarity determination module can be configured to determine the similarity between multiple target abnormal feedback word vectors clustered under the first risk feedback topic and the abnormal feedback vector to be predicted; the topic similarity determination module can be configured to determine the similarity between the abnormal feedback vector to be predicted and the first risk feedback topic based on the similarity between the multiple target abnormal feedback word vectors clustered under the first risk feedback topic and the abnormal feedback vector to be predicted; the second risk feedback topic determination module can be configured to determine that if the similarity between the abnormal feedback vector to be predicted and the first risk feedback topic is greater than a first threshold, then the first risk feedback topic is the risk feedback topic of the abnormal feedback vector to be predicted; the second risk warning information generation module can be configured to generate risk warning information for the object to be predicted based on the first risk feedback topic.
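The word-vector path described above can be sketched as follows. The use of cosine similarity, the use of the mean as the way to combine per-word similarities into a topic similarity, the value 0.6 for the first threshold, and the example coordinates are all assumptions for illustration; this disclosure only states that the topic similarity is determined based on the individual similarities.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def topic_similarity(query_vec, word_vecs):
    """Mean similarity between the query and the word vectors clustered under one topic."""
    sims = [cosine(query_vec, w) for w in word_vecs]
    return sum(sims) / len(sims)

def matches_topic(query_vec, word_vecs, first_threshold=0.6):
    """True when the topic similarity exceeds the (hypothetical) first threshold."""
    return topic_similarity(query_vec, word_vecs) > first_threshold

# Hypothetical 2-D latent coordinates of words clustered under the first topic.
first_topic_words = [(0.9, 0.1), (0.8, 0.2), (1.0, 0.0)]
near_query = (0.7, 0.1)   # vector of a complaint close to the topic
far_query = (0.0, 1.0)    # vector of an unrelated complaint
```

With these values, `matches_topic(near_query, first_topic_words)` holds and `matches_topic(far_query, first_topic_words)` does not, so only the first complaint would trigger risk warning information for this topic.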
[0166] In some embodiments, the latent semantic space acquisition module 1203 may include: an anomaly feedback text sample acquisition unit, an original word segmentation text matrix generation unit, a matrix decomposition unit, and a reconstruction unit.
[0167] The abnormal feedback text sample acquisition unit can be configured to determine the abnormal feedback text sample from the abnormal feedback text targeting the target risk object, wherein the abnormal feedback text sample is composed of abnormal feedback words; the original word segmentation text matrix generation unit can be configured to generate an original word segmentation text matrix based on the abnormal feedback text sample; the matrix decomposition unit can be configured to perform matrix decomposition on the original word segmentation text matrix to obtain the word segmentation topic matrix, text topic matrix, and feature value matrix of the abnormal feedback text sample, wherein the word segmentation topic matrix clusters the abnormal feedback word vectors corresponding to the abnormal feedback words according to M risk feedback topics, and the text topic matrix clusters the abnormal feedback text sample vectors corresponding to the abnormal feedback text sample through the M risk feedback topics, where M is a positive integer greater than or equal to N; the reconstruction unit can be configured to construct the latent semantic space based on the feature value matrix, the word segmentation topic matrix of the abnormal feedback text sample, and the text topic matrix.
[0168] In some embodiments, the reconstruction unit may include: a low-order feature value matrix acquisition subunit, a low-order word segmentation text matrix generation subunit, and a latent semantic space construction subunit.
[0169] The low-order feature value matrix acquisition subunit can be configured to set the feature values less than the second threshold in the feature value matrix to 0 to obtain the low-order feature value matrix; the low-order word segmentation text matrix generation subunit can be configured to determine the low-order word segmentation text matrix based on the low-order feature value matrix, the word segmentation topic matrix, and the text topic matrix; the latent semantic space construction subunit can be configured to construct the latent semantic space generated based on the abnormal feedback text sample based on the low-order word segmentation text matrix.
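A minimal sketch of the thresholding step is given below, assuming that the feature values are the singular values returned by a singular value decomposition. The function name `low_order_matrix`, the toy matrix, and the threshold value 2.5 are hypothetical.

```python
import numpy as np

def low_order_matrix(X, second_threshold):
    """Zero the singular values below the threshold and rebuild the matrix."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_low = np.where(s < second_threshold, 0.0, s)   # low-order feature values
    X_low = (U * s_low) @ Vt                         # low-order segmented text matrix
    return X_low, s_low

# Toy matrix whose singular values are 4, 3, 2 and 1.
X = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])

X_low, s_low = low_order_matrix(X, second_threshold=2.5)
```

With a threshold of 2.5 only the two largest singular values survive, so `X_low` has rank 2 while keeping the same shape as `X`.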
[0170] In some embodiments, the latent semantic space construction subunit includes: a loss value determination subunit, a second threshold adjustment subunit, and a low-order word segmentation text matrix update subunit.
[0171] The loss value determination subunit can be configured to determine the loss value between the original word segmentation text matrix and the low-order word segmentation text matrix; the second threshold adjustment subunit can be configured to adjust the second threshold based on the loss value between the original word segmentation text matrix and the low-order word segmentation text matrix; the low-order word segmentation text matrix update subunit can be configured to update the low-order word segmentation text matrix based on the adjusted second threshold, until the loss value between the original word segmentation text matrix and the low-order word segmentation text matrix meets a preset condition, and then the latent semantic space is constructed based on the low-order word segmentation text matrix that meets the preset condition.
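The adjustment loop can be sketched as follows. The loss definition (relative Frobenius norm of the difference), the preset condition (loss not exceeding `max_loss`), the multiplicative adjustment rule, and all numeric values are assumptions made for this example; this disclosure does not fix them.

```python
import numpy as np

def build_low_order(X, threshold, max_loss=0.35, shrink=0.8, max_iter=50):
    """Lower the threshold until the reconstruction loss meets the preset condition."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    for _ in range(max_iter):
        s_low = np.where(s < threshold, 0.0, s)
        X_low = (U * s_low) @ Vt
        # Loss between the original and the low-order matrix.
        loss = float(np.linalg.norm(X - X_low) / np.linalg.norm(X))
        if loss <= max_loss:
            break
        threshold *= shrink   # a lower threshold keeps more singular values
    # If max_iter is reached first, the last candidate is returned as-is.
    return X_low, threshold, loss

# Toy matrix whose singular values are 4, 3, 2 and 1.
X = np.array([
    [2.0, 1.0, 0.0, 0.0],
    [1.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 1.0],
    [0.0, 0.0, 1.0, 3.0],
])

X_low, final_threshold, final_loss = build_low_order(X, threshold=3.5)
```

Starting from 3.5, the threshold is lowered step by step until enough singular values are retained for the loss to fall below `max_loss`; the latent semantic space would then be built from the returned `X_low`.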
[0172] In some embodiments, the N risk feedback topics include a second risk feedback topic. The similarity determination module 1204 may include: a target anomaly feedback text sample vector determination unit and a vector similarity determination unit.
[0173] The target anomaly feedback text sample vector determination unit can be configured to determine multiple target anomaly feedback text sample vectors clustered under the second risk feedback topic in the latent semantic space based on the text topic matrix; the vector similarity determination unit can be configured to determine the similarity between each target anomaly feedback text sample vector and the anomaly feedback vector to be predicted.
[0174] In some embodiments, the first risk feedback topic determination module may include: a topic similarity determination unit and a risk feedback topic determination unit.
[0175] The topic similarity determination unit can be configured to determine the similarity between the anomaly feedback vector to be predicted and the second risk feedback topic based on the similarity between each target anomaly feedback text sample vector and the anomaly feedback vector to be predicted; the risk feedback topic determination unit can be configured to determine that if the similarity between the anomaly feedback vector to be predicted and the second risk feedback topic is greater than a third threshold, then the second risk feedback topic is the risk feedback topic of the anomaly feedback vector to be predicted.
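The text-vector path can be sketched in the same spirit. In this illustration the sample vectors are grouped per topic in a dictionary, the topic similarity is the mean cosine similarity to the sample vectors in that group, and a topic is assigned only when the best score exceeds the third threshold. The topic names, coordinates, aggregation rule, and threshold value 0.9 are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def assign_topic(query_vec, clusters, third_threshold=0.9):
    """Return (topic, score) for the best topic above the threshold, else (None, score)."""
    best_topic, best_score = None, -1.0
    for topic, sample_vecs in clusters.items():   # each group is assumed non-empty
        sims = [cosine(query_vec, v) for v in sample_vecs]
        score = sum(sims) / len(sims)
        if score > best_score:
            best_topic, best_score = topic, score
    if best_score > third_threshold:
        return best_topic, best_score
    return None, best_score

# Hypothetical 2-D latent coordinates of sample texts, grouped by risk feedback topic.
clusters = {
    "deposit not refunded": [(0.9, 0.1), (0.8, 0.3)],
    "repairs refused": [(0.1, 0.9), (0.2, 0.7)],
}

clear_case = assign_topic((0.7, 0.2), clusters)
ambiguous_case = assign_topic((1.0, 1.0), clusters)
```

A complaint whose vector lies between the two groups is left unassigned under this rule, which could be a reasonable point at which to route it to manual review.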
[0176] In some embodiments, the original word segmentation text matrix is generated from the abnormal feedback text sample vectors corresponding to multiple abnormal feedback text samples, the multiple abnormal feedback text samples including target abnormal feedback text samples; wherein, the original word segmentation text matrix generation unit may include: a target abnormal feedback vocabulary acquisition subunit, an inverse document frequency determination subunit, a word frequency determination subunit, and a target abnormal feedback text sample vector determination subunit.
[0177] The target anomaly feedback vocabulary acquisition subunit can be configured to determine a target anomaly feedback vocabulary based on the anomaly feedback text targeting the target risk object; the inverse document frequency determination subunit can be configured to determine the inverse document frequency of each word in the target anomaly feedback vocabulary in the plurality of anomaly feedback text samples; the word frequency determination subunit can be configured to determine the word frequency of each word in the target anomaly feedback vocabulary in the target anomaly feedback text samples; and the target anomaly feedback text sample vector determination subunit can be configured to determine the target anomaly feedback text sample vector of the target anomaly feedback text sample based on the word frequency of each word in the target anomaly feedback text sample and the inverse document frequency in the plurality of anomaly feedback text samples, so as to determine the original word segmentation text matrix based on the target anomaly feedback text sample vector.
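The weighting by word frequency and inverse document frequency can be sketched as follows. The smoothed form of the inverse document frequency, the normalization of word frequency by text length, whitespace segmentation, and the sample data are assumptions for this example; this disclosure does not prescribe a specific formula.

```python
import math
from collections import Counter

def tfidf_vectors(samples, vocabulary):
    """Return one TF-IDF vector per sample text, ordered like the vocabulary."""
    n = len(samples)
    tokenized = [text.split() for text in samples]

    # Inverse document frequency over all samples (smoothed to avoid division by zero).
    idf = {}
    for word in vocabulary:
        df = sum(1 for tokens in tokenized if word in tokens)
        idf[word] = math.log((1 + n) / (1 + df)) + 1.0

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1   # guard against empty texts
        # Word frequency within this text, weighted by the inverse document frequency.
        vectors.append([counts[w] / total * idf[w] for w in vocabulary])
    return vectors

# Hypothetical, already-segmented complaint samples and target vocabulary.
samples = ["deposit refund overdue", "deposit unpaid", "repair leak"]
vocabulary = ["deposit", "refund", "repair"]
vectors = tfidf_vectors(samples, vocabulary)
```

Stacking these vectors as columns gives a matrix with one row per vocabulary word and one column per sample text. In the first sample, "refund" receives a higher weight than "deposit" because it appears in fewer texts.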
[0178] Since the functions of the device 1200 have been described in detail in the corresponding method embodiments, they are not repeated here.
[0179] Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein can be implemented by software or by combining software with necessary hardware. Therefore, the technical solutions of the embodiments of this disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, external hard drive, etc.) and includes several instructions to cause a computing device (such as a personal computer, server, mobile terminal, or smart device, etc.) to execute the method according to the embodiments of this disclosure, for example, one or more of the steps shown in Figure 3.
[0180] Furthermore, the above figures are merely illustrative of the processes included in the method according to exemplary embodiments of this disclosure and are not intended to be limiting. It is readily understood that the processes shown in the above figures do not indicate or limit the temporal order of these processes. Additionally, it is readily understood that these processes may be executed synchronously or asynchronously, for example, in multiple modules.
[0181] Other embodiments of this disclosure will readily occur to those skilled in the art upon consideration of the specification and practice of the disclosure herein. This disclosure is intended to cover any variations, uses, or adaptations of this disclosure that follow the general principles of this disclosure and include common knowledge or customary techniques in the art not claimed herein. The specification and examples are to be considered exemplary only, and the true scope and spirit of this disclosure are indicated by the claims.
[0182] It should be understood that this disclosure is not limited to the detailed structures, drawing arrangements or implementations shown herein; rather, this disclosure is intended to cover various modifications and equivalent arrangements contained within the spirit and scope of the appended claims.
Claims
1. A risk identification method, characterized in that it includes: obtaining user complaint information against long-term rental service providers or long-term rental service platforms, and obtaining the abnormal feedback text to be predicted of the object to be predicted; generating an abnormal feedback vector to be predicted based on the abnormal feedback text to be predicted; determining multiple abnormal feedback text samples from the abnormal feedback texts targeting the target risk object, and determining a target abnormal feedback vocabulary based on the abnormal feedback texts targeting the target risk object, wherein the multiple abnormal feedback text samples include a target abnormal feedback text sample, and the target risk object includes long-term rental service merchants or long-term rental service platforms that have the target risk; determining the target abnormal feedback text sample vector of the target abnormal feedback text sample based on the importance of each word in the target abnormal feedback vocabulary in the target abnormal feedback text sample, so as to determine the original word segmentation text matrix based on the target abnormal feedback text sample vector; performing matrix decomposition on the original word segmentation text matrix to obtain a word segmentation topic matrix representing the correlation between words and risk feedback topics, a text topic matrix representing the correlation between abnormal feedback texts and risk feedback topics, and a feature value matrix representing the correlation between semantics and risk feedback topics; constructing, based on the feature value matrix, the word segmentation topic matrix of the abnormal feedback text samples, and the text topic matrix, a latent semantic space that includes abnormal feedback text sample vectors clustered according to N risk feedback topics, where N is a positive integer greater than or equal to 1; obtaining the similarity between the abnormal feedback text sample vectors clustered under each risk feedback topic and the abnormal feedback vector to be predicted; determining, based on the similarity, the risk feedback topic of the abnormal feedback vector to be predicted from the N risk feedback topics; and generating risk warning information for the object to be predicted based on the risk feedback topic.
2. The method according to claim 1, characterized in that, The latent semantic space further includes anomalous feedback word vectors clustered according to the N risk feedback topics, wherein the N risk feedback topics include a first risk feedback topic, and the anomalous feedback word vectors include multiple target anomalous feedback word vectors clustered under the first risk feedback topic; wherein, the method further includes: Determine the similarity between the clustered target anomaly feedback word vectors under the first risk feedback topic and the anomaly feedback vector to be predicted; The similarity between the anomaly feedback vector to be predicted and the first risk feedback topic is determined based on the similarity between the clustered anomaly feedback word vectors under the first risk feedback topic and the anomaly feedback vector to be predicted. If the similarity between the anomaly feedback vector to be predicted and the first risk feedback topic is greater than a first threshold, then the first risk feedback topic is the risk feedback topic of the anomaly feedback vector to be predicted. Risk warning information for the object to be predicted is generated based on the first risk feedback topic.
3. The method according to claim 2, characterized in that, The original segmented text matrix is subjected to matrix decomposition to obtain a segmented topic matrix representing the correlation between words and risk feedback topics, a text topic matrix representing the correlation between abnormal feedback text and risk feedback topics, and an eigenvalue matrix representing the correlation between semantics and risk feedback topics, including: The original segmented text matrix is decomposed to obtain the segmented topic matrix, text topic matrix, and feature value matrix of the abnormal feedback text sample. The segmented topic matrix clusters the abnormal feedback word vectors corresponding to the abnormal feedback words according to M risk feedback topics. The text topic matrix clusters the abnormal feedback text sample vectors corresponding to the abnormal feedback text samples according to the M risk feedback topics. M is a positive integer greater than or equal to N.
4. The method according to claim 3, characterized in that constructing, based on the feature value matrix, the word segmentation topic matrix of the abnormal feedback text samples, and the text topic matrix, a latent semantic space that includes abnormal feedback text sample vectors clustered according to N risk feedback topics includes: setting the feature values in the feature value matrix that are less than a second threshold to 0 to obtain a low-order feature value matrix; determining a low-order word segmentation text matrix based on the low-order feature value matrix, the word segmentation topic matrix, and the text topic matrix; and constructing, based on the low-order word segmentation text matrix, the latent semantic space generated from the abnormal feedback text samples.
5. The method according to claim 4, characterized in that constructing, based on the low-order word segmentation text matrix, the latent semantic space generated from the abnormal feedback text samples includes: determining the loss value between the original word segmentation text matrix and the low-order word segmentation text matrix; adjusting the second threshold based on the loss value between the original word segmentation text matrix and the low-order word segmentation text matrix; and updating the low-order word segmentation text matrix according to the adjusted second threshold until the loss value between the original word segmentation text matrix and the low-order word segmentation text matrix meets a preset condition, and then constructing the latent semantic space based on the low-order word segmentation text matrix that meets the preset condition.
6. The method according to claim 3, characterized in that, The N risk feedback topics include a second risk feedback topic; wherein, obtaining the similarity between the clustered abnormal feedback text sample vectors under each risk feedback topic and the abnormal feedback vector to be predicted, and determining the risk feedback topic of the abnormal feedback vector to be predicted among the N risk feedback topics based on the similarity, includes: Based on the text topic matrix, determine multiple target anomaly feedback text sample vectors clustered under the second risk feedback topic in the latent semantic space; The similarity between each target anomaly feedback text sample vector and the anomaly feedback vector to be predicted is determined respectively; The similarity between the anomaly feedback vector to be predicted and the second risk feedback topic is determined based on the similarity between each target anomaly feedback text sample vector and the anomaly feedback vector to be predicted. If the similarity between the anomaly feedback vector to be predicted and the second risk feedback topic is greater than a third threshold, then the second risk feedback topic is the risk feedback topic of the anomaly feedback vector to be predicted.
7. The method according to claim 1, characterized in that, The original word segmentation text matrix is generated from the abnormal feedback text sample vectors corresponding to the plurality of abnormal feedback text samples; wherein, the target abnormal feedback text sample vector of the target abnormal feedback text sample is determined according to the importance of each word in the target abnormal feedback vocabulary in the target abnormal feedback text sample, including: Determine the inverse document frequency of each word in the target anomaly feedback vocabulary in the plurality of anomaly feedback text samples; Determine the word frequency of each word in the target anomaly feedback vocabulary in the target anomaly feedback text sample; The target anomaly feedback text sample vector of the target anomaly feedback text sample is determined based on the word frequency of each word in the target anomaly feedback text sample and the inverse document frequency in the plurality of anomaly feedback text samples, so as to determine the original word segmentation text matrix based on the target anomaly feedback text sample vector.
8. A risk identification device, characterized in that it includes: a module for obtaining the abnormal feedback text to be predicted, configured to obtain user complaint information against long-term rental service providers or long-term rental service platforms, and obtain the abnormal feedback text to be predicted of the object to be predicted; a module for obtaining the abnormal feedback vector to be predicted, configured to generate an abnormal feedback vector to be predicted based on the abnormal feedback text to be predicted; a latent semantic space acquisition module, configured to: determine multiple abnormal feedback text samples from the abnormal feedback texts targeting the target risk object, and determine a target abnormal feedback vocabulary based on the abnormal feedback texts targeting the target risk object, wherein the multiple abnormal feedback text samples include a target abnormal feedback text sample, and the target risk object includes long-term rental service merchants or long-term rental service platforms that have the target risk; determine the target abnormal feedback text sample vector of the target abnormal feedback text sample based on the importance of each word in the target abnormal feedback vocabulary in the target abnormal feedback text sample, so as to determine the original word segmentation text matrix based on the target abnormal feedback text sample vector; perform matrix decomposition on the original word segmentation text matrix to obtain a word segmentation topic matrix representing the correlation between words and risk feedback topics, a text topic matrix representing the correlation between abnormal feedback texts and risk feedback topics, and a feature value matrix representing the correlation between semantics and risk feedback topics; and construct, based on the feature value matrix, the word segmentation topic matrix of the abnormal feedback text samples, and the text topic matrix, a latent semantic space that includes abnormal feedback text sample vectors clustered according to N risk feedback topics, where N is a positive integer greater than or equal to 1; a similarity determination module, configured to obtain the similarity between the abnormal feedback text sample vectors clustered under each risk feedback topic and the abnormal feedback vector to be predicted; a first risk feedback topic determination module, configured to determine, based on the similarity, the risk feedback topic of the abnormal feedback vector to be predicted from the N risk feedback topics; and a first risk warning information generation module, configured to generate risk warning information for the object to be predicted based on the risk feedback topic.
9. An electronic device, characterized in that it includes: a memory; and a processor coupled to the memory, the processor being configured to execute the risk identification method as described in any one of claims 1-7 based on instructions stored in the memory.
10. A computer-readable storage medium having a program stored thereon that, when executed by a processor, implements the risk identification method as described in any one of claims 1-7.
11. A computer program product comprising computer instructions that, when executed by a processor of a computer device, cause the computer device to implement the risk identification method as described in any one of claims 1-7.