CN104484367A - Data mining and analyzing system - Google Patents

Publication number
CN104484367A
Authority
CN
China
Prior art keywords
module
user
data
query
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410736242.7A
Other languages
Chinese (zh)
Inventor
鲁银刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
GUANGZHOU MERCHANTS QUICKLY BUILT INTERNET INFORMATION TECHNOLOGY Co Ltd
Original Assignee
GUANGZHOU MERCHANTS QUICKLY BUILT INTERNET INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-12-05
Filing date
2014-12-05
Publication date
2015-04-01
Application filed by GUANGZHOU MERCHANTS QUICKLY BUILT INTERNET INFORMATION TECHNOLOGY Co Ltd
Priority to CN201410736242.7A
Publication of CN104484367A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation

Abstract

The invention provides a data mining and analyzing system which comprises an input and output module, an interest information storage module, an inquiry analyzing module, a Web processing module, a result preprocessing module and an inquiry filtering module. The input and output module provides inquiry input and result output for users, the interest information storage module is used for storing interest data information of the users, the inquiry analyzing module analyzes according to inquiry requests of the users to form new inquiry requests, the Web processing module calls multiple webpage data in a parallel manner, the result preprocessing module integrates data information of the Web processing module and then sends the same to the inquiry filtering module, and the inquiry filtering module performs relevancy ranking on the data information in the result preprocessing module according to the data information in the interest information storage module and outputs inquiry results to the users through the input and output module. By the data mining and analyzing system, returned search results are analyzed and processed, and then targeted search results are returned to the users, so that retrieval efficiency is improved.

Description

A data mining analysis system
Technical field
The present invention relates to the field of data mining, and in particular to a data mining analysis system.
Background art
With the explosive growth of information on the network, the problem is not that people retrieve too little information through analysis but that they retrieve too much, and most of it is irrelevant to the query request. Traditional analysis systems and general meta-analysis systems are increasingly unable to meet users' needs, so data mining technology has become a hot topic in search research. Data mining generally refers to the process of using algorithms to search a large amount of data for the information hidden in it. Data mining is usually associated with computer science and achieves this goal through many methods, such as statistics, online analytical processing, information retrieval, machine learning, expert systems (which rely on past rules of thumb), and pattern recognition. However, the prior art cannot return effective search results in a timely manner according to the search keywords entered by the user. After the user rates their satisfaction with the results returned by the data mining analysis, existing systems cannot learn from this satisfaction feedback, so the search results are poorly targeted. In addition, the structural model of existing systems makes it difficult to ensure both the security of back-end data and the consistency of processing. The prior art therefore has shortcomings and needs to be improved.
Summary of the invention
The object of the present invention is to overcome the shortcomings and deficiencies of the prior art by providing a data mining analysis system that can return targeted search results to the user.
The present invention is realized by the following technical solution:
A data mining analysis system, comprising:
an input/output module, which provides the user with visual query input and result output;
an interest information storage module, for storing user interest data;
a query analysis module, which analyzes the user's query request according to the data in the interest information storage module and expands the query statement to form a new, longer, and more accurate query request;
a Web processing module, which retrieves data from multiple web pages in parallel to obtain the required web data and sends the web data to the result preprocessing module;
a result preprocessing module, which integrates the data from the Web processing module and sends it to the query filter module; and
a query filter module, which ranks the data from the result preprocessing module by relevance according to the data in the interest information storage module and outputs the query results to the user through the input/output module.
The user interest data in the interest information storage module is information extracted from the historical record of web pages visited by the user.
The result output is a linear list of documents.
The query filter module comprises a receiving processing module and a data analysis module. The receiving processing module receives the index file obtained from the user's query request, and the data analysis module analyzes the index file and provides the query results. The data analysis module derives a new query statement from its analysis of the user interest data and uses the new query statement to obtain the required target index file from the index file.
The query analysis module analyzes user behavior to obtain the user interest data.
The user behavior comprises the selectivity of the user's web browsing, the locality of the user's web browsing, and the user click rate.
The user click rate comprises the number of times a page is visited or the number of times a page is searched.
The data mining analysis system further comprises a satisfaction evaluation module, which returns the user's satisfaction with the query results to the interest information storage module, for use by the query filter module when ranking the data from the result preprocessing module by relevance.
The data mining analysis system has a three-layer structure comprising a presentation layer, a business logic layer, and a data persistence layer.
Compared with the prior art, the present invention can return search results in a timely manner according to the search keywords entered by the user, and can learn from the user's feedback on the search results so as to return targeted search results to the user, thereby realizing data mining analysis and improving its efficiency. The purpose of data mining analysis is to provide users with the information they need according to their background, preferences, research direction, retrieval purpose, and so on.
Brief description of the drawings
In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic diagram of the data mining analysis system of the present invention;
Fig. 2 is a schematic diagram of the query analysis module of the data mining analysis system of the present invention;
Fig. 3 is a schematic diagram of the three-layer structure of the data mining analysis system of the present invention;
Fig. 4 is a schematic diagram of the meta-analysis of the data mining analysis system of the present invention.
In the figures:
1. input/output module; 2. interest information storage module; 3. query analysis module; 4. Web processing module; 5. result preprocessing module; 6. query filter module; 7. receiving processing module; 8. data analysis module; 9. index file; 10. target index file; 11. knowledge base; 12. result processing module; 13. presentation layer; 14. business logic layer; 15. data persistence layer.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
A data mining analysis system, as shown in Figs. 1 to 4, comprises:
Input/output module 1, which provides the user with visual query input and result output. For query input, the user can enter a series of keywords, Boolean operators, and so on; the result output is a linear list of documents.
Interest information storage module 2, for storing user interest data. The user interest data in interest information storage module 2 is information extracted from the historical record of web pages visited by the user. The interest data must not only represent the user's interests objectively and comprehensively, but must also lend itself to later evaluation of those interests.
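As an illustration of how interest data might be extracted from a browsing history, the following is a minimal sketch and not the method of the invention. The record fields (`url`, `title`, `visits`) and the function name `build_interest_profile` are assumptions made for this example; the patent does not specify a data format or a weighting scheme.

```python
import re
from collections import Counter


def build_interest_profile(history, top_n=5):
    """Return the top_n most frequent title terms, with weights that sum to 1."""
    counts = Counter()
    for entry in history:
        # Split the page title into lowercase alphanumeric terms. A fuller
        # system would also use page content and a stop-word list.
        for term in re.findall(r"[a-z0-9]+", entry["title"].lower()):
            if len(term) > 2:
                counts[term] += entry.get("visits", 1)
    top = counts.most_common(top_n)
    total = sum(weight for _, weight in top)
    return {term: weight / total for term, weight in top} if total else {}


# Hypothetical browsing history records.
history = [
    {"url": "http://example.com/a", "title": "Python data mining tutorial", "visits": 3},
    {"url": "http://example.com/b", "title": "Data mining with pandas", "visits": 1},
]
profile = build_interest_profile(history, top_n=3)
```

Normalizing the weights keeps profiles of different users comparable, which is one way to support the later interest evaluation that the text calls for.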
Query analysis module 3, which analyzes the user's query request according to the data in interest information storage module 2 and expands the query statement to form a new, longer, and more accurate query request. A well-formed query request can greatly reduce the invalid content in the search results and improve search efficiency. Query analysis module 3 analyzes user behavior to obtain the user interest data. The user behavior comprises the selectivity of the user's web browsing, the locality of the user's web browsing, and the user click rate.
Selectivity of web browsing: each time a user searches, the analysis returns hundreds or thousands of query results. If the user clicks a query result, it can be assumed that the user considers that result to be of higher quality; that is, the pages a user clicks and browses are the pages the user considers to be of higher quality.
Locality of web browsing: the URLs that users click are quite concentrated, and most clicks fall on the first few pages. Clicks on the first page account for 47% of all clicks, clicks on the first 5 pages account for more than 75% of all clicks, and fewer than one third of the pages account for two thirds of all clicks. This shows that user clicks on URLs have very strong locality.
User click rate: the longer a web page has existed, the more visits it is likely to have accumulated, so the raw number of visits does not reflect the quality of a page's content well. The user click rate of a page should therefore be used to reflect its quality. The user click rate comprises the number of times a page is visited or the number of times a page is searched. Although every user click is made under some query term, research results show that, for most query terms, the click frequency of a URL under that term is basically consistent with its click frequency across all query terms. Therefore, when calculating the user click rate, there is no need to consider which query term a click was made under.
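The click-rate reasoning above can be sketched as follows. This is an illustrative example under assumed data: the click log format of `(query, url)` pairs and the function names `click_rates` and `top_k_share` are hypothetical. Following the text, clicks are pooled across query terms rather than counted per term.

```python
from collections import Counter


def click_rates(click_log):
    """Share of all clicks received by each URL, pooled across query terms."""
    counts = Counter(url for _query, url in click_log)
    total = sum(counts.values())
    return {url: n / total for url, n in counts.items()}


def top_k_share(rates, k):
    """Fraction of all clicks that falls on the k most-clicked URLs (locality)."""
    return sum(sorted(rates.values(), reverse=True)[:k])


# Hypothetical click log: the query term is recorded but ignored when pooling.
click_log = [
    ("data mining", "http://example.com/a"),
    ("mining tools", "http://example.com/a"),
    ("data mining", "http://example.com/b"),
    ("web search", "http://example.com/a"),
]
rates = click_rates(click_log)
```

Here `top_k_share(rates, 1)` gives the share of clicks that land on the single most-clicked URL, which is the kind of concentration the locality observation describes.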
Web processing module 4, which retrieves data from multiple web pages in parallel to obtain the required web data and sends the web data to result preprocessing module 5.
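A parallel call to several sources could look like the sketch below. It uses Python's standard `concurrent.futures` thread pool; the source functions `source_a` and `source_b` are stubs invented for the example (no network access), and the patent does not state which concurrency mechanism the Web processing module uses.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(query, sources, max_workers=4):
    """Call every source with the same query in parallel; return {name: results}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fn, query) for name, fn in sources.items()}
        results = {}
        for name, future in futures.items():
            try:
                results[name] = future.result()
            except Exception:
                # A failing source yields an empty list instead of
                # aborting the whole request.
                results[name] = []
        return results


# Stub sources standing in for real web data providers (hypothetical).
def source_a(query):
    return [{"url": "http://example.com/a", "title": f"{query} overview"}]


def source_b(query):
    raise RuntimeError("source unavailable")


results = fetch_all("data mining", {"a": source_a, "b": source_b})
```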
Result preprocessing module 5, which integrates the data from Web processing module 4 and sends it to query filter module 6. The results from the different web data analyses are integrated: duplicates are removed, the format is unified, link validity is checked, the results are classified, and so on.
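The integration steps can be illustrated with a small sketch that merges results from several sources, unifies the record format, and removes duplicates by normalized URL. The function names and record fields are assumptions for this example, and the link check shown is only a syntactic one (scheme must be `http` or `https`); a real validity check would need a network request.

```python
from urllib.parse import urlsplit


def normalize_url(url):
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def preprocess(results_by_source):
    """Merge per-source result lists into one list of unique, uniform records."""
    merged = {}
    for source, items in results_by_source.items():
        for item in items:
            key = normalize_url(item["url"])
            if urlsplit(key).scheme not in ("http", "https"):
                continue  # syntactic validity check only; no request is made
            record = merged.setdefault(
                key, {"url": key, "title": item.get("title", "").strip(), "sources": []}
            )
            if source not in record["sources"]:
                record["sources"].append(source)
    return list(merged.values())


# Hypothetical raw results from two sources, containing one duplicate.
raw = {
    "a": [{"url": "http://Example.com/a/", "title": " Data mining "}],
    "b": [{"url": "http://example.com/a#top", "title": "Data mining"},
          {"url": "ftp://example.com/file", "title": "Not a web page"}],
}
unique = preprocess(raw)
```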
Query filter module 6, which ranks the data from result preprocessing module 5 by relevance according to the data in interest information storage module 2 and outputs the query results to the user through input/output module 1. Query filter module 6 comprises receiving processing module 7 and data analysis module 8. Receiving processing module 7 receives index file 9 obtained from the user's query request, and data analysis module 8 analyzes index file 9 and provides the query results. Data analysis module 8 derives a new query statement from its analysis of the user interest data and uses the new query statement to obtain the required target index file 10 from index file 9.
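One simple way to picture query expansion and interest-based relevance ranking is sketched below. The scoring rule (one point per query term found in the title, plus the interest weight of each profile term found) is an assumption chosen for illustration; the patent does not define a ranking formula, and the names `expand_query` and `rank_by_interest` are hypothetical.

```python
def expand_query(query, profile, max_extra=2):
    """Append the highest-weighted interest terms not already in the query."""
    terms = query.lower().split()
    by_weight = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    extras = [term for term, _ in by_weight if term not in terms]
    return terms + extras[:max_extra]


def rank_by_interest(results, query_terms, profile):
    """Sort results by query-term matches plus interest weights, best first."""
    def score(item):
        words = set(item["title"].lower().split())
        base = sum(1.0 for term in query_terms if term in words)
        bonus = sum(weight for term, weight in profile.items() if term in words)
        return base + bonus
    return sorted(results, key=score, reverse=True)


# Hypothetical interest profile and preprocessed results.
profile = {"python": 0.6, "finance": 0.4}
results = [
    {"url": "http://example.com/1", "title": "Data mining in Java"},
    {"url": "http://example.com/2", "title": "Python data mining guide"},
]
expanded = expand_query("data mining", profile)
ranked = rank_by_interest(results, ["data", "mining"], profile)
```

Both results match the original query equally, so the interest weight for `python` decides the order.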
The data mining analysis system further comprises a satisfaction evaluation module, which returns the user's satisfaction with the query results to interest information storage module 2, for use by query filter module 6 when ranking the data from result preprocessing module 5 by relevance. Users are the direct users of the analysis and the final judges of its service quality. Investigating how users use the analysis is particularly necessary for optimizing it, and the analysis in turn guides users in finding information. While analysis brings great convenience to network users, it also exposes many problems; solving these problems promptly and optimizing the analysis requires a large amount of user information. The satisfied and unsatisfied ratings that users give during analysis provide such information in large quantities.
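The feedback loop can be sketched as a small weight update on the stored interest data. The update rule below (add or subtract a fixed step, clamp at zero, renormalize) and the name `apply_feedback` are assumptions for illustration only; the patent says that satisfaction information is returned to the interest information storage module but does not specify how it is applied.

```python
def apply_feedback(profile, result_terms, satisfied, rate=0.1):
    """Return a new profile with the result's terms nudged up or down."""
    updated = dict(profile)  # leave the caller's profile unchanged
    delta = rate if satisfied else -rate
    for term in result_terms:
        updated[term] = max(0.0, updated.get(term, 0.0) + delta)
    total = sum(updated.values())
    return {t: w / total for t, w in updated.items()} if total else updated


# Hypothetical profile and feedback on a result about Python.
profile = {"python": 0.5, "java": 0.5}
liked = apply_feedback(profile, ["python"], satisfied=True)
disliked = apply_feedback(profile, ["python"], satisfied=False)
```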
The data mining analysis system has a three-layer structure comprising presentation layer 13, business logic layer 14, and data persistence layer 15. The three-layer structure ensures that users do not directly touch back-end applications and data resources; instead, they obtain back-end data resources through the middle layer. This ensures both the security of back-end data and the consistency of processing.
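The layering can be illustrated with three small classes in which each layer holds a reference only to the layer directly below it. The class and method names are hypothetical and the in-memory dictionary stands in for a real data store; this is a sketch of the separation of concerns, not the system's actual design.

```python
class PersistenceLayer:
    """Data persistence layer: the only layer that touches stored data."""

    def __init__(self):
        self._interests = {}  # user_id -> list of interest terms

    def load_interests(self, user_id):
        return list(self._interests.get(user_id, []))

    def save_interests(self, user_id, terms):
        self._interests[user_id] = list(terms)


class BusinessLogicLayer:
    """Middle layer: applies the rules and mediates all access to data."""

    def __init__(self, store):
        self._store = store

    def search(self, user_id, query):
        interests = self._store.load_interests(user_id)
        return {"query": query, "expanded": query.split() + interests}


class PresentationLayer:
    """Presentation layer: talks to the user and only to the middle layer."""

    def __init__(self, logic):
        self._logic = logic

    def handle_request(self, user_id, query):
        result = self._logic.search(user_id, query)
        return "Results for: " + " ".join(result["expanded"])


store = PersistenceLayer()
store.save_interests("u1", ["python"])
ui = PresentationLayer(BusinessLogicLayer(store))
page = ui.handle_request("u1", "data mining")
```

Because the presentation layer has no reference to the store, every request passes through the same business logic, which is the consistency and security argument the text makes.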
Data mining analysis means analyzing the user's search history in order to return search results better suited to that user. The search history includes the keywords the user has searched for, the results the user clicked, the user's visits to each website, the user's bookmarks, and so on. Once the analysis has gathered this user data, it can return more targeted search results when the user searches for a new keyword, thereby improving the user experience. Analysis itself means collecting and discovering information on the Internet using certain techniques and strategies, then understanding, extracting, and processing that information, so as to provide Web search services to users.
Meta-analysis treats multiple existing analyses as a whole and provides the user with a unified query interface. Using the information in knowledge base 11, the meta-analysis converts the user's query request into forms that the individual analyses can recognize and sends it to each independent analysis that it calls; these analyses perform the actual information retrieval. Finally, through result processing module 12, the meta-analysis collects the results returned by each analysis, compares and analyzes them, eliminates redundant information, and returns them to the user in a certain format. In other words, meta-analysis is a system that provides information services to users by sharing knowledge base 11 of multiple analyses under a unified user query interface and information feedback format.
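The query conversion step can be sketched as a lookup of per-backend formatting rules followed by one translation per backend. The knowledge base contents, the backend names `engine_a` and `engine_b`, and the rule fields `joiner` and `quote_phrases` are all invented for this example; the patent does not describe what knowledge base 11 stores.

```python
# Hypothetical knowledge base: how each backend expects a query to be written.
KNOWLEDGE_BASE = {
    "engine_a": {"joiner": " AND ", "quote_phrases": True},
    "engine_b": {"joiner": " ", "quote_phrases": False},
}


def translate(terms, rules):
    """Render a list of query terms in the syntax described by rules."""
    parts = []
    for term in terms:
        if rules["quote_phrases"] and " " in term:
            term = f'"{term}"'  # multi-word terms become quoted phrases
        parts.append(term)
    return rules["joiner"].join(parts)


def translate_for_all(terms, knowledge_base):
    """Produce one backend-specific query string per known backend."""
    return {name: translate(terms, rules) for name, rules in knowledge_base.items()}


queries = translate_for_all(["data mining", "python"], KNOWLEDGE_BASE)
```

Each translated query would then be dispatched to its backend, and the returned results collected and deduplicated before being shown to the user.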
The present invention returns search results in a timely manner according to the search keywords entered by the user and, at the same time, collects the user's search interest data so that later searches return more targeted results. After the user rates their satisfaction with the results returned by the analysis, the present invention can learn from this satisfaction feedback and improve retrieval efficiency. The present invention optimizes search results according to the user interest data and preferentially returns the web content that interests the user.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (9)

1. A data mining analysis system, characterized in that it comprises:
an input/output module, which provides the user with visual query input and result output;
an interest information storage module, for storing user interest data;
a query analysis module, which analyzes the user's query request according to the data in the interest information storage module and expands the query statement to form a new, longer, and more accurate query request;
a Web processing module, which retrieves data from multiple web pages in parallel to obtain the required web data and sends the web data to the result preprocessing module;
a result preprocessing module, which integrates the data from the Web processing module and sends it to the query filter module; and
a query filter module, which ranks the data from the result preprocessing module by relevance according to the data in the interest information storage module and outputs the query results to the user through the input/output module.
2. The data mining analysis system according to claim 1, characterized in that the user interest data in the interest information storage module is information extracted from the historical record of web pages visited by the user.
3. The data mining analysis system according to claim 1, characterized in that the result output is a linear list of documents.
4. The data mining analysis system according to claim 1, characterized in that the query filter module comprises a receiving processing module and a data analysis module; the receiving processing module receives the index file obtained from the user's query request, and the data analysis module analyzes the index file and provides the query results; the data analysis module derives a new query statement from its analysis of the user interest data and uses the new query statement to obtain the required target index file from the index file.
5. The data mining analysis system according to claim 1, characterized in that the query analysis module analyzes user behavior to obtain the user interest data.
6. The data mining analysis system according to claim 5, characterized in that the user behavior comprises the selectivity of the user's web browsing, the locality of the user's web browsing, and the user click rate.
7. The data mining analysis system according to claim 6, characterized in that the user click rate comprises the number of times a page is visited or the number of times a page is searched.
8. The data mining analysis system according to claim 1, characterized in that the data mining analysis system further comprises a satisfaction evaluation module, which returns the user's satisfaction with the query results to the interest information storage module, for use by the query filter module when ranking the data from the result preprocessing module by relevance.
9. The data mining analysis system according to claim 1, characterized in that the data mining analysis system has a three-layer structure comprising a presentation layer, a business logic layer, and a data persistence layer.
CN201410736242.7A 2014-12-05 2014-12-05 Data mining and analyzing system Pending CN104484367A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410736242.7A CN104484367A (en) 2014-12-05 2014-12-05 Data mining and analyzing system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410736242.7A CN104484367A (en) 2014-12-05 2014-12-05 Data mining and analyzing system

Publications (1)

Publication Number Publication Date
CN104484367A true CN104484367A (en) 2015-04-01

Family

ID=52758908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410736242.7A Pending CN104484367A (en) 2014-12-05 2014-12-05 Data mining and analyzing system

Country Status (1)

Country Link
CN (1) CN104484367A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050144162A1 (en) * 2003-12-29 2005-06-30 Ping Liang Advanced search, file system, and intelligent assistant agent
CN101075239A (en) * 2006-08-23 2007-11-21 腾讯科技(深圳)有限公司 Composite searching method and system
CN101127043A (en) * 2007-08-03 2008-02-20 哈尔滨工程大学 Lightweight individualized search engine and its searching method
CN102033955A (en) * 2010-12-24 2011-04-27 常华 Method for expanding user search results and server

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106294847A (en) * 2016-08-22 2017-01-04 成都天地网络科技有限公司 Business operation system based on data mining
CN107103365A (en) * 2017-04-12 2017-08-29 邹霞 The perspective analysis method of machine learning model
CN109214357A (en) * 2018-09-30 2019-01-15 赵学义 A kind of method and electronic equipment carrying out data mining based on face recognition algorithms
CN113392304A (en) * 2020-03-11 2021-09-14 淄博职业学院 Big data storage service method
CN113392304B (en) * 2020-03-11 2023-05-12 淄博职业学院 Big data storage service method
CN112783294A (en) * 2021-02-15 2021-05-11 北京泽桥传媒科技股份有限公司 User retention data analysis method and device for mobile Internet system

Similar Documents

Publication Publication Date Title
US11663254B2 (en) System and engine for seeded clustering of news events
US10546006B2 (en) Method and system for hybrid information query
KR101463974B1 (en) Big data analysis system for marketing and method thereof
US20170364834A1 (en) Real-time monitoring of public sentiment
CN105022827A (en) Field subject-oriented Web news dynamic aggregation method
CN101727454A (en) Method for automatic classification of objects and system
CN104077286A (en) Commodity information search method and system
CN104484367A (en) Data mining and analyzing system
CN102737021A (en) Search engine and realization method thereof
CN102722499A (en) Search engine and implementation method thereof
US10127617B2 (en) System for analyzing social media data and method of analyzing social media data using the same
Vijiyarani et al. Research issues in web mining
CA2956627A1 (en) System and engine for seeded clustering of news events
Dias et al. Automating the extraction of static content and dynamic behaviour from e-commerce websites
CN102955802A (en) Method and device for acquiring data from data reports
CN110188291B (en) Document processing based on proxy log
Romero-Frías Googling companies-a webometric approach to business studies
CN111723273A (en) Smart cloud retrieval system and method
Wang et al. Crawling ranked deep web data sources
Bhujbal et al. News aggregation using web scraping news portals
Ma et al. API prober–a tool for analyzing web API features and clustering web APIs
KR102041915B1 (en) Database module using artificial intelligence, economic data providing system and method using the same
KR20210037488A (en) Big Data Analytics-Based Advertising Marketing System
Maheswari et al. Algorithm for Tracing Visitors' On-Line Behaviors for Effective Web Usage Mining
CN105912584B (en) Data indexing system based on webpage information data

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150401
