CN102890710A - Excel-based data mining method - Google Patents

Excel-based data mining method

Info

Publication number
CN102890710A
Authority
CN
China
Prior art keywords
data
excel
data mining
model
mining
Prior art date
Legal status
Pending
Application number
CN2012103373156A
Other languages
Chinese (zh)
Inventor
何健明
刘世清
汤湛成
Current Assignee
PCI Suntek Technology Co Ltd
Original Assignee
PCI Suntek Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by PCI Suntek Technology Co Ltd
Priority to CN2012103373156A
Publication of CN102890710A

Abstract

The invention discloses an Excel-based data mining method in which data mining is carried out using Excel together with an external data mining program and a database. By mining an enterprise's operating or production data accumulated over several years, useful patterns found by the data mining algorithms can be applied to improve the operating strategy, thereby raising labor productivity, reducing costs and increasing the enterprise's profit. Conventional data mining requires specialist personnel and considerable expense, a burden that ordinary small and medium-sized enterprises cannot bear. The method described here therefore performs data mining in Excel, so that most people who know how to use Excel can carry it out.

Description

Excel-based data mining method
Technical field
The present invention relates to the field of data mining, and in particular to an Excel-based data mining method that makes data mining simple and practical.
Background art
As data mining has grown in influence in both academia and industry, research on and applications of data mining have developed rapidly in recent years; in commerce and banking in particular, applications have advanced even faster than research.
In China, data mining researchers are mainly based in universities, with some in research institutes and companies. The research covers many areas but generally concentrates on learning algorithms, practical applications of data mining, and data mining theory. Most current research projects are government-funded, for example by the National Natural Science Foundation, the 863 Program and the Ninth Five-Year Plan, yet there are still few reports of domestic data mining products. A recent Gartner report listed five key technologies expected to have a major impact on industry over the next 3 to 5 years, with data mining and artificial intelligence ranked first. The same report listed parallel computer architecture research and data mining among the 10 new technology areas in which companies should invest over the next five years.
Research on and application of data mining are thus receiving growing attention from academia and business. At the same time, data mining users are concentrated in large enterprises such as major banks, insurance companies, telecommunications companies and retailers. Data mining projects require substantial manpower and money, and users also need a grounding in statistics and familiarity with relational database technology; for these reasons many small and medium-sized enterprises hold back.
For small and medium-sized enterprises to benefit from data mining as well, both the cost and the barrier to entry must be reduced substantially. This method achieves simple data mining at low cost and meets the pressing needs of small and medium-sized enterprises; the user only needs to know basic Excel usage.
Summary of the invention
The technical problem addressed by the invention is to provide an Excel-based data mining method that reduces the complexity of existing data mining projects and the high cost of building them.
To achieve this object, the invention provides an Excel-based data mining method comprising Excel data acquisition, Excel data preprocessing, training of an Excel data mining model, evaluation of the mining model, and report output;
The data acquisition step: the enterprise's accumulated operating data, marketing data, production management data and the like are manually sorted and imported into an Excel spreadsheet, and the data to be analyzed are selected and arranged as a table;
The data preprocessing step: data in the wrong format, columns with many missing values, data that violate business rules, and abnormal data such as extreme values are removed;
The model training step: the fields to be analyzed are selected, a suitable data mining algorithm is chosen and its parameters are configured, and the algorithm is run to train the model and derive the corresponding patterns;
The model evaluation step: the trained model is evaluated; if it is not good enough, the mining algorithm, the algorithm's parameters or the training source data can be adjusted, and the final model is settled on through repeated adjustment and selection.
The invention also provides a method of applying the Excel-based data mining model, comprising:
The data mining model can be applied in many scenarios. To find target customers, for example: first,
collect customers' demographic/firmographic data, historical business data and other related data, and import them into an Excel worksheet;
second, preprocess the collected data;
then, run all of these data through the saved data mining model. The model outputs which customers are our target customers (those most likely to buy our product) together with each customer's probability of buying. Customers can be ranked by this probability and the top-ranked customers targeted for marketing (precision marketing), which both lowers marketing costs and raises the marketing success rate.
As the above scheme shows, the method of the invention performs data mining in Excel. Modeling is relatively easy to learn and applying a mining model is convenient; the enterprise does not need to develop a complex data mining project or employ highly specialized staff. The method achieves data mining simply, quickly and cheaply, is easy to maintain, and saves small and medium-sized enterprises money while improving their returns.
Description of the drawings
To explain the embodiments of the invention or the prior-art technical solutions more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of the preferred Excel data mining process of Embodiment 1 of the invention;
Fig. 2 is a model evaluation chart for the Excel data mining of Embodiment 1;
Fig. 3 is a schematic diagram of how Excel performs data mining in Embodiment 1;
Fig. 4 is a diagram of model generation and application in the Excel data mining of Embodiment 1.
Embodiment
To make the above objects, features and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments. The embodiments described are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the scope of protection of the invention.
The principle of Embodiment 1 is shown in Fig. 3.
Excel uses the SQL Server 2008 Data Mining Add-in to invoke the data mining engine in the SQL Server 2008 database, and the results are returned to Excel.
Excel data mining requires Excel 2007; if an earlier version is installed, uninstall it and install Excel 2007. Then install the Excel data mining add-in, which is used to connect to the database. SQL Server 2008 can be installed on the local machine, or the add-in can connect to another machine on which SQL Server 2008 is installed.
The implementation flow of Embodiment 1 is shown in Fig. 1.
The invention provides an Excel-based data mining method comprising training of the Excel model and application of the Excel model:
training of the Excel model comprises data acquisition, data preprocessing, model computation, report output and similar processes;
application of the Excel model comprises model evaluation and putting the model into practice.
Fig. 1 is a flow diagram of one preferred implementation of Excel data mining in an embodiment of the invention. Referring to Fig. 1:
Step 101, data acquisition.
In this step, data can be collected according to practical needs, converted into Excel from other files such as .txt files, or exported from a database into Excel; Excel supports all of these operations. The data needed for modeling are then selected and arranged as a table, which completes data acquisition.
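As an illustration only, converting a delimited text export into a table restricted to the modeling columns might look like the Python sketch below. It does not reproduce what Excel does internally; the tab delimiter, the function name `load_table` and the column names in the usage example are assumptions.

```python
import csv
import io
from typing import Dict, List, Sequence


def load_table(text: str, columns: Sequence[str], delimiter: str = "\t") -> List[Dict[str, str]]:
    """Parse delimited text (e.g. a .txt export) and keep only the columns needed for modeling.

    Hypothetical helper: the tab delimiter default is an assumption about the export format.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # The header row supplies the field names; fail early if a requested column is absent.
    missing = [c for c in columns if c not in (reader.fieldnames or [])]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    return [{c: row[c] for c in columns} for row in reader]
```

For example, `load_table("name\tage\tcity\nAnn\t30\tX\n", ["name", "age"])` keeps only the `name` and `age` fields of each row.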
Step 102, data preprocessing.
In this step, the collected data may contain anomalies, missing values, extreme values and the like, so they must be processed before data mining. First, use Excel's Explore Data function: select the table data, click the Explore Data button, then select a column to browse. This shows whether the column has missing values, which can be filled in manually, and also shows the column's distribution, so you can check whether it is consistent with business logic. If the distribution is clearly illogical, the column can be removed and excluded from modeling, since otherwise it would reduce the model's accuracy. Process every column in this way. Outliers can be handled with Excel's Clean Data button: click Clean Data, select the data range, select the column, and set the outlier thresholds. Then choose how outliers are treated: change the value to a specified range, change it to the mean, change it to null (empty), or delete the rows that contain outliers. Choose as required; I recommend changing the value to the mean.
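The outlier options listed above can be sketched for a single column as follows. This is a hypothetical Python illustration, not the add-in's implementation: the mode names, the inclusive `[low, high]` threshold test and the choice to compute the mean from in-range values only are all assumptions.

```python
from statistics import mean
from typing import List, Optional, Sequence

VALID_MODES = {"limits", "mean", "null", "delete"}


def clean_outliers(values: Sequence[Optional[float]], low: float, high: float,
                   mode: str = "mean") -> List[Optional[float]]:
    """Treat values outside [low, high] as outliers and handle them according to mode.

    Hypothetical modes: "limits" clamps to the thresholds, "mean" replaces with the mean
    of the in-range values (an assumption), "null" replaces with None, and "delete" drops
    the entry (in a full table this would drop the whole row).
    """
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode: {mode}")
    inliers = [v for v in values if v is not None and low <= v <= high]
    fill = mean(inliers) if inliers else None
    cleaned: List[Optional[float]] = []
    for v in values:
        if v is None or low <= v <= high:
            cleaned.append(v)  # missing and in-range values are left untouched here
        elif mode == "limits":
            cleaned.append(min(max(v, low), high))
        elif mode == "mean":
            cleaned.append(fill)
        elif mode == "null":
            cleaned.append(None)
        # mode == "delete": skip the outlier entirely
    return cleaned
```

With thresholds 0 and 100, the column `[10, 12, 1000, None, 14]` becomes `[10, 12, 12, None, 14]` under the recommended "mean" option.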
Step 103, training the data mining model (see Fig. 4).
In this step, after the data acquisition and preprocessing of the preceding steps, the data essentially meet the requirements for training a model. The category of algorithm is chosen according to the application scenario: to predict potential customer groups, choose one of the classification algorithms; to analyze customers likely to churn, choose one of the clustering algorithms; to analyze associations between products, choose one of the association algorithms. Here prediction of potential customer groups is used as the example. First click the Classify button and select the preprocessed data, then select the column to analyze and the input columns (there can be several). Once the column to analyze and the input columns are chosen, set the parameters, which cover the choice of algorithm and the configuration of each algorithm. The available algorithms are Microsoft Decision Trees, Microsoft Naive Bayes, logistic regression and neural network. The parameters of the neural network algorithm include:
HIDDEN_NODE_RATIO: specifies the ratio used to determine the number of nodes in the hidden layer. The algorithm computes the number of hidden nodes as HIDDEN_NODE_RATIO * sqrt({number of input nodes} * {number of output nodes});
HOLDOUT_PERCENTAGE: specifies the percentage of cases in the training data held out to calculate the holdout error. HOLDOUT_PERCENTAGE is used as part of the stopping criterion while training the mining model. This value is specific to the algorithm and independent of any holdout setting on the mining structure. The default is 30;
HOLDOUT_SEED: specifies the number used to seed the pseudo-random generator when the algorithm randomly selects the holdout data. If HOLDOUT_SEED is set to 0, the algorithm generates the seed from the mining model's name, which guarantees that the model content stays the same when the model is reprocessed. This value is specific to the algorithm and independent of any holdout setting on the mining structure. The default is 0;
MAXIMUM_INPUT_ATTRIBUTES: specifies the maximum number of input attributes the algorithm can handle before feature selection is invoked. Setting this value to 0 disables feature selection for input attributes;
MAXIMUM_OUTPUT_ATTRIBUTES: specifies the maximum number of output attributes the algorithm can handle before feature selection is invoked. Setting this value to 0 disables feature selection for output attributes;
MAXIMUM_STATES: specifies the maximum number of attribute states the algorithm supports. If an attribute has more states than this maximum, the algorithm uses the attribute's most common states and treats the remaining states as missing;
SAMPLE_SIZE: specifies the number of cases used to train the model. The algorithm uses whichever is smaller: the value specified by SAMPLE_SIZE or total_cases * (1 - HOLDOUT_PERCENTAGE/100);
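The two formulas given for HIDDEN_NODE_RATIO and SAMPLE_SIZE can be evaluated directly. The sketch below is illustrative only; it assumes the fractional hidden-node count is truncated to an integer and that HOLDOUT_PERCENTAGE is a whole number, neither of which the text specifies.

```python
import math


def hidden_node_count(hidden_node_ratio: float, n_inputs: int, n_outputs: int) -> int:
    """HIDDEN_NODE_RATIO * sqrt(inputs * outputs); truncating to int is an assumption."""
    return int(hidden_node_ratio * math.sqrt(n_inputs * n_outputs))


def effective_sample_size(sample_size: int, total_cases: int, holdout_percentage: int = 30) -> int:
    """Smaller of SAMPLE_SIZE and total_cases * (1 - HOLDOUT_PERCENTAGE / 100).

    Integer arithmetic avoids floating-point rounding when HOLDOUT_PERCENTAGE is whole.
    """
    available = total_cases * (100 - holdout_percentage) // 100
    return min(sample_size, available)
```

With a ratio of 4, 9 input nodes and 1 output node, the hidden layer gets 4 * sqrt(9) = 12 nodes; with 5,000 cases and the default 30% holdout, at most 3,500 cases are used for training.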
Users who are not familiar with these algorithms are advised not to change the algorithm parameters but to keep the defaults; choosing the algorithm is sufficient. After choosing the algorithm, select the percentage of data to hold out, because a certain proportion of the data must be reserved to validate the model; I recommend holding out 30%. Then click Finish, and the algorithm runs on the input data and produces a trained pattern.
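Reserving part of the data for validation amounts to a random train/test split. A minimal sketch, assuming a seeded shuffle; the seed value and the function name `split_holdout` are illustrative and not taken from the add-in.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_holdout(rows: Sequence[T], test_percentage: int = 30,
                  seed: int = 42) -> Tuple[List[T], List[T]]:
    """Randomly reserve test_percentage % of rows for validation; the rest trains the model.

    A fixed seed (an assumption) makes the split reproducible between runs.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = len(shuffled) * test_percentage // 100
    return shuffled[n_test:], shuffled[:n_test]
```

Calling `split_holdout(rows)` on 100 rows returns 70 training rows and 30 validation rows, and the same seed always yields the same split.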
Step 104, evaluating the mining model.
In this step, the model (i.e., the pattern) trained and saved previously is evaluated to see whether it meets our requirements. If it does not, return to the earlier steps to check whether the data preprocessing was adequate, whether the chosen algorithm was unsuitable, and so on. This process is repeated as a closed loop until the trained model meets the requirements.
Click the Accuracy Chart button, select the model to evaluate, the mining column to predict and the value to predict, and choose either test data from the model or data from a worksheet table. Click Finish to output an accuracy chart, which shows a no-model line and an ideal-model line against which the trained model is compared. As in Fig. 2, the closer the trained model's curve is to the ideal model, the better the model; if the trained model's curve falls below the no-model line, the model has a problem and you need to return to the earlier steps and retrain. Models can also be evaluated with Excel's classification matrix, profit chart and cross-validation.
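The curves in an accuracy chart of this kind can be reproduced as cumulative gains: rank cases by the model's predicted probability and track what fraction of all actual positives has been captured at each point. The sketch below assumes a binary target (for example, "bought the product") and computes the model, ideal-model and no-model (random) curves; it is an illustration, not the add-in's code.

```python
from typing import List, Sequence, Tuple


def gains_curves(probabilities: Sequence[float],
                 actual: Sequence[bool]) -> List[Tuple[float, float, float, float]]:
    """Return (population fraction, model, ideal, random) cumulative fractions of positives captured."""
    total = len(actual)
    positives = sum(actual)
    if total == 0 or positives == 0:
        raise ValueError("need at least one case and one positive case")
    # Rank cases from most to least likely according to the model.
    ranked = sorted(zip(probabilities, actual), key=lambda pair: pair[0], reverse=True)
    points = []
    captured = 0
    for i, (_, is_positive) in enumerate(ranked, start=1):
        captured += is_positive
        population = i / total
        model = captured / positives
        ideal = min(i, positives) / positives  # a perfect model ranks every positive first
        points.append((population, model, ideal, population))  # random line is the diagonal
    return points
```

A model whose curve stays near `ideal` is good; one whose curve dips below the diagonal does worse than no model, which matches the retraining criterion above.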
Step 105, report output.
In this step, reports are generated during model training, evaluation and other processes, and some of them are helpful for business analysis. Decisions can be made with reference to the results in these reports.
Step 106, applying the data mining model.
In this step, models that have passed evaluation meet our business needs and can be used for prediction. Taking the search for potential customer groups as an example: first collect the customers' information, such as name, age, education and spending amount (that is, the columns used to train the model), and preprocess the data; then select the saved model and run it. The model outputs, for each customer, whether they are a target customer and the corresponding probability. Customers can then be sorted by probability in descending order and those with the highest probabilities targeted for marketing. This raises the marketing hit rate, lowers marketing costs and improves customer satisfaction.
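The scoring and ranking step can be sketched as below. The saved mining model is stood in for by a hypothetical `predict` callable that returns a purchase probability; in the described method the prediction comes from the model stored in SQL Server, and the field names and the `top_k` cut-off are assumptions.

```python
from typing import Callable, Dict, List, Tuple

Customer = Dict[str, object]


def rank_customers(customers: List[Customer], predict: Callable[[Customer], float],
                   top_k: int) -> List[Tuple[str, float]]:
    """Score every customer, sort by purchase probability (highest first) and keep the top_k.

    `predict` is a hypothetical stand-in for running the saved mining model.
    """
    scored = [(str(c["name"]), predict(c)) for c in customers]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]
```

For instance, with customers scored 0.2, 0.9 and 0.5 and `top_k=2`, the two customers scored 0.9 and 0.5 are returned in that order as the marketing list.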
The Excel data mining method provided by the invention therefore has the following advantages.
(1) Easy deployment and reduced cost
The invention only requires installing Excel 2007, the Excel data mining add-in and a SQL Server 2008 database; if the add-in can connect to a SQL Server 2008 database on another machine, the database need not be installed locally.
(2) Wide applicability
Because deployment costs are low and operation requires no specialist knowledge of data mining or mathematical statistics, only familiarity with Excel, most small and medium-sized enterprises can use it.
The above is only a specific embodiment of the invention. It should be noted that those skilled in the art can make improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as within the scope of protection of the invention.

Claims (2)

1. An Excel data mining method, characterized by comprising Excel data acquisition, data preprocessing, training of a data mining model, evaluation of the mining model, and output of mining reports;
the Excel data acquisition is used to import the enterprise's operating business data, cleanse the business data, analyze key influencing factors, and detect categories;
the data preprocessing is used to clean the imported data;
the training of the data mining model is used to apply a suitable data mining algorithm to the data collected in Excel and compute the corresponding patterns;
the evaluation of the mining model is used to evaluate the trained mining models and select the optimal model;
the output of mining reports is used to output the corresponding analysis-result reports after the data have been processed by the data mining algorithm, to facilitate analysis of the results.
2. An Excel data mining method, characterized by comprising application of the data mining model: predicting potential target customers, predicting customer revenue, analyzing the relationship between cost and profit, and finding the cost input that generates the most profit.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012103373156A CN102890710A (en) 2012-09-08 2012-09-08 Excel-based data mining method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012103373156A CN102890710A (en) 2012-09-08 2012-09-08 Excel-based data mining method

Publications (1)

Publication Number Publication Date
CN102890710A true CN102890710A (en) 2013-01-23

Family

ID=47534212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012103373156A Pending CN102890710A (en) 2012-09-08 2012-09-08 Excel-based data mining method

Country Status (1)

Country Link
CN (1) CN102890710A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108121780A (en) * 2017-12-15 2018-06-05 中盈优创资讯科技有限公司 Data Analysis Model determines method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970482A (en) * 1996-02-12 1999-10-19 Datamind Corporation System for data mining using neuroagents
CN102508860A (en) * 2011-09-29 2012-06-20 广州中浩控制技术有限公司 Data mining method based on XBRL (extensible business reporting language) embodiment document

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
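王冬磊: "Research on the Application of Data Mining Technology in Chemical Process Optimization", China Master's Theses Full-text Database (Information Science and Technology), no. 02, 15 February 2007 (2007-02-15) *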

Similar Documents

Publication Publication Date Title
Hofmann et al. Big data analytics and demand forecasting in supply chains: a conceptual analysis
Lee et al. Predictive analytics in business analytics: decision tree
You et al. A decision-making framework for precision marketing
US20110112986A1 (en) Generative Investment Method and System
CN103336791A (en) Hadoop-based fast rough set attribute reduction method
Mikavicaa et al. Big data: challenges and opportunities in logistics systems
CN104346698A (en) Catering member big data analysis and checking system based on cloud computing and data mining
CN105809277A (en) Big data based prediction method for the refining and managing of electric power marketing inspection
EP4024203A1 (en) System performance optimization
Baizyldayeva et al. Decision making procedure: applications of IBM SPSS cluster analysis and decision tree
Guodong et al. Joint optimization of complex product variant design responding to customer requirement changes
Tsai et al. A comparative study of hybrid machine learning techniques for customer lifetime value prediction
CN102117464A (en) Marketing investment optimizer with dynamic hierarchies
Baralis et al. Planning stock portfolios by means of weighted frequent itemsets
Chen Construction project cost management and control system based on big data
Dai Designing an Accounting Information Management System Using Big Data and Cloud Technology
Pivk et al. On approach for the implementation of data mining to business process optimisation in commercial companies
Sun Construction of integration path of management accounting and financial accounting based on big data analysis
CN107239853B (en) Intelligent housekeeper system based on cloud computing and working method thereof
Sun et al. Using improved RFM model to classify consumer in big data environment
Huang et al. A comparative study of data mining techniques for credit scoring in banking
Kar et al. A study on using business intelligence for improving marketing efforts
CN102890710A (en) Excel-based data mining method
Duchemin et al. Forecasting customer churn: Comparing the performance of statistical methods on more than just accuracy
Ntaliakouras et al. An apache spark methodology for forecasting tourism demand in greece

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130123