Data Mining and Knowledge Discovery in Databases in Business Decisions

Dr. Vijay Pithadia

Today, the computerization of many business and government transactions generates floods of data: tax returns, telephone calls, business trips, performance tests, and product warranty registrations are all handled by computer. To process these data, traditional and statistical methods of data analysis, such as ad-hoc queries and spreadsheets, are used to obtain informative reports, but they cannot extract knowledge from the data. This paper describes how data mining and KDD technology can facilitate analysis of the data in order to obtain the important knowledge hidden inside it. The second aim of this study is to raise awareness among Indian university teachers, people in industry and other organizations, and software professionals, so as to generate projects and promote the technology in business decisions.


For the last couple of years, computer professionals have been talking about data mining. Data mining [DM] is a new class of intelligent analytical methods able to assist humans, intelligently and automatically, in analyzing mountains of data for nuggets of useful knowledge. It is an iterative process of extracting interesting knowledge from the data in large databases. First, this knowledge may take the form of rules, patterns, regularities, relationships, constraints, etc. Second, it should be valid and potentially useful. Third, it is information hidden in the data. KDD, by contrast, is the overall process of finding and interpreting knowledge from data. Knowledge discovery is defined as "the non-trivial extraction of implicit, unknown, and potentially useful information from data". A clear distinction between data mining and knowledge discovery has been drawn in the literature: under that convention, the knowledge discovery process takes the raw results from data mining and carefully and accurately transforms them into useful and understandable information. This information is not typically retrievable by standard techniques but is uncovered through the use of AI techniques.

KDD is a growing field: There are many knowledge discovery methodologies in use and under development. Some of these techniques are generic, while others are domain-specific. The purpose of this paper is to present the results of a literature survey outlining the state-of-the-art in KDD techniques and tools. The paper is not intended to provide an in-depth introduction to each approach; rather, we intend it to acquaint the reader with some KDD approaches and potential uses.

The goal is to extract knowledge from data in large databases and to present the resulting patterns and knowledge in forms that human beings can understand, so as to support a better understanding of the underlying data. KDD is an emerging technology with a multi-step process that uses data mining methods to extract the knowledge hidden in the data according to specified measures. Data mining thus serves two purposes: prediction, which is made on similar groups of data, and description, which involves finding human-interpretable patterns that describe the data. Both apply across business and industry, from financial management, marketing management, and economic surveys of companies to insurance, banking, and maintenance.

Basic Steps of KDD Process

The basic steps of the KDD process are outlined below:

[1] Problem Analysis: A manual step. Its main function is to understand the application domain and the user's requirements, and to develop prior knowledge of the domain.
[2] Selection of Target Data: An automatic step that creates the target data set by selecting the data set, or a subset of it, on which discovery is to be performed.
[3] Data Processing: An automatic step that removes noise and handles missing data.
[4] Transformation of Data: A manual step in which data reduction and projection are carried out and the useful fields, features, or attributes of the data are identified according to the goal of the problem.
[5] Data Mining: An automatic step that involves selecting the data mining goal, choosing a method suited to the task, extracting knowledge, and analyzing and verifying that knowledge.
[6] Output Analysis and Review: Interpreting and evaluating the discovered knowledge or patterns and transforming them into rules and reports for automatic use and for follow-up on new predictions.
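
The automatic steps above can be sketched as a small pipeline. The following Python fragment is only an illustration under stated assumptions: the function names, the sample records, and the "twice the mean" rule used as a stand-in for a mining method are all hypothetical and do not come from any particular KDD tool.

```python
from statistics import mean

def select_target(records, fields):
    """Step 2: keep only the fields relevant to the problem."""
    return [{f: r.get(f) for f in fields} for r in records]

def clean(records):
    """Step 3: drop records with missing values (one simple way to handle them)."""
    return [r for r in records if all(v is not None for v in r.values())]

def transform(records, field):
    """Step 4: project each record onto the single attribute of interest."""
    return [r[field] for r in records]

def mine(values):
    """Step 5: a stand-in 'pattern': values more than twice the mean."""
    threshold = 2 * mean(values)
    return [v for v in values if v > threshold]

def kdd(records, fields, field):
    """Chain steps 2 to 5; steps 1 and 6 remain with the human analyst."""
    return mine(transform(clean(select_target(records, fields)), field))

# Hypothetical raw transaction records, one with a missing amount.
raw = [
    {"id": 1, "amount": 10, "note": "a"},
    {"id": 2, "amount": None, "note": "b"},
    {"id": 3, "amount": 12, "note": "c"},
    {"id": 4, "amount": 90, "note": "d"},
]
patterns = kdd(raw, ["id", "amount"], "amount")
print(patterns)
```

In practice each stage would be far richer, but the chaining of selection, cleaning, transformation, and mining is the part the sketch is meant to show.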

Techniques for Data Mining

Many techniques are used for data mining. Some of the most popular and commonly used ones, namely Neural Networks, the Nearest Neighbor Method, and Decision Trees, are discussed here.

[1] Neural Networks: This technique is based on a non-linear predictive model and is well suited to finance-related areas. Neural networks are analytic techniques modeled after the processes of learning in the cognitive system and the neurological functions of the brain. They are capable of predicting new observations from other observations after executing a process of so-called learning from existing data.

The first step is to design a specific network architecture. The size and structure of the network need to match the nature of the phenomenon under investigation. Because the latter is not known very well at this early stage, the task is not easy and often involves considerable trial and error. The new network is then subjected to the process of "training." In that phase, the neurons apply an iterative process to the inputs to adjust the weights of the network so that it optimally predicts the sample data on which the training is performed. After this phase of learning from an existing data set, the network is ready and can be used to generate predictions.
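
The training phase described above can be illustrated with the smallest possible case, a single neuron with a sigmoid output. This is a minimal sketch, not a full network: the learning rate, the number of epochs, and the toy data set (the logical AND of two inputs) are assumptions chosen only to keep the example short.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, epochs=5000, rate=0.5):
    """Iteratively adjust the weights so the neuron fits the sample data."""
    weights = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            error = target - output
            # Move each weight in the direction that reduces the error.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
            bias += rate * error
    return weights, bias

def predict(weights, bias, inputs):
    """Use the trained neuron to generate a prediction (class 0 or 1)."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Toy training set (assumed): output is 1 only when both inputs are 1.
samples = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
weights, bias = train(samples)
print([predict(weights, bias, x) for x, _ in samples])
```

A real application would use several layers of such neurons and a held-out data set to check the predictions, which is where the trial and error over architectures comes in.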

[2] Nearest Neighbor Method: This technique classifies each record in a data set based on a combination of the classes of the k records most similar to it in a historical data set. It is therefore sometimes called the k-nearest neighbor technique.
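
A minimal sketch of the k-nearest neighbor idea follows. It assumes numeric attributes, Euclidean distance, and a simple majority vote as the way of combining classes; the historical records (customer age and account balance in thousands) and their labels are invented for illustration.

```python
import math
from collections import Counter

def knn_classify(history, query, k=3):
    """Label `query` by majority vote among its k nearest historical records."""
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical historical data: (age, balance in thousands) -> usage class.
history = [
    ((25, 1.0), "low"),
    ((30, 1.5), "low"),
    ((28, 1.2), "low"),
    ((45, 8.0), "high"),
    ((50, 9.0), "high"),
]

print(knn_classify(history, (48, 8.5)))
print(knn_classify(history, (27, 1.1)))
```

Because distance is computed on raw values, attributes with large ranges dominate; in practice the attributes are usually rescaled first.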

[3] Decision Tree: A decision tree consists of nodes and branches; the beginning node is called the root. Depending on the result of a test at each node, the data are classified into various subsets. The end result is a set of rules covering all possibilities. In this method, tree-shaped structures represent decisions, and these decisions generate rules for classifying a data set. Specific decision tree methods include Classification and Regression Trees [CART] and Chi-Square Automatic Interaction Detection [CHAID].
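
How a tree yields both classifications and a complete set of rules can be shown with a small hand-built tree. This sketch does not implement CART or CHAID, which learn the tests from data; the tree, its attributes (`income`, `age`), the thresholds, and the outcomes are all hypothetical.

```python
# Each internal node tests one attribute against a threshold;
# each leaf is a class label. The root is the outermost node.
tree = {
    "test": ("income", 50000),  # go left if income < 50000
    "left": {"test": ("age", 30), "left": "reject", "right": "review"},
    "right": "approve",
}

def classify(node, record):
    """Follow the branches from the root until a leaf is reached."""
    while isinstance(node, dict):
        field, threshold = node["test"]
        node = node["left"] if record[field] < threshold else node["right"]
    return node

def rules(node, path=()):
    """Enumerate every root-to-leaf path as a (condition, class) rule."""
    if not isinstance(node, dict):
        return [(" and ".join(path) or "always", node)]
    field, threshold = node["test"]
    return (rules(node["left"], path + (f"{field} < {threshold}",))
            + rules(node["right"], path + (f"{field} >= {threshold}",)))

print(classify(tree, {"income": 42000, "age": 35}))
for condition, outcome in rules(tree):
    print(f"IF {condition} THEN {outcome}")
```

Every record falls under exactly one of the listed rules, which is what makes the rule set cover all possibilities.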

Data Mining Solutions for Business

DM techniques are useful in many areas of business decision-making. Some of the potential areas are banking, finance, surveys of customer satisfaction, markets, buying behavior, and customer characteristics, economics, and direct marketing. The details are described below:

[a] Financial Market: In the financial market, data mining can be applied, using various empirical models of market behaviour, to technical analysis for forecasting price dynamics and to selecting the optimal structure of an investment portfolio. Such systems have special interfaces for loading financial data, e.g. Supercharts, Wall Street Money, etc. Data mining methods also facilitate the analysis and selection of stocks and other financial instruments.

[b] Banking: In banking, functions such as mortgage approval, loan underwriting, money lending and borrowing, loyal customer prediction, and identification of stock trading rules are important areas for data mining. Such a system can also predict the characteristics of ATM card users who use their cards at the point of sale. A system can evolve prediction models for several levels of card usage, based on parameters such as customer age, average checking account balance, return per month, number of cheques, etc. In the case of mortgage loans, a data mining system can provide an excellent set of discrimination rules with an error rate of only 8%. The input parameters are account information (loan source, rates, and loan-to-value) as well as borrower demographic information.

[c] Database Marketing: In the business world, database marketing is the most successful application. Its main functions are to analyse the customer database, find patterns in existing customers' preferences, and target the selection of future customers. Many companies use database marketing techniques; American Express, for example, reported that credit card purchases increased by 15-20% as a result of database marketing. Possible applications include market research, covering media selection, product segmentation, broadcasting analysis, and product success prediction. One such system allows television programming executives to arrange show schedules by predicting audience share, so as to maximize market share and increase advertising revenues.

[d] Supply Chain Management (SCM): The fundamental operation of retail is supply chain management: moving products or services from the manufacturer to the customer via retail, either virtual or physical. Data mining can help maximise sales and profits by optimising marketing actions and by giving the retailer the insights needed to properly manage customers, promoters, products, stores, and employees.

[e] Marketing Strategies: Targeted marketing actions such as direct mail campaigns are expensive to produce, so it is important to direct mailings to the individuals most likely to buy. Generating business models under the various conditions is very difficult and complex. Target marketing can be achieved through data mining applications.

[f] Sales Forecasting: An important use of sales forecasting is the optimisation of stocks and purchases. On the basis of past data, retailers can accurately predict sales per item and location in order to optimise stock levels.
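
As a rough illustration of forecasting per item and location from past data, the sketch below uses a moving average, which is one of the simplest possible forecasting rules and is an assumption here rather than a method named in this paper. The item names, locations, and sales figures are invented.

```python
from collections import defaultdict

def forecast(history, window=3):
    """Forecast next-period sales per (item, location) as the mean of
    the most recent `window` observations."""
    series = defaultdict(list)
    for item, location, units in history:  # history is in time order
        series[(item, location)].append(units)
    return {
        key: sum(values[-window:]) / len(values[-window:])
        for key, values in series.items()
    }

# Hypothetical past sales: (item, location, units sold in a period).
history = [
    ("tea", "Rajkot", 10),
    ("tea", "Rajkot", 12),
    ("tea", "Rajkot", 14),
    ("tea", "Rajkot", 16),
    ("rice", "Mumbai", 30),
    ("rice", "Mumbai", 34),
]

print(forecast(history))
```

The forecast for each pair can then be compared with current stock to decide how much to purchase.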

[g] Fraud Detection and Prevention: Data mining also plays an important role in this area. Fraud can be detected in personal insurance, tax returns, accounts, credit cards, etc. A system can estimate the probability that a new account is fraudulent. These probabilities are used to sort the accounts so that those with the highest probability can be investigated further through fraud analysis.
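
The sorting step described for fraud detection is simple to sketch. The fragment below assumes that some model has already assigned each account a fraud probability; the account identifiers and scores are hypothetical, and how the probabilities are produced is outside the scope of the example.

```python
def rank_for_review(scores, top_n):
    """Return the `top_n` accounts with the highest fraud probability."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [account for account, _ in ranked[:top_n]]

# Hypothetical fraud probabilities produced by an upstream model.
scores = {"A-101": 0.02, "A-102": 0.91, "A-103": 0.40, "A-104": 0.77}

queue = rank_for_review(scores, 2)
print(queue)
```

Limiting the queue to the top few accounts reflects the fact that investigation capacity is usually the scarce resource.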

Indian Players in Data Mining

In India, very few organizations, such as IIT-B, Mumbai; IIT-K, Kanpur; Tata Infotech, Mumbai; IBM-India, Bangalore; and ISI-C, Kolkata, are working in this area, because cost-effective solutions are the major theme in developing a promising technology like data mining. IIT-K, Kanpur and IBM-India, Bangalore are working on tool development, whereas Tata Infotech is working on both tools and applications, including TULearn, a set of industrial-quality tools that define the nature of a database and then learn how to classify the data in databases. Its applications include credit card eligibility analysis, a customer satisfaction survey, a market survey for Hindustan Lever Ltd., BPL Mobile fraud detection, etc. ISI-C, Kolkata has been engaged on the following problems:

(a) Classification of archaeological materials, and
(b) Market survey of quality control with respect to customer satisfaction indices.

Research Issues

Data mining is still an emerging technology, and all aspects of it remain at the research level, with ongoing work on its development as well as on improving its efficiency and scalability. The research challenges are arranged into five broad areas:

A) improving the scalability of data mining algorithms,
B) mining non-vector data,
C) mining distributed data,
D) improving the ease of use of data mining systems and environments, and
E) privacy and security issues for data mining.

A. Scaling data mining algorithms. Most data mining algorithms today assume that the data fits into memory. Although success on large data sets is often claimed, usually this is the result of sampling large data sets until they fit into memory. A fundamental challenge is to scale data mining algorithms as

the number of records or observations increases;
the number of attributes per observation increases;
the number of predictive models or rule sets used to analyze a collection of observations increases;
and, as the demand for interactivity and real-time response increases.

Not only must distributed, parallel, and out-of-memory versions of current data mining algorithms be developed, but genuinely new algorithms are required. For example, association algorithms today can analyze out-of-memory data with one or two passes, while requiring only some auxiliary data be kept in memory.
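
The idea of a single pass with only auxiliary data in memory can be illustrated as follows. This is a simplified sketch rather than a published association algorithm: it streams transactions one at a time and keeps only pair counts, and the sample baskets and the support threshold are assumptions.

```python
from collections import Counter
from itertools import combinations

def frequent_pairs(transactions, min_support):
    """One pass over a stream of transactions; only the pair counts
    (the auxiliary data) are held in memory, never the transactions."""
    pair_counts = Counter()
    total = 0
    for basket in transactions:  # any iterator, e.g. lines read from disk
        items = sorted(set(basket))
        pair_counts.update(combinations(items, 2))
        total += 1
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }

def stream():
    """Stand-in for data too large to load at once (hypothetical baskets)."""
    yield ["bread", "milk"]
    yield ["bread", "butter", "milk"]
    yield ["butter", "jam"]
    yield ["bread", "milk", "jam"]

print(frequent_pairs(stream(), min_support=0.5))
```

With many distinct items the pair counts themselves can outgrow memory, which is one reason a second pass restricted to frequent items is often used.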

B. Extending data mining algorithms to new data types. Today, most data mining algorithms work with vector-valued data. It is an important challenge to extend data mining algorithms to work with other data types, including

1) time series and process data,
2) unstructured data, such as text,
3) semi-structured data, such as HTML and XML documents,
4) multi-media and collaborative data,
5) hierarchical and multi-scale data, and
6) collection-valued data.

C. Developing distributed data mining algorithms. Today most data mining algorithms require bringing together all the data to be mined in a single, centralized data warehouse. A fundamental challenge is to develop distributed versions of data mining algorithms so that data mining can be done while leaving some of the data in place. In addition, appropriate protocols, languages, and network services are required to handle the meta-data and mappings needed for mining distributed data. As wireless and pervasive computing environments become more common, algorithms and systems for mining the data produced by these types of systems must also be developed.
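
One simple way to leave data in place is for each site to compute a local summary and send only that summary for merging. The sketch below assumes the mining task reduces to frequency counts, which is true only for some methods; the site contents and function names are hypothetical.

```python
from collections import Counter

def local_summary(records):
    """Runs at each site: summarise local data without moving it."""
    return Counter(records)

def merge(summaries):
    """Runs centrally: combine the small summaries, not the raw data."""
    total = Counter()
    for summary in summaries:
        total.update(summary)
    return total

# Hypothetical data held at three separate sites.
sites = [["a", "b", "a"], ["b", "c"], ["a"]]

combined = merge(local_summary(site) for site in sites)
print(combined.most_common())
```

Only the counts cross the network, so the communication cost depends on the number of distinct values rather than on the number of records at each site.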

D. Ease of Use. Data mining today is at best a semi-automated process and perhaps destined to always remain so. Nevertheless, a fundamental challenge is to develop data mining systems which are easier to use, even by casual users. Relevant techniques include improving user interfaces, supporting casual browsing and visualization of massive and distributed data sets, developing techniques and systems to manage the meta-data required for data mining, and developing appropriate languages and protocols for providing casual access to data.

E. Privacy and Security. Data mining can be a powerful means of extracting useful information from data. As more and more digital data becomes available, the potential for misuse of data mining grows. A fundamental challenge is to develop privacy and security models and protocols appropriate for data mining and to ensure that next generation data mining systems are designed from the ground up to employ these models and protocols.


Data mining is an emerging and powerful technology for improving business strategies and for helping in the design of new products and in product quality. It complements, and can often replace, other business tools such as computer reporting and querying and statistical analysis. Data mining draws on multiple disciplines, such as database systems, data warehousing and OLAP (Online Analytical Processing), machine learning, information science, statistics, and visualisation, as well as mathematical modelling, pattern recognition, neural networks, image and signal analysis, web technology, etc. In business decision-making, all of the models above can help make decisions better suited to their purpose.


[1] Bettini et al. (1998), "Discovering frequent event patterns with multiple granularity in time sequences", IEEE Transactions on Knowledge and Data Engineering, Vol. 10, No. 2, March/April.
[2] Cabena et al. (1998), "Discovering Data Mining: From Concept to Implementation", Prentice Hall, USA.
[3] Chaudhuri and Dayal (1996), "Decision Support, Data Warehousing and OLAP", VLDB.
[4] Fayyad et al. (1997), "Data Mining and Knowledge Discovery", journal.
[5] Jiawei Han (1996), "Data Mining Techniques", a SIGMOD'96 Conference Tutorial.
[6] Michael Gilmant (1998), "Nuggets and Data Mining", a white paper, February.
[7] Piatetsky-Shapiro (1998), "Data Mining 101", a white paper, June.
[8] Rakesh Agrawal (1996), "Data Mining Technologies", Proc. International Conference VLDB.
[9] V. Estivill-Castro and A.T. Murray (1998), "Mining Spatial Data via Clustering", Proc. International Symposium on Spatial Data Handling (SDH'98), Canada, July 11-15.
[10] Fayyad, U.M., Piatetsky-Shapiro, G., and Smyth, P. From Data Mining To Knowledge Discovery: An Overview. In Advances In Knowledge Discovery And Data Mining, eds. U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, AAAI Press/The MIT Press, Menlo Park, CA., 1996, pp. 1-34.
[11] Frawley, W.J., Piatetsky-Shapiro, G., and Matheus, C. Knowledge Discovery In Databases: An Overview. In Knowledge Discovery In Databases, eds. G. Piatetsky-Shapiro, and W. J. Frawley, AAAI Press/MIT Press, Cambridge, MA., 1991, pp. 1-30.