Data Mining in a Nutshell
In today's business world, information about the customer is a
necessity for any business trying to maximize its profits. A new
and important tool for gaining this knowledge is Data Mining. Data
Mining is a set of automated procedures used to find previously
unknown patterns and relationships in data. These patterns and
relationships, once extracted, can be used to make valid
predictions about the behavior of the customer.
Data Mining is generally used for four main tasks: (1) to improve
the process of acquiring new customers and retaining existing ones;
(2) to reduce fraud; (3) to identify and eliminate internal
wastefulness in operations; and (4) to chart unexplored areas
of the Internet (Cavoukian). The fulfillment of these tasks can be
enhanced if appropriate data has been collected and if that data is
stored in a data warehouse. According to Stanford University, "A
Data Warehouse is a repository of integrated information, available
for queries and analysis. Data and information are extracted from
heterogeneous sources as they are generated.... This makes it much
easier and more efficient to run queries over data that originally
came from different sources." When data about an organization's
practices is easier to access, it becomes more economical to mine.
"Without the pool of validated and scrubbed data that a data
warehouse provides, the data mining process requires considerable
additional effort to pre-process the data" (SAS Institute).
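The warehouse idea can be sketched in a few lines. This is only an illustration, assuming two made-up sources (a billing export in CSV form and a list of support-ticket records) and using an in-memory SQLite database as a stand-in for a real warehouse. All table names, column names, and values are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export from a billing system.
billing_csv = "customer_id,amount\n1,120.50\n2,80.00\n1,30.25\n"

# Hypothetical source 2: records from a support-ticket application.
support_records = [
    {"customer": 1, "tickets": 3},
    {"customer": 2, "tickets": 0},
]

# The in-memory database plays the role of the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE billing (customer_id INTEGER, amount REAL)")
warehouse.execute("CREATE TABLE support (customer_id INTEGER, tickets INTEGER)")

# Load each source into the shared store, converting it to a common schema.
for row in csv.DictReader(io.StringIO(billing_csv)):
    warehouse.execute(
        "INSERT INTO billing VALUES (?, ?)",
        (int(row["customer_id"]), float(row["amount"])),
    )
for rec in support_records:
    warehouse.execute(
        "INSERT INTO support VALUES (?, ?)",
        (rec["customer"], rec["tickets"]),
    )

# One query now spans data that originally came from different sources.
query = """
    SELECT b.customer_id, SUM(b.amount) AS total_spend, s.tickets
    FROM billing AS b
    JOIN support AS s ON s.customer_id = b.customer_id
    GROUP BY b.customer_id, s.tickets
    ORDER BY b.customer_id
"""
rows = warehouse.execute(query).fetchall()
print(rows)
```

Once both sources sit in one store with matching customer identifiers, the spending and ticket data can be combined in a single query instead of being reconciled by hand before every analysis.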
There are several different types of models and algorithms used to
"mine" the data. These include, but are not limited to, neural
networks, decision trees, rule induction, boosting, and genetic
algorithms.
Neural networks are physical cellular systems which can acquire,
store, and utilize experiential knowledge (Zurada). Neural networks
offer a way to efficiently model large and complex problems. Decision
trees are diagrams used for making decisions in business or computer
programming. Branches are used to represent choices with associated
risks, costs, results, or probabilities. Rule induction is a way of
deriving a set of rules to classify cases (Two Crows). This set of
rules differs from the rules in a decision tree in that the rules are
independent of one another. Boosting is a technique in which
multiple random samples of data are taken and a classification
model for each set of data is made (Two Crows). The genetic
algorithm is a model of machine learning whose behavior is based
on the processes of evolution in nature. Populations of data are
represented by chromosomes and then go through a process of evolution.
The members of one set of data compete to pass on their most
favorable characteristics to the next generation of data. This
process continues until the best data is found. Many of the models
and algorithms used in data mining are generalizations of the
linear regression model.
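The genetic algorithm described above can be sketched as follows. This is a minimal illustration, not a production method. It assumes a toy objective (counting the 1 bits in a chromosome), and the function names, population size, mutation rate, and seed are all choices made for this example.

```python
import random


def fitness(chromosome):
    # Toy objective (an assumption for illustration): count the 1 bits.
    return sum(chromosome)


def evolve(pop_size=20, length=16, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    # Each member of the population is a chromosome: a list of bits.
    population = [
        [rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)
    ]
    history = [max(fitness(c) for c in population)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]  # the fitter half gets to reproduce
        next_gen = [ranked[0][:]]          # elitism: carry the best over unchanged
        while len(next_gen) < pop_size:
            mother, father = rng.sample(parents, 2)
            cut = rng.randrange(1, length)  # single-point crossover
            child = mother[:cut] + father[cut:]
            # Mutation: occasionally flip a bit.
            child = [1 - b if rng.random() < mutation_rate else b for b in child]
            next_gen.append(child)
        population = next_gen
        history.append(max(fitness(c) for c in population))
    best = max(population, key=fitness)
    return best, history


best, history = evolve()
print(fitness(best), history[0], history[-1])
```

Because the best chromosome of each generation is copied into the next one unchanged, the best fitness recorded in `history` never decreases. This mirrors the idea that favorable characteristics are passed on from generation to generation.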
Data Mining is largely, if not entirely, used for business purposes.
The heaviest users of data mining include the banking, financial, and
telecommunications industries (Two Crows).
A survey taken by Two Crows Corporation turned up these
applications of data mining:
·Ad revenue forecasting
·Churn (turnover) management
·Claims processing
·Credit risk analysis
·Cross-marketing
·Customer profiling
·Customer retention
·Electronic commerce
·Exception reports
·Food-service menu analysis
·Fraud detection
·Government policy setting
·Hiring profiles
·Market basket analysis
·Medical management
·Member enrollment
·New product development
·Pharmaceutical research
·Process control
·Quality control
·Shelf management/store management
·Student recruiting and retention
·Targeted marketing
·Warranty analysis
Data mining will affect different industries in the business world
in different ways. In the telecommunications industry, for
example, service providers that want to retain or build market share
and expand or develop new products and services will have to
make the adaptations that the industry and its pace-setting
technology require.
"The most successful telecommunications companies will, of course,
be the ones who can develop and market products and services that
customers will buy," says Julian Kulkarni, SAS Institute Europe's
Product Marketing Coordinator for telecommunications. "But high
customer churn rates in telcom markets show that you cannot depend
on customer loyalty. To thrive, companies must know their
customers, their products, their own operations, and the
competition better."
The key to succeeding in this rapidly changing industry is to
understand the customer, or the market that the customer
represents. Through data mining, telecommunications companies can
learn what their customers have done in the past and predict what
they are likely to do in the future. With this information, the
companies will be in a better position to make sound business
decisions.
Other real-world examples of data mining include:
·Targeting a set of consumers who are most likely to respond to a
direct mail campaign
·Predicting the probability of default for consumer loan
applications
·Predicting audience share for television programs
·Predicting the probability that a cancer patient will respond to
radiation therapy
·Predicting the probability that an offshore oil well is actually
going to produce oil
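The first example in the list, targeting the consumers most likely to respond to a mailing, can be illustrated with a very simple estimate. The sketch below uses a made-up history of a past mailing to compute the observed response rate for each customer segment, then picks the segment with the highest rate. The segment names and records are hypothetical, and real tools would use richer models than a raw frequency.

```python
from collections import defaultdict

# Hypothetical history of a past mailing: (customer segment, responded?)
past_mailing = [
    ("renter", False), ("renter", False), ("renter", True), ("renter", False),
    ("homeowner", True), ("homeowner", True), ("homeowner", False),
    ("homeowner", True),
]


def response_rates(history):
    # segment -> [number of responses, number of mailings]
    counts = defaultdict(lambda: [0, 0])
    for segment, responded in history:
        counts[segment][0] += int(responded)
        counts[segment][1] += 1
    # The observed frequency serves as a rough estimate of response probability.
    return {seg: hits / total for seg, (hits, total) in counts.items()}


rates = response_rates(past_mailing)
target = max(rates, key=rates.get)
print(rates, target)
```

With this toy data the homeowner segment responds three times out of four and the renter segment once out of four, so the next campaign would be aimed at homeowners first.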
There are many computer applications on the market to assist
businesses in the data mining process, and between them they cover
the various uses of data mining. Software
titles include AC2, ALICE d’Isoft, AutoClass C, C5.0 (See5),
Clementine, Data Surveyor, DataDetective, DataEngine, Datasage,
DataScope, DataX(tm), DbBridge, dbProbe, dbProphet, Explora, IBM
Visualization Data Explorer, INLEN, IRIS, IXL & IDIS software,
LEVEL5 Quest, MineSet (SGI), ModelQuest MarketMiner, Nuggets(TM),
Partek, PolyAnalyst, PV-WAVE, SE-Learn, Sipina-W v2.0 & Sipina-Pro,
Snob, SPSS Data Mining Software, The Data Mining Suite, Thinkbase’s
Data Mining Product, TiMBL (Tilburg Memory Based Learner),
Tooldiag, WINROSA, WinViz, WizWhy, XmdvTool, and XpertRule.
Summary Table (Pryke):

Company: Isoft
Product: ALICE d'Isoft
Major function: Alice is a powerful and easy to use Data Mining
Tool. Use decision trees to explore & exploit your data. Textual
reports, SQL queries generation, What-If Analysis, etc.
URL: http://www.isoft.fr/

Company: SPSS
Product: Clementine
Major function: Clementine is the leading data mining toolkit, twice
winning the UK Government’s (Department of Trade & Industry) SMART
award for innovation. Clementine applications include customer
segmentation/profiling for marketing companies, fraud detection,
credit scoring, load forecasting for utility companies, and profit
prediction for retailers.
URL: http://www.isl.co.uk/clem.html

Company: Data Distilleries
Product: Data Surveyor
Major function: Data Surveyor is a data mining tool
for expert users. It consists of a suite of powerful algorithms and
provides support for all steps in the knowledge discovery process.
Data Surveyor allows the user to interactively discover knowledge,
inspect results during discovery and guide the discovery process.
Data Surveyor applications include database marketing, credit
scoring and risk analysis.
URL: http://www.ddi.nl/

Company: MIT
Product: DataEngine
Major function: DataEngine is a software product for data analysis
using fuzzy technologies, neural networks, and conventional
statistics. It has been successfully applied in the fields of
forecasting, data base marketing, quality control, process
analysis, and diagnosis. The special features of the new version are
on the one hand the high flexibility concerning the integration
into existing solutions, which is supported by a flexible ASCII
import and the import of MS-Excel files. On the other hand it is
possible to include any kind of user defined functions into
DataEngine. In addition to this, DataEngine 2.0 becomes the tool for
professional data analysis thanks to the 32 bit architecture and
the productive graphic component for data visualization.
URL: http://www.mitgmbh.de/

Company: DataSage, Inc.
Product: Datasage
Major function: Datasage provides a suite of C++ modules
which maintain data inside an existing relational database where it
can be managed more effectively (the company calls this "data
centricism"). Datasage then uses high-speed C++ routines to read
and batch process the data. As a result, the product can handle
very large databases. Datasage includes a suite of data transforms,
modeling and analysis tools, including neural networks and factor
analysis.
URL: http://www.datasage.com/

Company: Trajecta, Inc.
Product: dbProphet
Major function: Utilizing sophisticated neural network
technologies, Trajecta offers a broad range of software and
services that provide highly accurate predictions of complex
customer behavior and market trends. Trajecta’s non-technical,
easy-to-use software can also help optimize business activities,
allowing its users to exceed their business goals.
URL: http://www.trajecta.com/

Company: SGI
Product: MineSet (SGI)
Major function: Combining powerful integrated, interactive tools
for data access and transformation, data mining, and visual data
mining, MineSet provides you with a revolutionary paradigm for
getting maximum value from your vast data resources. MineSet
enables you to gain a deeper, intuitive understanding of your data,
by helping you to discover hidden patterns, important trends and
new knowledge. It is this deep understanding which can be used for
developing powerful business strategies leading to greater
competitive advantage.
URL: http://www.sgi.com/software/mineset/

Company: Data Mining Technologies Inc.
Product: Nuggets(TM)
Major function: Nuggets uses proprietary
search algorithms called SiftAgents(TM) to develop English
"if-then" rules. These algorithms use genetic methods and learning
techniques to "intelligently" search for valid hypotheses that
become rules. In the act of searching, the algorithms "learn" about
the training data as they proceed. The result is a very fast and
efficient search strategy that does not preclude any potential rule
from being found. The new and proprietary aspects include the way
in which hypotheses are created and the searching methods. The user
sets the criteria for valid rules. Nuggets also provides a suite of
tools to use the rules for prediction of new data, understanding,
classifying and segmenting data. The user can also query the rules
or the data to perform special studies.
URL: http://www.data-mine.com/

Company: Partek Inc.
Product: Partek
Major function: Software for data mining and knowledge discovery
based on statistical methods, data visualization, neural networks,
fuzzy logic and genetic algorithms.
URL: http://www.partek.com/

Company: MIT
Product: WINROSA
Major function: WINROSA is a software tool which automatically
generates Fuzzy If-Then Rules from your data. The generated data set
can be run by most of the existing fuzzy tools, e.g. DataEngine,
fuzzyTECH, and Matlab.
URL: http://www.mitgmbh.de/

Company: Attar Software
Product: XpertRule
Major function: Data Mining using high performance parallel
SQL technology. A Windows PC client being able to intelligently query
the data source on the host server can achieve knowledge induction.
The speed of the process is therefore dependent upon the server,
not the speed of the client PC. This allows data mining to exploit
the speed offered by MPP servers (Massive Parallel Processors) and
database architectures that are optimized for serving queries.
URL: http://www.attar.com/
Cavoukian, Ann, Ph.D. "Data Mining: Staking a Claim on Your
Privacy." Jan. 1998.
Pryke, Andy. "The Data Mine." 23 Sep. 1998.
SAS Institute Inc. "Data Mining." 12 Jan. 2000.
Two Crows Co. "Introduction to Data Mining and Knowledge
Discovery." 1999.
Zurada, J.M. Introduction to Artificial Neural Systems.
Boston: PWS Publishing Company, 1992, p. xv.