Tuesday, June 4, 2019

Data Mining techniques

ABSTRACT

Competitive advantage requires abilities. Abilities are built through knowledge. Knowledge comes from data. The process of extracting knowledge from data is called data mining. Data mining, the extraction of hidden predictive information from large databases, is an advanced technique that helps companies highlight the most important information in their data warehouses. Data mining tools predict future trends and behaviors, and they can answer business questions that traditionally were too time-consuming to resolve. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and they can be combined with new products and systems as they are brought online.

A data warehouse is a platform that contains all of an organization's data in one place, in a centralized and normalized form, for deployment to users. It serves needs ranging from simple reporting to complex analysis, decision support, and executive-level reporting and archiving. Physically, a data warehouse is a repository of the information that businesses need to thrive in the information age. Analytically, a data warehouse is a modern reporting environment that provides users direct access to their data. In the information age, data warehousing is a powerful strategic weapon. Not only does it let organizations compete across time, it is also a rising-tide strategy that can elevate the strategic acumen of all employees in an organization.

This paper presents an overview of data mining and data warehousing: their basic definitions, how they are implemented, and their pros and cons.

DATA WAREHOUSING

In today's competitive global business environment, it is crucial for organizations to understand and manage enterprise-wide information in order to make timely decisions and respond to changing business conditions.
With the receding economy, enterprises have shifted their business focus toward customer orientation to remain competitive. Consequently, customer relationship management (CRM) tops their agenda, and numerous companies are realizing the business advantage of leveraging one of their key assets: data. Many research reports indicate that the volume of data in a given organization multiplies every five years. The most fundamental factor in the successful functioning of a business enterprise is the quality of the crucial decisions taken by its management. The key resource that helps management take these decisions is business-critical information. This information can only be reliable and accurate if all business-related data is properly analyzed, and a thorough analysis is only possible if all the data affecting the enterprise is present in one place. The solution is a data warehouse.

A data warehouse is a single, complete, and consistent store of data obtained from a variety of different sources. This data is made available to end users in a form they can understand and use in a business context. Today, data warehousing is one of the most talked-about business technologies in the corporate world.

DATA MINING

Data mining is a powerful new technology with great potential to help companies focus on the most important information in the data they have collected about the behavior of their customers and potential customers. It discovers information within the data that queries and reports can't effectively reveal. The amount of raw data stored in corporate databases is exploding. From trillions of point-of-sale transactions and credit card purchases to pixel-by-pixel images of galaxies, databases are now measured in gigabytes and terabytes. Raw data by itself, however, does not provide much information.
In today's fiercely competitive business environment, companies need to rapidly turn these terabytes of raw data into significant insights into their customers and markets to guide their marketing and investment decisions.

Fig: Data explosion

Data mining, or knowledge discovery, is the computer-assisted process of digging through and analyzing enormous sets of data and then extracting the meaning of the data. Data mining tools predict behaviors and future trends, allowing businesses to make proactive, knowledge-driven decisions. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Data mining derives its name from the similarity between searching for valuable information in a large database and mining a mountain for a vein of valuable ore. Both processes require either sifting through an immense amount of material or intelligently probing it to find where the value resides.

Frequently, the data to be mined is first extracted from an enterprise data warehouse into a data mining database or data mart. The data mining database may be a logical rather than a physical subset of your data warehouse.

DATA WAREHOUSING

1. DEFINITION

A data warehouse (DW) is a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management's decision making. A data warehouse is typically built on a relational database management system (RDBMS). It offers organizations the ability to gather and store enterprise information in a single conceptual enterprise repository, and it is designed specifically for query and analysis rather than for transaction processing. Data warehousing deals with collecting and organizing data into a database that can be searched and mined for information through the use of business intelligence solutions.

2. CHARACTERISTICS OF A DATA WAREHOUSE

1) Subject-oriented: The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together.
2) Time-variant: Changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time.
3) Non-volatile: Data in the database is never overwritten or deleted. Once committed, the data is static and read-only, but it is retained for future reporting.
4) Integrated: The database contains data from most or all of an organization's operational applications, and this data is made consistent.

3. ARCHITECTURE OF DATA WAREHOUSE

The architecture for a data warehouse is given below. Building this architecture requires four basic steps:
1) Data are extracted from the various internal and external source system files and databases. In a large organization there may be dozens or even hundreds of such files and databases.
2) The data from the various source systems are transformed and integrated before being loaded into the data warehouse. Transactions may be sent to the source systems to correct errors discovered in data staging.
3) The data warehouse is a database organized for decision support. It contains both detailed and summary data.
4) Users access the data warehouse by means of a variety of query languages and analytical tools. Results (e.g., predictions and forecasts) may be fed back to the data warehouse and to operational databases.

Fig: Architecture of a typical data warehouse and its querying and data-analysis support (information is integrated in advance and stored in the warehouse for direct querying and analysis)

Architecture in conceptual view:
1) Single-layer: Every data element is stored only once (a virtual warehouse).
2) Two-layer: Real-time data plus derived data. This is the most commonly used approach in industry today.
3) Three-layer: The transformation of real-time data to derived data actually requires two steps.

4. ISSUES IN BUILDING A WAREHOUSE

1) When and how to gather data: In a source-driven architecture for gathering data, the data sources transmit new information to the warehouse. In a destination-driven architecture, the data warehouse periodically sends requests for new data to the data sources.
2) What schema to use: Data sources that have been constructed independently are likely to have different schemas. Part of the task of a data warehouse is to perform schema integration and to convert data to the integrated schema before it is stored. As a result, the data stored in the warehouse is not just a copy of the data at the sources.
3) Data cleansing: The task of correcting and preprocessing data is called data cleansing. Data sources often deliver data with numerous minor inconsistencies that can be corrected.
4) How to propagate updates: Updates on relations at the data sources must be propagated to the data warehouse. If the relations at the data warehouse are exactly the same as those at the data source, propagation is straightforward.
5) What to summarize: The raw data generated by a transaction-processing system may be too large to store online. Instead, we can maintain summary data obtained by aggregation on a relation.

5. DATA WAREHOUSE MODEL

Data warehousing is the process of extracting and transforming operational data into informational data and loading it into a central data store or warehouse. Once the data is loaded, decision makers can access it via desktop query and analysis tools. The data warehouse model is illustrated in the following figure. The materialized views contain summary data compiled from several data sources. The auxiliary views in the picture are not mandatory. They are used to hold additional information needed to keep the materialized views synchronized with the data sources.

Fig: Data warehouse model

The data within the actual warehouse itself has a distinct structure, with the emphasis on different levels of summarization, as shown in the figure below.

Fig: Structure of data warehouse

6. STAGES IN IMPLEMENTATION

A DW implementation requires the integration of many products. The steps of implementation are:
Step 1: Collect and analyze the business requirements.
Step 2: Create a data model and physical design for the DW.
Step 3: Define the data sources.
Step 4: Choose the DBMS and software platform for the DW.
Step 5: Extract the data from the operational data sources, transform it, clean it, and load it into the DW model or data mart.
Step 6: Choose the database access and reporting tools.
Step 7: Choose the database connectivity software.
Step 8: Choose the data analysis and presentation software.
Step 9: Keep refreshing the data warehouse periodically.

7. DATA MARTS

A data warehouse is the sum of all its data marts. A data mart is a complete "pie wedge" of the overall data warehouse pie. It is a restriction of the data warehouse to a single business process, or to a group of related business processes, targeted toward a particular business group. Data marts can be customized for their end users and can present data in different formats for the end users' benefit. Data marts can employ OLAP (online analytical processing), an approach to organizing and querying data that enables quick access. OLAP is especially suited to queries that summarize data or view it from many different perspectives.

DATA MINING

1. DEFINITION

Data mining, or Knowledge Discovery in Databases (KDD) as it is also known, is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data. Data mining refers to using a variety of techniques to identify nuggets of information or decision-making knowledge in bodies of data. These nuggets are then extracted in such a way that they can be put to use in areas such as decision support, prediction, forecasting, and estimation.
The data is often voluminous but, as it stands, of low value, because no direct use can be made of it; it is the hidden information in the data that is useful.

Data mining has also been defined as "a new discipline lying at the interface of statistics, database technology, pattern recognition, and machine learning, and concerned with secondary analysis of large databases in order to find previously unsuspected relationships, which are of interest or value to their owners."

2. PROCESS

The data mining process can be divided into five steps:
1) Data selection
2) Data preprocessing
3) Data transformation
4) Data mining
5) Interpretation and evaluation

Fig: Process used in data mining

3. WORKING

While large-scale information technology has been evolving separate transaction and analytical systems, data mining provides the link between the two. Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Several types of analytical software are available: statistical, machine learning, and neural network tools. Generally, any of four types of relationships are sought:
1) Classes: Stored data is used to locate data in predetermined groups. For example, a restaurant chain could mine customer purchase data to determine when customers visit and what they typically order. This information could be used to increase traffic by having daily specials.
2) Clusters: Data items are grouped according to logical relationships or consumer preferences. For example, data can be mined to identify market segments or consumer affinities.
3) Associations: Data can be mined to identify associations. The often-cited beer-diaper example (shoppers who buy diapers also tend to buy beer) is an example of associative mining.
4) Sequential patterns: Data is mined to anticipate behavior patterns and trends. For example, an outdoor equipment retailer could predict the likelihood of a backpack being purchased based on a consumer's purchase of sleeping bags and hiking shoes.

4. MODELS RELATED TO DATA MINING

There are two types of model, or modes of operation, that may be used to discover information of interest to the user.
1) Verification model: The verification model takes a hypothesis from the user and tests its validity against the data. The emphasis is on the user, who is responsible for formulating the hypothesis and issuing the query on the data to affirm or negate it.
2) Discovery model: The discovery model differs in its emphasis: it is the system that automatically discovers important information hidden in the data. The data is sifted in search of frequently occurring patterns, trends, and generalizations without intervention or guidance from the user.

5. TECHNIQUES USED IN DATA MINING

1) Artificial neural networks: Non-linear predictive models that learn through training and resemble biological neural networks in structure.
2) Decision trees: Tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset. Specific decision tree methods include Classification and Regression Trees (CART) and Chi Square Automatic Interaction Detection (CHAID).
3) Genetic algorithms: Optimization techniques that use processes such as genetic combination, mutation, and natural selection in a design based on the concepts of evolution.
4) Nearest neighbor method: A technique that classifies each record in a dataset based on a combination of the classes of the k record(s) most similar to it in a historical dataset (where k ≥ 1). It is sometimes called the k-nearest neighbor technique.
5) Rule induction: The extraction of useful if-then rules from data based on statistical significance.

6. TWO STYLES OF DATA MINING

There are two styles of data mining. Directed data mining is a top-down approach, used when we know what we are looking for. This often takes the form of predictive modeling, where we know exactly what we want to predict.
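The nearest neighbor method listed above is also a simple case of directed data mining, because we know which class we want to predict for a new record. Below is a minimal Python sketch. It assumes numeric feature tuples and uses Euclidean distance as the similarity measure. The function name `knn_classify` and the small customer dataset (age, annual spend in thousands, bought or not) are hypothetical, made up for illustration.

```python
import math
from collections import Counter


def knn_classify(history, query, k=3):
    """Classify `query` by majority vote among the k most similar records.

    history: list of (features, label) pairs, where features is a tuple of numbers.
    Similarity is Euclidean distance here (an assumption; other measures work too).
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    # Sort the historical records by their distance to the query record.
    neighbors = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    # Combine the classes of the k nearest records by simple majority vote.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical historical dataset: ((age, annual spend in thousands), bought?)
history = [
    ((25, 1.0), "no"),
    ((30, 1.5), "no"),
    ((45, 8.0), "yes"),
    ((50, 9.5), "yes"),
    ((48, 7.0), "yes"),
]

print(knn_classify(history, (47, 8.5), k=3))  # yes
print(knn_classify(history, (28, 1.2), k=3))  # no
```

Euclidean distance is sensitive to scale. In practice, features measured in different units (here, age versus spend) would usually be normalized before comparing records; the sketch skips that step for brevity.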
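The associations relationship type described under WORKING (the beer-diaper example) is commonly quantified with two measures:
- Support is the fraction of transactions that contain an itemset.
- Confidence, for a rule A -> B, is support(A and B) divided by support(A).

The Python sketch below brute-forces one-item-to-one-item rules over a tiny, made-up list of shopping baskets. The function names, thresholds, and data are hypothetical and chosen for illustration. Production tools typically prune the search with algorithms such as Apriori rather than checking every pair.

```python
from itertools import combinations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def pair_rules(transactions, min_support=0.3, min_confidence=0.6):
    """Exhaustively find rules lhs -> rhs between single items.

    Keeps rules whose pair support and confidence meet the (hypothetical)
    thresholds. Returns (lhs, rhs, support, confidence) tuples.
    """
    items = sorted({item for t in transactions for item in t})
    rules = []
    for a, b in combinations(items, 2):
        pair_support = support(transactions, {a, b})
        if pair_support < min_support:
            continue
        # Evaluate the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = pair_support / support(transactions, {lhs})
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(pair_support, 2), round(confidence, 2)))
    return rules


# Hypothetical market-basket data.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

for lhs, rhs, sup, conf in pair_rules(baskets):
    print(f"{lhs} -> {rhs}  support={sup} confidence={conf}")
```

On this toy data, only beer and diapers co-occur often enough to pass both thresholds, so the rule is reported in both directions with support 0.4 and confidence 0.67.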
Undirected data mining is a bottom-up approach that lets the data speak for itself. Undirected data mining finds patterns in the data and leaves it up to the user to determine whether or not these patterns are important.

7. POTENTIAL APPLICATIONS

Data mining has many and varied fields of application, some of which are listed below.
1) Marketing: Identify buying patterns of customers; market basket analysis.
2) Banking: Detect patterns of fraudulent credit card use; identify loyal customers.
3) Insurance and health care: Claims analysis; predict which customers will buy new policies; identify fraudulent behavior.
4) Transportation: Determine distribution schedules; analyze loading patterns.

CONCLUSION

Organizations today are under tremendous pressure to compete in an environment of tight deadlines and reduced profits. Legacy business processes that require data to be extracted and manipulated prior to use will no longer be acceptable. Instead, enterprises need rapid decision support based on the analysis and forecasting of predictive behavior. Data warehousing and data mining techniques provide this capability.

A data warehouse is a modern reporting environment that provides users direct access to their data. A data warehouse is the sum of all its data marts. A data warehousing strategy allows organizations to move from a defensive to an offensive decision-making position. The purpose of a data warehouse is to consolidate and integrate data from a variety of sources and to format those data in a context for making accurate business decisions.

Data mining offers firms in many industries the ability to discover hidden patterns in their data, patterns that can help them understand customer behavior and market trends. The advent of parallel processing and new software technology enables customers to capitalize on the benefits of data mining more effectively than had been possible previously.
