Journal of Knowledge Management Practice, Vol. 13, No. 2, June 2012

KDD for Business Intelligence

Rafi Ahmad Khan, University of Kashmir, Srinagar, India

ABSTRACT:

The emergence of the knowledge-based economy has posed serious challenges to companies. Intelligent organizations recognize that knowledge is an intellectual asset that grows over time and, when harnessed effectively, can sustain competitiveness and innovation. Organizations can use IT to leverage the intellectual resources of the entire organization for significant financial impact. Business Intelligence (BI), together with KDD (Knowledge Discovery in Databases), plays a pivotal role in leveraging a company's intellectual assets by creating, storing and sharing knowledge for effective decision making. Companies are now realizing the potential payoffs of combining KDD applications with BI; consequently, BI is spreading to small, medium and large companies alike. This paper explores the concepts of BI and KDD, the process of knowledge discovery, the key levers of knowledge strategy and the benefits of BI.

Keywords: Business intelligence, Knowledge discovery in databases, KDD, Data mining, OLAP


Introduction

Over the past few decades, the industrialized economy has been transforming from one based on natural resources to one based on intellectual assets (Alavi, 2000; Tseng & James Goo, 2005). The knowledge-based economy is a reality (Godin, 2006). Rapid changes in the business environment cannot be handled in traditional ways: companies are expanding and are much larger today than they used to be, fueling the need for better tools for collaboration, communication and knowledge sharing. Competing in globalized markets requires quick and effective responses to customer needs and problems. For companies spread over wide geographical areas, and for virtual organizations, managing knowledge is critical to providing services. Companies must develop strategies to sustain competitive advantage by leveraging their intellectual assets for optimal performance (Skyrme, 2002; Agrawal et al, 1996).

Knowledge And Business Intelligence (BI)

Information technology now plays a major role in business because of its pivotal role in building business intelligence in enterprises. Business Intelligence (BI) is a broad category of applications and technologies for gathering, analyzing, and providing access to the large volumes of data stored in a company's databases. The term intelligence in Business Intelligence is closely related to knowledge. Knowledge refers to stored information or models used by a person or machine to interpret, predict and appropriately respond to the outside world (Fischler & Firschein, 1987). In the IT context, knowledge is distinct from data and information. Data are facts, measurements and statistics, whereas information is organized or processed data that is timely and accurate (Hoffer et al, 2002; Kankanhalli & Tan, 2005). Knowledge is information that is contextual, relevant and actionable; having knowledge implies that it can be exercised to solve a problem. While data, information and knowledge may all be viewed as organizational assets, knowledge provides a higher level of meaning about data and information.

Intelligence is often defined as the general mental ability to learn (acquire knowledge) and to apply that knowledge. Intelligence encompasses cognition: cognition is the process by which people assimilate and integrate knowledge, while intelligence comprises both the assimilation of knowledge and the ability to apply it. Knowledge is therefore imperative for a BI system in an organization. A BI system must be able to manage knowledge, store it in a knowledge repository and provide tools that apply that knowledge to better decision making. Companies are adopting BI systems and tools because of their capability to learn from the past and forecast the future.

Knowledge in an enterprise may originate from many different sources, including information systems, reports, the Internet, corporate databases, customers, suppliers and government agencies. Employees' knowledge, which results from their experience and intuition, is an essential source of information.

Polanyi (1958) first conceptualized the difference between explicit and tacit (implicit) knowledge. Explicit knowledge deals with more objective, rational and technical knowledge (e.g., data, policies, procedures, software and documents). It is "leaky" knowledge because it can be readily documented (Alavi, 2000). Tacit knowledge is usually in the domain of subjective, cognitive and experiential learning; it is highly personal and difficult to formalize. It is also referred to as embedded knowledge (Tuggle & Goldfinger, 2004) because it typically involves expertise, know-how, trade secrets, skill sets, understanding and learning, and is hence difficult to document. When people leave an organization, they take this knowledge with them. Consequently, it has become vital for organizations to retain valuable know-how that can so easily and quickly leave. Organizations now recognize the need to capture and integrate both types of knowledge. BI is the process that transforms data into information and then into knowledge (Golfarelli et al, 2004). It has proven successful not only in analyzing data but also in discovering knowledge by uncovering trends and patterns hidden deep within datasets. These trends and patterns can then be investigated to forecast future directions (Watson & Wixom, 2007). BI is spreading to small, medium and large companies, and more and more analytical tools are entering the market to support all kinds of analysis and informed decision making (Khan & Quadri, 2012).

In business management terms, BI describes the applications and technologies used to gather, provide access to and analyze data and information about an enterprise, in order to help it make better-informed business decisions (Reinschmidt & Francoise, 2002; Moss & Atre, 2003; Wu et al, 2007; Jonathan, 2000). Ranjan (2008) argues that BI is the conscious, methodical transformation of data from any and all data sources into new forms that provide business-driven, results-oriented information. According to Pirttimäki (2004), the BI process is a continuous and systematic method by which an organization gathers, analyzes and disseminates business information relevant to its activities. Cui et al (2007) argue that BI improves business performance by providing powerful assistance to executive decision makers, enabling them to have actionable information at hand. BI tools are viewed as technology that enhances the efficiency of business operations by increasing the value of enterprise information and improving the way this information is used. According to Zeng et al (2006), BI is “The process of collection, treatment and diffusion of information that has an objective, the reduction of uncertainty in the making of all strategic decisions.”

BI can be used not only to view current activity but also to suggest the most suitable direction for an organization; consequently, BI can be an invaluable tool for decision-makers and managers (Dhar & Stein, 1997). However, the success of BI tools depends on the quality of the data they use. Quality data, its transformation into information and the extraction of knowledge from it are therefore essential to a successful BI implementation. As a result, it is vital to explore techniques that can be used to select and analyze organizational data. Knowledge Discovery in Databases (KDD) is one such process, which can help ensure that the highest-quality data is available for BI applications.

Knowledge Discovery In Databases (KDD)

Databases and their tools provide the necessary infrastructure to store, access and manipulate data. Data warehousing, a recently popularized term, refers to the business trend of collecting and cleaning transactional data to make them available for online analysis and decision support. A popular approach to analyzing data warehouses is online analytical processing (OLAP) (Agrawal et al, 1996). OLAP tools focus on multidimensional data analysis, which is superior to SQL (a standard data manipulation language) for computing summaries and breakdowns along many dimensions. While current OLAP tools are semi-automated and target interactive data analysis, they are expected to include more automated discovery components in the near future.
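The central OLAP operation, summarizing a measure along chosen dimensions, can be sketched in a few lines of Python. This is a minimal illustration of the idea only, not how an OLAP server is built; the sales records, the dimension names and the `roll_up` and `cube` helpers are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical fact records: (region, product, quarter, sales_amount).
SALES = [
    ("North", "Laptop", "Q1", 120.0),
    ("North", "Laptop", "Q2", 150.0),
    ("North", "Phone", "Q1", 80.0),
    ("South", "Laptop", "Q1", 90.0),
    ("South", "Phone", "Q2", 60.0),
]

# Assumed dimension order, matching the tuple layout above.
DIMENSIONS = ("region", "product", "quarter")


def roll_up(facts, group_by):
    """Sum the sales measure over every dimension not listed in group_by."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    totals = defaultdict(float)
    for *dims, amount in facts:
        key = tuple(dims[i] for i in idx)
        totals[key] += amount
    return dict(totals)


def cube(facts):
    """Compute a roll-up for every subset of dimensions (a tiny data cube)."""
    result = {}
    for r in range(len(DIMENSIONS) + 1):
        for group_by in combinations(DIMENSIONS, r):
            result[group_by] = roll_up(facts, group_by)
    return result
```

For example, `roll_up(SALES, ["region"])` gives totals of 350.0 for North and 150.0 for South, and `cube(SALES)` holds all eight breakdowns, including the grand total of 500.0 under the empty key. Precomputing such breakdowns is what lets an analyst move between summary levels interactively.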

KDD is an automated process that draws on methods from fields such as pattern recognition, applied statistics, machine learning and neural networks to find patterns in data during its data mining step. The phrase ‘Knowledge Discovery in Databases’ was coined at the first KDD workshop in 1989 by Piatetsky-Shapiro, who emphasized that knowledge is the end product of a data-driven discovery (Piatetsky-Shapiro & Frawley, 1991). KDD can be defined as the non-trivial extraction of implicit, previously unknown and potentially useful information from databases. The declining cost of storage technology and advances in data communication technologies have enabled companies to capture and store data with ease (Fayyad, 1996; Witten & Frank, 2005). This growth far exceeds human capacity to analyze databases in order to find hidden rules or patterns in the data. Knowledge discovery in databases is therefore becoming more and more important (Lazcorreta et al, 2008).

Knowledge discovery is a multidisciplinary area of research (Wu, 2004) and is applied in almost every field: science, marketing, finance, health care, retail and so on. The traditional method of turning data into knowledge relies on manual analysis and interpretation (Witten & Frank, 2005), but as the volume of captured data increases, manual analysis has become unrealistic in many domains. Scaling up human analysis capabilities to handle large volumes of data has therefore become imperative.

KDD is concerned with the development of methods and techniques that help in the analysis and interpretation of huge volumes of data. At the core of the KDD process is the application of specific data mining methods for pattern discovery and extraction (Han & Kamber, 2006; Geist, 2002).

Process Of Knowledge Discovery

The KDD process model is an interactive, iterative procedure that attempts to extract implicit, previously unknown and potentially useful knowledge from data through a scientific method. It can be divided into a number of steps.

There are a number of variants of the KDD process, such as those published by Adriaans & Zantinge (1996), Brachman & Anand (1996) and Han & Kamber (2006), among others. However, all variants remain close to the process described by Fayyad (1996) and supported by Roiger & Geatz (2003). It is a well-accepted approach to KDD, consisting of several stages as depicted in Figure 1. The stages are:

Data Selection: The goal of this stage is to extract, from the large volumes of data stored in operational databases, data warehouses and data marts, the data relevant to the data mining analysis.

Data Preprocessing: This stage of KDD is concerned with data cleansing and preparation tasks that are essential to ensure correct results. Eliminating missing values in the data, ensuring that coded values have a uniform meaning and ensuring that no spurious data values exist are typical activities that occur during this stage.

Data Transformation: This stage is aimed at transforming the data into a two-dimensional table and eliminating unwanted fields so the results are valid.

Data Mining: The goal of the data mining stage is to analyze the data with a suitable set of algorithms in order to discover meaningful patterns and rules and to produce predictive models. This is the main element of the KDD cycle.

Interpretation and Evaluation: The discovered patterns are interpreted and evaluated. Using the resulting knowledge includes incorporating it into the performance system, taking actions based on it, or simply documenting it and reporting it to interested parties.

Figure 1:  Knowledge Discovery Process
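The five stages can be chained as ordinary functions. The following minimal Python sketch assumes a hypothetical customer-churn analysis: the records, field names, region codes, thresholds and the functions `select`, `preprocess`, `transform`, `mine`, `evaluate` and `run_kdd` are all invented for illustration, and the mining step is deliberately trivial (a churn rate per region) so that every stage stays visible.

```python
from collections import defaultdict

# Hypothetical raw operational records; every field name and value is invented.
RAW = [
    {"id": 1, "region": "north", "spend": 40.0, "churned": True, "notes": "call back"},
    {"id": 2, "region": "N", "spend": 25.0, "churned": True, "notes": ""},
    {"id": 3, "region": "North ", "spend": 30.0, "churned": False, "notes": ""},
    {"id": 4, "region": "south", "spend": 55.0, "churned": False, "notes": "vip"},
    {"id": 5, "region": "S", "spend": 60.0, "churned": False, "notes": ""},
    {"id": 6, "region": None, "spend": 20.0, "churned": True, "notes": ""},
    {"id": 7, "region": "south", "spend": -5.0, "churned": True, "notes": ""},
    {"id": 8, "region": "east", "spend": 10.0, "churned": True, "notes": ""},
]

# Assumed mapping that gives every coded region value one uniform meaning.
REGION_CODES = {"n": "NORTH", "north": "NORTH", "s": "SOUTH", "south": "SOUTH"}


def select(records, fields):
    """Data selection: keep only the fields relevant to this analysis."""
    return [{f: r.get(f) for f in fields} for r in records]


def preprocess(records):
    """Data preprocessing: drop incomplete rows, unify codes, reject spurious values."""
    clean = []
    for r in records:
        if any(v is None for v in r.values()):
            continue  # eliminate records with missing values
        region = REGION_CODES.get(r["region"].strip().lower())
        if region is None or r["spend"] < 0:
            continue  # unknown code or spurious (negative) spend
        clean.append({**r, "region": region})
    return clean


def transform(records, columns):
    """Data transformation: a two-dimensional table holding only the needed columns."""
    return [tuple(r[c] for c in columns) for r in records]


def mine(table):
    """Data mining (deliberately trivial): churn rate per region."""
    counts = defaultdict(lambda: [0, 0])  # region -> [customers, churned]
    for region, churned in table:
        counts[region][0] += 1
        counts[region][1] += int(churned)
    return {reg: (n, c / n) for reg, (n, c) in counts.items()}


def evaluate(patterns, min_support=2, min_rate=0.5):
    """Interpretation/evaluation: keep patterns backed by enough records and strong enough."""
    return {reg: rate for reg, (n, rate) in patterns.items()
            if n >= min_support and rate >= min_rate}


def run_kdd(raw):
    """Chain the stages: selection -> preprocessing -> transformation -> mining -> evaluation."""
    selected = select(raw, ["region", "spend", "churned"])
    table = transform(preprocess(selected), ["region", "churned"])
    return evaluate(mine(table))
```

Here `run_kdd(RAW)` keeps only the NORTH pattern: three valid customers, two of whom churned. Because each stage is a separate function, returning to an earlier stage (for example, changing `REGION_CODES` or the thresholds) and rerunning mirrors the iterative nature of the process.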

While data mining algorithms can yield an unlimited number of patterns, many of these may not be meaningful or useful. The final phase, interpretation and evaluation, is aimed at selecting the models that are valid and worthwhile for making future business decisions. It is possible to return to any of the previous stages at this point, should the need arise. The evaluation can also involve visualization of the extracted patterns and models, or visualization of the data given the extracted models. The result of this process is newly acquired knowledge formerly hidden in the data. There are generally five types of information that can be obtained through data mining (Turban & Aronson, 2001):

·         Classification: It infers the defining characteristics of a certain group (e.g., customers who have been lost to competitors).

·         Clustering: It identifies groups of items that share specific characteristics (clustering differs from classification in that no predefined characteristics are given).

·         Association: It identifies relationships between events that occur at one time (e.g., the contents of a shopping basket).

·         Sequencing: It is similar to association, except that the relationship exists over a period of time (e.g., repeat visits to a supermarket or use of a financial planning product).

·         Forecasting: It estimates future values based on patterns within large sets of data (e.g., demand forecasting).
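Association, the shopping-basket case in the list above, is commonly quantified by support (how often items occur together) and confidence (how often the consequent appears when the antecedent does). The minimal Python sketch below uses hypothetical baskets and thresholds; the helper names are invented, and it enumerates only one-item to one-item rules by brute force rather than using a scalable algorithm such as Apriori (Agrawal et al, 1996).

```python
from itertools import permutations

# Hypothetical shopping baskets (each is the set of items bought together).
BASKETS = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "butter", "eggs"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Estimated P(consequent | antecedent) = support(A and C) / support(A)."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Brute-force all one-item -> one-item rules meeting both thresholds."""
    items = sorted(set().union(*baskets))
    rules = []
    for a, c in permutations(items, 2):
        s = support({a, c}, baskets)
        if s >= min_support:
            conf = confidence({a}, {c}, baskets)
            if conf >= min_confidence:
                rules.append((a, c, s, conf))
    return rules
```

With these baskets, only bread → butter (confidence about 0.75) and butter → bread (confidence 1.0) pass both thresholds; bread → milk has enough support but fails on confidence (0.5). Brute force is fine for a handful of items, but the number of candidate rules grows quickly, which is why pruning algorithms are used on real data.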

Data mining constitutes one step in the KDD process; it is in this step that the actual search for patterns of interest is performed. It is important at this stage to choose the appropriate data mining algorithm (e.g., neural networks, linear/logistic regression or association rules) for the data mining task. The task itself can be classification, linear regression analysis, rule formation or cluster analysis (Imberman, 2001).
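Linear regression analysis, one of the tasks named above, also supports forecasting. The minimal Python sketch below fits ordinary least squares with a single predictor using the textbook closed form; the `fit_line` helper and the monthly demand figures are hypothetical, and a real forecast would need validation (for example, on held-out data) before being used for decisions.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical monthly demand for one product (month index -> units sold).
MONTHS = [1, 2, 3, 4, 5, 6]
UNITS = [100, 108, 119, 127, 141, 147]

intercept, slope = fit_line(MONTHS, UNITS)
forecast_month_7 = intercept + slope * 7  # extrapolate one month ahead
```

For this hypothetical series the fitted slope is about 9.77 units per month, and extrapolating to month 7 gives roughly 158 units. The same fit-then-extrapolate pattern underlies more elaborate forecasting models.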

The information and knowledge extracted by applying BI tools must be stored in a knowledge repository for future use and sharing within the organization.
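The paper does not prescribe a repository design. As one hypothetical illustration, discovered rules could be persisted in a relational table so that later analyses and other users can query them. The schema, the table name `rules` and the helper functions below are assumptions; only Python's standard `sqlite3` module is used.

```python
import sqlite3


def open_repository(path=":memory:"):
    """Create (or open) a tiny repository table for discovered rules."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS rules (
               antecedent TEXT NOT NULL,
               consequent TEXT NOT NULL,
               support REAL NOT NULL,
               confidence REAL NOT NULL,
               source TEXT NOT NULL,
               PRIMARY KEY (antecedent, consequent, source)
           )"""
    )
    return conn


def store_rule(conn, antecedent, consequent, support, confidence, source):
    """Insert a rule, or replace it if the same rule from the same source exists."""
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT OR REPLACE INTO rules VALUES (?, ?, ?, ?, ?)",
            (antecedent, consequent, support, confidence, source),
        )


def find_rules(conn, min_confidence):
    """Return stored rules at or above a confidence level, strongest first."""
    cur = conn.execute(
        "SELECT antecedent, consequent, confidence FROM rules "
        "WHERE confidence >= ? ORDER BY confidence DESC",
        (min_confidence,),
    )
    return cur.fetchall()
```

Keying each rule by antecedent, consequent and source means a later mining run refreshes an earlier finding instead of duplicating it. In practice a repository would also record when and from which data each item was derived.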

Key Levers Of Knowledge Strategy

Companies are increasingly recognizing the contribution of knowledge, in the form of intellectual capital or the firm's knowledge base, and the value that can be derived from it. Skyrme (2002) identifies seven key levers that have the maximum impact on a knowledge-based strategy. Briefly, the seven levers are:

·         Customer knowledge: repeatedly cited in surveys as the most important knowledge an organization needs to capture and exploit. Business Intelligence (BI) helps in retaining existing customers and identifying potential customers and suppliers.

·         Knowledge-enhanced products and services: adding value by surrounding the product with additional information, such as personal preferences when booking travel. BI helps add value by using captured knowledge and collaboration.

·         Knowledge in people: people-focused programs aim to continually improve workforce skills through development. BI helps in skills development through collaboration technologies and eLearning.

·         Organizational memory: knowing what an organization knows, over space and time, e.g., sharing best practice or recording lessons learned. BI helps in sharing practices through collaboration technology.

·         Knowledge in processes: capturing the knowledge of the best professionals and embedding their good practices into recommended procedures. BI helps capture implicit knowledge from experts.

·         Knowledge in relationships: creating forums and other mechanisms for the intimate sharing of knowledge with suppliers, customers and partners. Knowledge sharing is a common practice in BI.

·         Knowledge assets: the intellectual capital focus. BI not only focuses on the company's intellectual capital but also assists in perceiving the corporate competencies of competitors.

Benefits

The major benefit of BI with knowledge discovery tools is the ability to provide accurate information when needed, including a real-time view of corporate performance. Based on a survey, Thomson (2004) reported the following major benefits of BI:

·         Faster, more accurate reporting (81 percent)

·         Improved decision making (78 percent)

·         Improved customer services (56 percent)

·         Increased revenue (49 percent)

Many of the benefits of BI are intangible, which, according to Eckerson (2003), is why so many executives do not insist on rigorous cost justification of BI projects.

Conclusion

The continuous shift towards a knowledge-based economy has brought to the fore the question of how knowledge is created, assimilated and used to obtain economic returns. Knowledge embodied in intellectual assets (e.g., human capital, R&D, patents, software and documents) is becoming essential to organizations' economic performance and growth. In this new environment, companies need to be able to earn economic returns from both developing and using intellectual assets. KDD is a new generation of computational techniques and tools that support the extraction of useful knowledge from rapidly growing volumes of data. Organizations are now realizing the potential payoffs of combining KDD applications with BI. By bringing together a set of diverse fields, KDD and BI create fertile ground for developing new tools and techniques for managing, analyzing and creating value from the flood of data facing the modern business world. It is the responsibility of researchers, academicians and practitioners in this field to ensure that users understand the potential contributions of KDD and BI to creating value and gaining competitive advantage.

References

Adriaans, P., & Zantinge, D. (1996). Data Mining. Harrow, England: Addison-Wesley.

Agrawal, R., Mannila, H., Srikant, R., Toivonen, H., & Verkamo, I. (1996). Fast discovery of association rules. In U. Fayyad, G. Piatetsky-Shapiro, P. Smyth, & R. Uthurusamy (Eds.), Advances in Knowledge Discovery and Data Mining. London: AAAI/MIT Press.

Alavi, M. (2000). Managing Organizational Knowledge. In R. Zmud (Ed.), Framing the Domain of IT Management. Pinnaflex Educational Resources.

Brachman, R., & Anand, T. (1996). The process of knowledge discovery in databases: a human-centered approach. In Advances in Knowledge Discovery and Data Mining (pp. 37–58). AAAI Press.

Cui, Z., Damiani, E., & Leida, M. (2007). Benefits of Ontologies in Real Time Data Access. Digital Ecosystems and Technologies Conference, DEST '07, pp. 392-397.

Dhar, V., & Stein, R. (1997). Intelligent decision support methods. Upper Saddle River, NJ: Prentice Hall.

Eckerson, W. (2003). Smart Companies in 21st Century: The Secrets of Creating Successful BI Solutions. Seattle: The Data Warehousing Institute.

Fayyad, U. M. (1996). Data Mining and Knowledge Discovery: Making Sense Out of Data. IEEE Expert, 11(5), 20–25.

Fischler, M. A., & Firschein, O. (1987). Intelligence: The Eye, The Brain and The Computer. Addison-Wesley.

Geist, I. (2002). A framework for data mining and KDD. Proceedings of the 2002 ACM Symposium on Applied Computing. Madrid, Spain.

Godin, B. (2006). The Knowledge-Based Economy: Conceptual Framework or Buzzword? The Journal of Technology Transfer, 17-30.

Golfarelli, M., Rizzi, S., & Cella, I. (2004). Beyond data warehousing: what’s next in business intelligence? DOLAP ’04: Proceedings of the 7th ACM international workshop on Data warehousing and OLAP. New York.

Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques, Second Edition. Morgan Kaufmann.

Hoffer, J., Prescott, M., & McFadden, F. (2002). Modern Database Management (6th ed.). Prentice Hall.

Imberman, S. P. (2001, December). Effective Use Of The KDD Process And Data Mining For Computer Performance Professionals. Proceedings of CMG 2001.

Jonathan, W. (2000). Business Intelligence: What is Business Intelligence? DM Review.

Kankanhalli, A., & Tan, B. C. (2005). Knowledge Management Metrics: A Review and Directions for Future Research. International Journal of Knowledge Management, 1(2), 20-32.

Khan, R. A., & Quadri, S. M. (2012). Business Intelligence: An Integrated Approach. The Business Intelligence Journal (BIJ), 5(1), 64-70.

Lazcorreta, E., Botella, F., & Fernández-Caballero, A. (2008). Towards Personalised Recommendation by Two-Step Modified Apriori Data Mining Algorithm. Expert Systems with Applications, 35(3), 1422-1429.

Moss, L., & Atre, S. (2003). Business Intelligence Roadmap: The Complete Lifecycle for Decision-Support Applications. Boston: Addison-Wesley.

Piatetsky-Shapiro, G., & Frawley, W. (1991). Knowledge Discovery in Databases. Menlo Park, Calif: AAAI Press.

Pirttimäki, V. (2004). The Roles of Internal and External Information in Business Intelligence. Frontiers of E-Business Research.

Polanyi, M. (1958). Personal Knowledge: Towards a Post-Critical Philosophy. Chicago: University of Chicago Press.

Ranjan, J. (2008). Business justification with business intelligence. The Journal of Information and Knowledge Management Systems, 38(4), 461-475.

Reinschmidt, J., & Francoise, A. (2002). Business Intelligence Certification Guide. IBM, International Technical Support Organization.

Roiger, R. J., & Geatz, M. W. (2003). Data Mining: A Tutorial-Based Primer. San Francisco: Addison-Wesley.

Skyrme, D. J. (2002). Business value from knowledge management. Conference Mobilising Knowledge for Business Performance. London.

Thomson, O. (2004, Oct). Business Intelligence Success, Lessons Learned. Retrieved from www.technologyevaluation.com

Tseng, C. Y., & James Goo, Y. J. (2005). Intellectual capital and corporate value in an emerging economy: empirical study of Taiwanese manufacturers. R&D Management, 187–201.

Tuggle, F. D., & Goldfinger, W. E. (2004). A Methodology for Mining Embedded Knowledge from Process Maps. Human Systems Management, 23(1).

Turban, E., & Aronson, J. E. (2001). Decision Support Systems & Intelligent Systems, 2nd ed. India: Pearson Education Inc.

Watson, H. J., & Wixom, B. (2007). The Current State of Business Intelligence. IEEE Computer, 40, pp. 96-99.

Witten, I., & Frank, E. (2005). Data Mining: Practical Machine Learning Tools and Techniques. San Francisco: Morgan Kaufmann.

Wu, L., Barash, G., & Bartolini, C. (2007). A Service-oriented Architecture for Business Intelligence. Service-Oriented Computing and Applications SOCA '07, IEEE International Conference. 279-285.

Wu, X. (2004). Data Mining: Artificial Intelligence in Data Analysis. Proceedings of IEEE/WIC/ACM International Conference on Intelligent Agent Technology.

Zeng, L., Xu, L., Shi, Z., Wang, M., & Wu, W. (October 8-11, 2006). Techniques, process, and enterprise solutions of business intelligence. IEEE Conference on Systems, Man, and Cybernetics, 6, p. 4722. Taipei, Taiwan.



Contact the Author:

Rafi Ahmad Khan, The Business School, University of Kashmir, Srinagar, J&K, India; Email: mca_rafi@yahoo.com