Journal of Knowledge Management Practice, Vol. 11, Special Issue 1, January 2010

Papers Selected From

International Conference On Innovation In Redefining Business Horizons

Institute of Management Technology, Ghaziabad, India, 18 - 19 December, 2008

Application Of Data Mining Techniques In Higher Education In India

Jayanthi Ranjan ¹, Raju Ranjan ²

¹ Institute of Management Technology, ² Ideal Institute of Technology, India

ABSTRACT

In the last decade, higher education in India has grown manifold. Private participation in establishing new institutions, encouraged by the government, has forced higher education institutions to revisit their scope and objectives in order to sustain themselves in the long run. The regulatory bodies have framed guidelines for infrastructure, faculty and other resources, but in many cases these have been grossly violated, leading to inferior education and, ultimately, to the unemployability of students. This paper proposes the use of data mining techniques to improve the efficiency of higher educational institutions. If data mining techniques such as clustering, decision trees and association are applied to higher education processes, they can help improve students’ performance, their life cycle management, their selection of courses and majors, their retention rate, and the grant/fund management of an institution. This is the first approach to examine the effect of using data mining techniques in higher educational institutions from an Indian perspective.

Keywords: Data mining, Higher education, Clustering, Decision tree, Neural network, Genetic algorithm

1. Introduction

Higher education is learning that is provided by universities, vocational universities, degree colleges, arts colleges, technical and medical colleges, and other institutions that award academic degrees. Higher education is normally taken to include undergraduate and postgraduate education, as well as vocational education and training. Colleges and universities are the main institutions that provide higher education. Higher education includes teaching, research and social services activities of universities, and within the realm of teaching, it includes both the undergraduate level and postgraduate level. Higher education is very important to national economies, both as a significant industry in its own right, and as a source of trained and educated personnel for the rest of the economy.

Higher general education might be contrasted with higher vocational education, which concentrates on both practice and theory. A university is an institution of higher education and research that grants academic degrees, including Bachelor’s degrees, Master’s degrees and doctorates, in a variety of subjects. However, most professional education is included within higher education, and many postgraduate qualifications are strongly professionally oriented, for example in disciplines such as social work, law and medicine.

Luan (2002) described three duties of higher education institutions that are data mining intensive. They are:

Ø Scientific research, which relates to the creation of knowledge

Ø Teaching, which concerns the transmission of knowledge

Ø Institutional research, which pertains to the use of knowledge for decision making.

Wikipedia (2008) defines data mining as the process of sorting through large amounts of data and picking out relevant information. It is frequently used by business organizations and financial analysts, but is increasingly being used in the sciences to extract information from the huge data sets generated by modern experimental and observational methods.

Gartner Group (2007) defines data mining as “the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories and by using pattern recognition technologies as well as statistical and mathematical techniques”. Rubenking (2001) explains, “Data mining is the process of automatically extracting useful information and relationships from immense quantities of data. In its purest form, data mining doesn’t involve looking for specific information. Rather than starting from a question or hypothesis, data mining simply finds patterns that are already present in the data”. Han and Kamber (2006) define data mining as the process of discovering ‘hidden images’, patterns and knowledge within large amounts of data and making predictions for outcomes or behaviors. Data mining methods can help bridge the knowledge gaps in the higher educational system.

Traditionally, business analysts have performed the task of extracting useful information from recorded data, but the growing volume of data in modern business and science calls for computer-based approaches. As data sets have grown in size and complexity, there has been a shift away from direct hands-on data analysis toward indirect, automatic data analysis using more complex and sophisticated tools. Data mining identifies trends within data that go beyond simple analysis. Through the use of sophisticated algorithms, non-statistician users have the opportunity to identify key attributes of business processes and target opportunities.

The term data mining is often applied to the two separate processes of knowledge discovery and prediction. Knowledge discovery provides explicit information that has a readable form and can be understood by a user. Forecasting, or predictive modeling, provides predictions of future events; it may be transparent and readable in some approaches and opaque in others, such as neural networks. Data mining relies on the use of real world data. These data are extremely vulnerable to collinearity because data from the real world may have unknown interrelations. Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, to data.

This paper presents how various data mining techniques can be suitably applied in the field of higher education to discover some meaningful patterns or relations that can further improve the overall performance of higher education in India.

2. Motivation And Related Research

Data mining is widely applied in the area of education, and researchers have explored various applications of it. We surveyed the literature to understand the importance of data mining in higher education. The research papers mostly concentrated on data mining applications from a domain perspective. We have tried to analyze its importance from the perspective of Indian higher education, which has not been explored as much. This is the motivation for our paper.

Table 1: Research Work Related To The Use Of Data Mining In The Context Of Higher Education

S. No.	Author	Year	Work
1	Ma et al	2000	Presented a real life application of data mining to find weak students
2	Luan J.	2001	Introduced a powerful decision support tool, data mining, in the context of knowledge management.
3	Luan J.	2002	Discussed the potential applications of data mining in higher education & explained how data mining saves resources while maximizing efficiency in academics.
4	Delavari et al	2005	Proposed a model for the application of data mining in higher education.
5	Shyamala, K. & Rajagopalan, S. P.	2006	Developed a model to find similar patterns from the data gathered and to make predictions about students’ performance.
6	Sargenti et al	2006	Explored the development of a model which allows for diffusion of knowledge within a small business university
7	Ranjan, J.	2008	Examined the effect of information technology in academic institutions for sharing information

Luan (2002) studied the impact of data mining on higher education. This study helped us gain insights into existing higher education worldwide and its improvement from a data mining perspective. Delavari et al (2004) discussed a new model for using data mining in the higher educational system. Waiyamai (2003) suggested that the use of data mining in education can help improve the quality of graduate students. Barros and Verdejo (2000) analyzed student interaction processes and applied the analysis to improve collaboration. Delmater and Handcock (2001) stress the underlying predictive modeling, which is a mixture of mathematics, computer science and domain expertise.

Ranjan and Malik (2007) proposed a framework for an effective educational process that uses data mining techniques to uncover hidden trends and patterns and to make accuracy-based predictions through a higher level of analytical sophistication in the student counseling process. Talavera and Gaudioso (2004) proposed to shape the analysis problem as a data mining task. The authors suggested that the typical data mining cycle bears many resemblances to proposed models for collaboration management, and presented some preliminary experiments using clustering to discover patterns reflecting user behaviors.

Ma et al (2000) observed that the education domain offers many interesting and challenging applications for data mining. First, an educational institution often has many diverse and varied sources of information. There are the traditional databases (e.g. students’ information, teachers’ information, class and schedule information, alumni information), online information (online web pages and course content pages) and, more recently, multimedia databases. Second, there are many diverse interest groups in the educational domain that give rise to many interesting mining requirements. For example, the administrators may wish to find out information such as the admission requirements and to predict the class enrollment size for timetabling. The students may wish to know how best to select courses based on predictions of how well they will perform in the courses selected. The alumni office may need to know how best to perform target mailing so as to achieve the best results in reaching out to those alumni who are likely to respond. All these applications not only help an educational institute deliver a better quality educational experience, but also aid the institution in running its administrative tasks effectively. With so much information and so many diverse needs, it is foreseeable that an integrated data mining system that is able to cater to the special needs of an educational institution will be in great demand.

The literature survey enabled us to study various papers that made a significant impact on our findings from the Indian perspective.

3. Research Methodology

The research methodology adopted is based on an in-depth study of data mining and its application in higher education. The literature review helped us to understand the growing importance of data mining techniques in the field of higher education. Views expressed at various national and international conferences were taken into consideration while analyzing data mining applications in higher education. Talks with various academicians, institutions and colleges offering higher education, and experts in the field of data mining helped us to find and present the techniques, process and application of data mining in higher education in India.

4. Data Mining - An Introduction

Data mining is the extraction of hidden information from a huge volume of data. The business world currently uses data mining to gain insight into business strategies. Hardly any area is unaffected by data mining. The greatest profit has been achieved by service sector industries such as banking enterprises. The growing volume of data in higher education has prompted some researchers to discuss the inclusion of data mining in higher education as well. Data mining performs better when the volume of data is large. Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data (Frawley et al, 1992). It is the extraction of information from a huge volume or set of data through the use of various data mining techniques (Feelders et al, 2000).

Data mining techniques such as clustering, classification, neural networks and genetic algorithms help in finding hidden and previously unknown information in a database. Clustering helps to segment the data according to the characteristics of each segment. This is helpful for detecting loyal customers in the business world. Classification techniques help to classify the data on the basis of certain rules. This helps to frame policies for the future. Genetic algorithms help to find the best solution from the given data. The data mining tools on the market provide an effective graphical user interface, which helps users to easily understand and analyze the data for strategic decision making.

4.1. Clustering

Clustering is a technique by which similar records are grouped together. Usually this is done to give the end user a high-level view of what is going on in the database. Clustering is sometimes used to denote segmentation, which most marketing people say is useful for getting a bird’s-eye view of the business. Berson et al (1999) cite that Claritas Corporation and Equifax Corporation have grouped people by demographic information into segments that they consider useful for direct marketing and sales. To build these groupings they use information such as profits, age, job, housing and race collected in the US Census. They then assign impressive “nicknames” to the clusters. This clustering information is then used by the end user to label the customers in their database. Once this is done, the business user can get a quick high-level view of what is happening within each cluster. Once business users have worked with these codes for some time, they also begin to build intuitions about how these different customer clusters will respond to the marketing offers particular to their business. For instance, some of these clusters may relate to their business and some of them may not. But given that their competition may well be using these same clusters to structure their business and marketing offers, it is important to be aware of how one’s customer base behaves with regard to these clusters.
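To make the grouping idea concrete in an educational setting, the following is a minimal sketch of k-means clustering, one common clustering algorithm, written in plain Python. The student records, the two features (attendance percentage and average test mark) and the choice of two clusters are hypothetical and serve only as an illustration; they are not data from this study.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[k])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return labels, centroids

# Hypothetical records: (attendance %, average test mark)
students = [(95, 82), (90, 78), (88, 85), (55, 40), (60, 45), (50, 38)]
# Initial centroids are picked by hand here; real tools usually pick them randomly.
labels, centroids = kmeans(students, centroids=[students[0], students[3]])
print(labels)      # cluster index per student
print(centroids)   # mean profile of each cluster
```

In practice the features would normally be scaled to comparable ranges before clustering, and the number of clusters would be chosen by inspecting the data rather than fixed in advance.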

4.2. Decision Trees

A decision tree is a predictive model that, as its name implies, can be viewed as a tree. Specifically, each branch of the tree is a classification question, and the leaves of the tree are partitions of the dataset with their classification. It divides the data at each branch point in such a way that the number of records in a given parent node is equal to the sum of the records contained in its two children. From a business perspective, decision trees can be viewed as creating a segmentation of the original dataset (each segment would be one of the leaves of the tree). Segmentation of customers, products, and sales regions is something that advertising managers have been doing for many years. In the past this segmentation was performed in order to get a high-level view of a huge amount of data, with no particular reason for creating the segmentation except that the records within each segment were somewhat similar to each other. Because decision trees score so well on so many of the significant features of data mining, they can be used in a variety of academic problems for both exploration and prediction. Usually the models to be built and the interactions to be detected are much more complex in real world problems, and this is where decision trees excel.
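The branching step described above can be sketched as follows. This is a minimal illustration of how a single branch point might be chosen, assuming Gini impurity as the split criterion (one common choice; the paper does not prescribe one). The records and the `attendance` and `result` field names are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_split(rows, feature):
    """Find the threshold on one numeric feature that minimises weighted Gini impurity."""
    best = None
    values = sorted({r[feature] for r in rows})
    # Candidate thresholds are the midpoints between consecutive distinct values.
    for lo, hi in zip(values, values[1:]):
        threshold = (lo + hi) / 2
        left = [r for r in rows if r[feature] <= threshold]
        right = [r for r in rows if r[feature] > threshold]
        score = (len(left) * gini([r["result"] for r in left])
                 + len(right) * gini([r["result"] for r in right])) / len(rows)
        if best is None or score < best[0]:
            best = (score, threshold, left, right)
    return best

# Hypothetical student records
rows = [
    {"attendance": 92, "result": "pass"},
    {"attendance": 85, "result": "pass"},
    {"attendance": 78, "result": "pass"},
    {"attendance": 64, "result": "fail"},
    {"attendance": 58, "result": "fail"},
    {"attendance": 71, "result": "pass"},
]
score, threshold, left, right = best_split(rows, "attendance")
# Every parent record lands in exactly one of the two children.
print(threshold, len(left), len(right))
```

A full decision tree would apply the same search recursively to each child, over all available features, until the leaves are pure or a stopping rule is met.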

5. Scenario Of Higher Education In India

The educational arrangement in India is generally referred to as the ‘Ten plus Two plus Three’ (10+2+3) pattern. The first ten years provide undifferentiated, all-purpose education for all students. The +2 stage, also known as higher secondary or senior secondary, provides for differentiation into academic and vocational streams and marks the end of school education. In the +3 stage, which involves college education, the student pursues higher studies in a chosen field (in technical disciplines the duration may be four years).

For a long time the principal employer of the educated youth in the country was the State. The State as an employer generally did not look beyond academic certifications such as graduate and post-graduate degrees, and occasionally the level at which the degrees were obtained. Only in the recent past have requirements such as the National Eligibility Test, coupled with the level of pass at the post-graduate level, been introduced as essential criteria for the appointment of lecturers in colleges and universities.

The Indian economy is a mixed economy that provides equal opportunities to the public and private sectors. The hard reality is that the majority of employment opportunities are now in the private sector, and this sector is very choosy, to say the least. The private sector employs only those who possess the skills and competencies it requires. Its requirements are continually changing because this sector has to keep pace with its global competitors. Education now has to be tailored to the requirements of the private sector. Also, institutions of international reputation are making inroads into the higher education sector by providing alternative learning opportunities leading to the award of degrees of their universities. At the same time, the traditional structure of higher education in state-funded institutions has continued to revolve around teacher-student contact, and it is no longer adequate to meet demand or remain relevant. There are not enough lecturers, library books or rooms, and there is not enough time. New organizational structures are therefore required to support new learning processes.

In such a scenario, an obvious conclusion is that higher education seems to have reached a dead end, or in other words is at a crossroads. The need of the hour is to take a fresh look at higher education and introduce changes that will restore confidence in the ability of the state universities and colleges to provide cost-effective education relevant to the present world of work. To survive in such a competitive world, the use of data mining can be helpful.

6. Application Of Data Mining In Higher Education From Indian Perspective

There are numerous areas in which data mining can be applied. For example:

Ø Performance management: Data mining techniques can be used efficiently in selecting courses, managing students’ performance, improving attendance (or reducing dropouts), providing supplementary classes where necessary, allocating instructors in a better managed way, and thus improving the overall stature of the institute or university. A data mining model can monitor each student’s progress by capturing variables such as previous semester grade, test mark, assignment grade and attendance. Students’ performance can also be analyzed based on the features of interpersonal peer groups, such as intellectual self-confidence, scoring pattern and time spent with peer groups. The model can also identify the students who are likely to drop out, so that action can be taken by providing appropriate counseling in a timely manner. The model can find similar patterns in the data gathered, and predictions can be made about students’ performance; for example, students with good assignment grades tend to score good results in the examination. It can help teachers to identify the students at risk of failure and provide additional support in the form of extra classes and academic counseling in order to improve their semester grades.

Ø Grants and funds management: Institutions can compete for and manage a variety of departmental and sponsored grant programs, research awards and alumni funding by using data mining techniques. Alumni can be classified on the basis of various parameters. Associations among some parameters can be found to predict the likely funding pattern. For example, it can be deduced that alumni who maintained a good rapport with faculty members during their college days and who are well placed now may be potential candidates for funding.

Ø Student life-cycle management: All aspects of the student life cycle, e.g. the academic cycle, can be managed and the student experience can be enhanced by using various data mining techniques, as the whole life cycle improves with the increase in efficiency of each department. Data mining can be used to find the influence of friendship groups on the education of students. Also, attendance marks play an important role as internal marks. Attendance may be classified as regular or irregular based on the attendance percentage. The requirement can be relaxed up to a certain percentage with a penalty fee. Attendance may be associated with test mark and assignment grade to predict the semester result.

Ø Procurement: By using data mining techniques, costs can be minimized, accurate spend analysis can be conducted and supplier relationships can be enhanced. Increasing electronic data storage and transmission has brought more opportunities for investigators to track and retrieve that data. An e-mail message, for example, leaves a detectable ‘footprint’ on any server through which it passes. In today’s business environment, data mining techniques are most often employed as the solution to procurement fraud because of the increasing amount of available data that can be manipulated and examined, be it the supplier master file, invoice history file, or even access control data. Data mining techniques can also be utilized in finding which inquiries are most likely to turn into actual admission.
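The association idea mentioned in the list above, for example relating attendance and assignment grade to the semester result, can be sketched with the two standard association rule measures, support and confidence. The records below are hypothetical and the item names such as `assignment=good` are illustrative labels, not fields from a real institutional database.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= t for t in transactions) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent); assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

# Hypothetical student records, one set of attribute=value items per student
records = [
    {"attendance=regular", "assignment=good", "result=pass"},
    {"attendance=regular", "assignment=good", "result=pass"},
    {"attendance=regular", "assignment=poor", "result=pass"},
    {"attendance=irregular", "assignment=good", "result=pass"},
    {"attendance=irregular", "assignment=poor", "result=fail"},
    {"attendance=regular", "assignment=good", "result=fail"},
    {"attendance=irregular", "assignment=poor", "result=fail"},
    {"attendance=regular", "assignment=poor", "result=fail"},
]

# Rule: assignment=good -> result=pass
rule_support = support(records, {"assignment=good", "result=pass"})
rule_confidence = confidence(records, {"assignment=good"}, {"result=pass"})
print(rule_support, rule_confidence)
```

A rule is usually kept only when both measures exceed thresholds chosen by the analyst; an algorithm such as Apriori automates the search over candidate rules instead of testing them one at a time as done here.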

Data mining techniques can predict enrollment to specific courses to help determine a program’s success rate (as well as failure). At Birla Institute of Technology and Science, Pilani, for example, students in a graduate course on data mining were asked to wade through years of raw data on incoming students and pick out factors that linked to retention using analytics software from SAS Institute Inc., a company that helped design the course. They found that freshmen who lived off the campus were more likely to drop out. University officials took the findings seriously and adopted a few policy changes as a result. For instance, the university began requiring first-year students to live on the campus.

Data mining can help in finding out the retention rate of faculty members. Raw data on faculty members can be analyzed on various parameters such as salary, allowances, medical support, group insurance, transport facilities, gratuity etc. Other parameters may also be important, such as academic load, involvement of management in day-to-day affairs, distance from their residence and many other variables. Data mining can also help in finding the main clusters in student and faculty satisfaction surveys. Such surveys can help improve canteen facilities, overall cleanliness, staff behavior etc.

Table 2: Data Mining Techniques That Can Be Used To Find Certain Patterns

Major Data Mining Techniques	Patterns
Clustering	Students having similar characteristics
	Grouping top performers
	Groups of students most likely to drop
Classification & Prediction	Predicting students’ learning outcome in an institute
	Predicting the percentage accuracy in students’ performance
	Classifying the admission process
	Prediction of what type of students most likely to drop
	Predicting students’ behavior, attitude
	Predicting the performance progress throughout the semester
	Identifying the best profile for different students
	Prediction to find what factors will attract meritorious students
	Scores of students in risk category predicted to voluntarily leave
Association	Association of training undertaken with various types of students and performance scores, individually and in teams.
	Association of students’ work profiles to the most appropriate project
	Association of students’ team building and leadership approaches.
	Association of students’ attitude with performance.
Data Mining using other inter-disciplinary methods	Standardizing teaching methods, performance monitoring in career management
	Use historical data to build models of students’ indecent behavior and use data mining to help identify similar instances.

All these examples and the literature suggest that data mining techniques can be used in the field of higher education in India. They will not only help the educators and owners of institutions but will also cater to the demand for efficient utilization of resources. The approaches discussed above can widen the scope for generating the trained manpower that a developing India much needs.

7. Conclusion

Among the several innovations in recent technology, data mining is making comprehensive changes in the field of higher education. It has tremendous applications in higher educational institutions in India. In this paper we have discussed how data mining can be useful in various aspects of an institute’s life cycle. As higher education in India is becoming more important, the above discussion is aimed at increasing the efficiency of a higher educational institute, which in turn can translate into lower cost to the country as a whole. Such activities can lead to better decision making procedures and improve the quality of instruction.

8. References

Barros, B. and Verdejo, M. F. (2000), ‘Analyzing Student Interaction Processes In Order To Improve Collaboration: The Degree Approach’, International Journal of Artificial Intelligence in Education, Vol-11, pp221-241.

Berson, A., Smith, S. and Thearling, K. (1999), ‘Building Data Mining Applications for CRM’, McGraw-Hill Professional.

‘Data Mining’ (2008) retrieved on 12 Sep 2008, from http://en.wikipedia.org/wiki/Data_mining.

Delavari, N., Beikzadeh, M. R. and Shirazi, M. R. A. (2004), ‘A New Model for Using Data Mining in Higher Educational System’, Proceedings of 5th International Conference on Information Technology based Higher Education and Training: ITHET ’04, Istanbul, Turkey.

Delavari, N., Beikzadeh, M.R. and Amnuaisuk, S. K. (2005), ‘Application of Enhanced Analysis Model for Data Mining Processes in Higher Educational System’, Proceedings of ITHET 6th Annual International Conference, Juan Dolio, Dominican Republic.

Delmater R., and Handcock M. (2001), ‘Data Mining Explained: A Manager’s Guide to Customer- Centric Business Intelligence’, Digital Press, Boston.

Feelders, A., Daniels, H. and Holsheimer, M. (2000) ‘Methodological and Practical Aspects of Data Mining’, Information and Management, Vol.37, No.5, pp.271-281.

Frawley, W., Piatetsky-Shapiro, G., and Matheus, C. (1992). ‘Knowledge Discovery in Databases: An Overview’, AI Magazine, Fall 1992, pp. 213-228.

Gartner Group (2007). ‘Magic Quadrant for Customer Data Mining’. Retrieved 18 September 2008 from http://dml.cs.byu.edu/~cgc/docs/mldm_tools/Reading/Gartner2Q07.pdf.

Han, J. and Kamber M. (2006), Data Mining Concepts and Techniques, Morgan Kaufmann, pp.4-27.

Luan, J. (2001), ‘Data Mining Applications in Higher Education’, A chapter in the upcoming New Directions for Institutional Research, 1st Ed., Jossey-Bass, San Francisco.

Luan, J. (2002), ‘Data Mining and Its Applications in Higher Education’ in A. Serban and J. Luan (eds.) Knowledge Management: Building a Competitive Advantage for Higher Education. New Directions for Institutional Research, No. 113. San Francisco, CA: Jossey-Bass.

Ma, Y., Liu, B., Wong, C. K., Yu, P. S., Lee, S. M. (2000), ‘Targeting the right students using data mining’, Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, Boston, pp 457-464.

Ranjan, J. and Malik, K. (2007), ‘Effective Educational Process: A Data Mining Approach’, VINE, Vol. 37, Issue 4, pp 502-515.

Ranjan, J. (2008), ‘Impact of Information Technology in Academia’, International Journal of Education Management, Vol. 22, Issue 5, pp 442-455.

Rubenking, N. (2001), ‘Hidden Messages’, PC Magazine, May 22, 2001. Retrieved 25 October 2008 from : http://www.pcmag.com/article2/0,2817,8637,00.asp.

Sargenti, P., Lightfoot, W. and Kehal, M. (2006), ‘Diffusion of Knowledge in and through Higher Education Organizations’, Issues in Information Systems, Vol 3, No. 2, pp 3-8.

Shyamala K. and Rajagopalan S. P. (2006), ‘Data Mining Model for a better Higher Educational System’, Information Technology Journal, Vol. 5, No. 3, pp 560-564.

Talavera, L. and Gaudioso, E. (2004), ‘Mining Student Data to Characterize Similar Behavior Groups In Unstructured Collaboration Spaces’, presented at Workshop on Artificial Intelligence in Computer Supported Collaborative Learning at European Conference on Artificial Intelligence, Valencia, Spain, pp 17-23.

Waiyamai K. (2003), ‘Improving Quality of Graduate Students by Data Mining’, Dept. of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand.

Contact the Authors:

Dr. Jayanthi Ranjan, Information Management Area, Institute of Management Technology, Raj Nagar, Ghaziabad, Uttar Pradesh, India; Mobile: 09811443110; Email: jranjan@imt.edu

Raju Ranjan, Department of Information Technology, Ideal Institute of Technology, Govindpuram, Ghaziabad, Uttar Pradesh, India; Mobile: 09868215863; Email: ranjanr.cs@gmail.com