Journal of Knowledge Management Practice, Vol. 11, Special Issue 1, January 2010
International Conference On Innovation In Redefining Business Horizons
ABSTRACT
In the last decade, higher education in
Keywords: Data mining, Higher education, Clustering, Decision tree, Neural network, Genetic algorithm
1. Introduction
Higher education is learning provided by universities, vocational universities, degree colleges, arts colleges, technical and medical colleges, and other institutions that award academic degrees. It is normally taken to include undergraduate and postgraduate education, as well as vocational education and training. Colleges and universities are the main institutions that provide higher education. Higher education includes the teaching, research and social service activities of universities, and within the realm of teaching, it includes both the undergraduate and postgraduate levels. Higher education is very important to national economies, both as a significant industry in its own right and as a source of trained and educated personnel for the rest of the economy.
Higher general education might be contrasted with higher vocational education, which concentrates on both practice and theory. A university is an institution of higher education and research that grants academic degrees, including Bachelor's degrees, Master's degrees and doctorates, in a variety of subjects. Most professional education is also included within higher education, and many postgraduate qualifications are strongly professionally oriented, for example in disciplines such as social work, law and medicine.
Luan (2002) described three duties of higher education institutions that are data mining intensive:
• Scientific research, which relates to the creation of knowledge
• Teaching, which concerns the transmission of knowledge
• Institutional research, which pertains to the use of knowledge for decision making.
Wikipedia (2008) defines data mining as the process of sorting through large amounts of data and picking out relevant information. It is frequently used by business organizations and financial analysts, but is increasingly being used in the sciences to extract information from the huge data sets generated by modern experimental and observational methods.
Gartner Group (2007) defines data mining as “the process of discovering meaningful new correlations, patterns and trends by sifting through large amounts of data stored in repositories and by using pattern recognition technologies as well as statistical and mathematical techniques”. Rubenking (2001) explains, “Data mining is the process of automatically extracting useful information and relationships from immense quantities of data. In its purest form, data mining doesn’t involve looking for specific information. Rather than starting from a question or hypothesis, data mining simply finds patterns that are already present in the data”. Han and Kamber (2006) define data mining as the process of discovering ‘hidden images’, patterns and knowledge within large amounts of data and making predictions for outcomes or behaviors. Data mining methods can help bridge the knowledge gaps in the higher education system.
Traditionally, business analysts have performed the task of extracting useful information from recorded data, but the growing volume of data in modern business and science calls for computer-based approaches. As data sets have grown in size and complexity, there has been a shift away from direct hands-on data analysis toward indirect, automatic data analysis using more complex and sophisticated tools. Data mining identifies trends within data that go beyond simple analysis. Through the use of sophisticated algorithms, non-statistician users have the opportunity to identify key attributes of business processes and target opportunities.
The term data mining is often used to refer to two separate processes: knowledge discovery and prediction. Knowledge discovery provides explicit information in a readable form that can be understood by a user. Forecasting, or predictive modeling, provides predictions of future events; the resulting models are transparent and readable in some approaches and opaque in others, such as neural networks. Data mining relies on real-world data. These data are extremely vulnerable to collinearity because data from the real world may have unknown interrelations. Data mining is the entire process of applying computer-based methodology, including new techniques for knowledge discovery, to data.
This paper presents how various data mining techniques can be suitably applied in the field of higher education to discover meaningful patterns or relations that can further improve the overall performance of higher education in India.
2. Motivation And Related Research
Data mining applications in education are widespread, and researchers have explored many of them. The authors surveyed the literature to understand the importance of data mining in higher education. Most of the research papers concentrate on data mining applications from a domain perspective. We have tried to analyze its importance from the perspective of Indian higher education, which has not been explored as much. This is the motivation for our paper.
Table 1: Research Work Related to the Use of Data Mining in the Context of Higher Education

S. No. | Author | Year | Work
1 | Ma et al | 2000 | Presented a real-life application of data mining to find weak students
2 | Luan, J. | 2001 | Introduced data mining as a powerful decision support tool in the context of knowledge management
3 | Luan, J. | 2002 | Discussed the potential applications of data mining in higher education and explained how data mining saves resources while maximizing efficiency in academics
4 | Delavari et al | 2005 | Proposed a model for the application of data mining in higher education
5 | Shyamala, K. and Rajagopalan, S. P. | 2006 | Developed a model to find similar patterns in the data gathered and to make predictions about students’ performance
6 | Sargenti et al | 2006 | Explored the development of a model that allows for the diffusion of knowledge within a small business university
7 | Ranjan, J. | 2008 | Examined the effect of information technology in academic institutions for sharing information
Luan (2002) studied the impact of data mining on higher education; this study offers insight into higher education worldwide and how it can be improved from a data mining perspective. Delavari et al (2004) discussed a new model for using data mining in the higher education system. Waiyamai (2003) suggested that the use of data mining in education can help improve the quality of graduate students. Barros and Verdejo (2000) analyzed student interaction processes and applied the results to improve collaboration. Delmater and Handcock (2001) emphasize predictive modeling, which is a mixture of mathematics, computer science and domain expertise.
Ranjan and Malik (2007) proposed a framework for an effective educational process that uses data mining techniques to uncover hidden trends and patterns and to make accuracy-based predictions, through a higher level of analytical sophistication, in the student counseling process. Talavera and Gaudioso (2004) proposed framing the analysis problem as a data mining task. The authors suggested that the typical data mining cycle bears many resemblances to proposed models for collaboration management, and presented some preliminary experiments using clustering to discover patterns reflecting user behaviors.
Ma et al (2000) observed that the education domain offers many interesting and challenging applications for data mining. First, an educational institution often has many diverse sources of information: traditional databases (e.g. students’ information, teachers’ information, class and schedule information, alumni information), online information (online web pages and course content pages) and, more recently, multimedia databases. Second, there are many diverse interest groups in the educational domain that give rise to many interesting mining requirements. For example, administrators may wish to find out information such as admission requirements and to predict class enrollment size for timetabling. Students may wish to know how best to select courses based on predictions of how well they will perform in the courses selected. The alumni office may need to know how best to perform targeted mailing so as to reach the alumni who are most likely to respond. All these applications not only help an educational institution deliver a better-quality educational experience, but also help it run its administrative tasks effectively. With so much information and so many diverse needs, it is foreseeable that an integrated data mining system able to cater to the special needs of an educational institution will be in great demand.
The literature survey enabled us to study various papers that significantly informed our findings from an Indian perspective.
3. Research Methodology
The research methodology adopted is based on an in-depth study of data mining and its application in higher education. The literature review helped us understand the growing importance of data mining techniques in the field of higher education. The views expressed at various national and international conferences were taken into consideration while analyzing data mining applications in higher education. Discussions with academicians, institutions and colleges offering higher education, and experts in the field of data mining helped us find and present the techniques, process and applications of data mining in higher education in India.
4. Data Mining - An Introduction
Data mining is the extraction of hidden information from huge volumes of data. Businesses today use data mining to gain insight into business strategies, and few areas remain unaffected by it; service sector industries such as banking have profited the most. The growing volume of data in higher education has prompted some researchers to advocate the inclusion of data mining in higher education as well. Data mining generally performs better when the volume of data is large. Data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data (Frawley et al, 1992). It is the extraction of information from a huge volume of data or a data set through the use of various data mining techniques (Feelders et al, 2000).
Data mining techniques such as clustering, classification, neural networks and genetic algorithms help find hidden and previously unknown information in a database. Clustering segments the data into groups whose members share similar characteristics; in business, for example, this helps identify loyal customers. Classification techniques assign data to classes on the basis of certain rules, which helps in framing future policies. Genetic algorithms search a large space of candidate solutions for the best (or a near-best) one. Commercial data mining tools provide effective graphical user interfaces that help users easily understand and analyze the data for strategic decision making.
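Genetic algorithms are mentioned above only in passing. The Python sketch below illustrates their basic loop (selection, crossover, mutation) on a hypothetical selection task: choosing elective courses that maximise a made-up benefit score without exceeding a credit limit. The function name `genetic_search`, the data, tournament selection, one-point crossover, the zero-fitness penalty for infeasible choices and all parameter values are assumptions for illustration, not a method described by the authors.

```python
import random


def genetic_search(values, weights, capacity, pop_size=30, generations=100,
                   mutation_rate=0.05, seed=0):
    """Evolve 0/1 selections that maximise total value within a weight limit.

    Hypothetical helper. Each individual is a list of bits: bit i == 1 means
    item i is selected. Needs at least two items. Returns (best, best_fitness).
    """
    rng = random.Random(seed)
    n = len(values)

    def fitness(ind):
        # Infeasible selections (total weight over the capacity) score 0.
        total_weight = sum(w for w, bit in zip(weights, ind) if bit)
        if total_weight > capacity:
            return 0
        return sum(v for v, bit in zip(values, ind) if bit)

    def tournament(population):
        # Tournament selection: the fitter of two random individuals wins.
        a, b = rng.sample(population, 2)
        return a if fitness(a) >= fitness(b) else b

    population = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        next_population = [best[:]]  # elitism: carry the best individual forward
        while len(next_population) < pop_size:
            parent1, parent2 = tournament(population), tournament(population)
            cut = rng.randint(1, n - 1)  # one-point crossover position
            child = parent1[:cut] + parent2[cut:]
            # Bit-flip mutation: each bit flips with probability mutation_rate.
            child = [(1 - bit) if rng.random() < mutation_rate else bit for bit in child]
            next_population.append(child)
        population = next_population
        best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    # Hypothetical electives: benefit scores and credit hours, 12-credit limit.
    benefit = [7, 4, 8, 3, 6, 5]
    credit_hours = [4, 2, 5, 1, 3, 3]
    chosen, score = genetic_search(benefit, credit_hours, capacity=12)
    print("selected electives:", [i for i, bit in enumerate(chosen) if bit], "score:", score)
```

Because the best individual of each generation is always carried forward (elitism), the best score found never decreases from one generation to the next; a genetic algorithm still offers no guarantee of finding the true optimum.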
4.1. Clustering
Clustering is a technique by which similar records are grouped together. Usually this is done to give the end user a high-level view of what is going on in the database. Clustering is sometimes used to mean segmentation, which most marketing people say is useful for getting a bird’s-eye view of the business. Berson et al (1999) note that Claritas Corporation and Equifax Corporation have grouped people by demographic information into segments that they consider useful for direct marketing and sales. To build these groupings they use information such as income, age, occupation, housing and race collected in the US Census. They then assign memorable “nicknames” to the clusters. End users then use this clustering information to label the customers in their databases. Once this is done, the business user can get a quick high-level view of what is happening within each cluster. After working with these codes for some time, business users also begin to build intuitions about how the different customer clusters will respond to the marketing offers particular to their business. For instance, some of these clusters may relate to their business and some may not. But given that their competitors may well be using these same clusters to structure their business and marketing offers, it is important to be aware of how one’s customer base behaves with regard to these clusters.
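Berson et al describe clustering in a marketing setting. To make the idea concrete for the kind of student data discussed later in this paper, the sketch below groups hypothetical student records (attendance percentage, average test mark) with k-means, one common clustering algorithm. The paper does not name a specific algorithm, so the choice of k-means, the function name `kmeans`, the value of k and the records are all assumptions.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Plain k-means on a list of numeric tuples; returns (centroids, labels).

    Hypothetical helper for illustration, not a library API.
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct records
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each record joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its member records.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster where it was
        if new_centroids == centroids:  # converged: no centroid moved
            break
        centroids = new_centroids
    return centroids, labels


if __name__ == "__main__":
    # Hypothetical records: (attendance %, average test mark).
    students = [(95, 82), (90, 78), (92, 85), (80, 65),
                (78, 60), (60, 45), (55, 40), (65, 50)]
    centroids, labels = kmeans(students, k=2)
    for record, label in zip(students, labels):
        print(record, "-> cluster", label)
```

k-means needs the number of clusters in advance and can settle in a local optimum that depends on the starting centroids, so in practice it is usually run with several seeds and values of k before the segments are given names.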
4.2. Decision Trees
A decision tree is a predictive model that, as its name implies, can be viewed as a tree. Specifically, each branch of the tree is a classification question and the leaves of the tree are partitions of the dataset with their classification. The tree divides the data at each branch point in such a way that the total number of records in a given parent node equals the sum of the records contained in its two children. From a business perspective, decision trees can be viewed as creating a segmentation of the original dataset (each segment being one of the leaves of the tree). Segmentation of customers, products and sales regions is something that advertising managers have been doing for many years. In the past, this segmentation was performed to get a high-level view of a huge amount of data, with no particular reason for creating the segmentation except that the records within each segment were somewhat similar to each other. Because decision trees score well on so many of the significant features of data mining, they can be used for a variety of academic problems, for both exploration and prediction. The models to be built and the interactions to be detected are usually much more complex in real-world problems, and this is where decision trees excel.
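The binary splitting described above can be sketched in a few lines of Python. This minimal version chooses each split by Gini impurity (a CART-style criterion; the text does not specify one) and is applied to hypothetical records of (attendance %, assignment grade out of 10) labelled pass or fail. The function names and data are illustrative assumptions; this is a teaching sketch, not the implementation used by any particular data mining tool.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means the node is pure)."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted_gini, feature_index, threshold) of the best binary split, or None."""
    best = None
    for f in range(len(rows[0])):
        for t in sorted({row[f] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= t]
            right = [lab for row, lab in zip(rows, labels) if row[f] > t]
            if not left or not right:
                continue  # a split must send records to both children
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, f, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow a binary tree; each leaf is the majority class of its records."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, f, t = split
    left = [i for i, row in enumerate(rows) if row[f] <= t]
    right = [i for i, row in enumerate(rows) if row[f] > t]
    # Each parent record goes to exactly one child: len(left) + len(right) == len(rows).
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([rows[i] for i in left], [labels[i] for i in left], depth + 1, max_depth),
        "right": build_tree([rows[i] for i in right], [labels[i] for i in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Follow the branch questions from the root down to a leaf label."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


if __name__ == "__main__":
    # Hypothetical records: (attendance %, assignment grade out of 10).
    rows = [(92, 8), (85, 7), (78, 9), (88, 6), (60, 4), (55, 5), (70, 3), (65, 6)]
    labels = ["pass"] * 4 + ["fail"] * 4
    tree = build_tree(rows, labels)
    print(tree)
    print(predict(tree, (80, 5)))
```

On these invented records a single question (is attendance at most 70%?) separates pass from fail, so the tree has one branch point and two leaves. Real data would normally produce deeper trees, which is why `max_depth` caps their growth.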
5. Scenario Of Higher Education In India
The educational arrangement in
For a long time, the principal employer of educated youth in the country was the State. As an employer, the State generally did not look beyond academic certifications such as graduate and post-graduate degrees, and occasionally the level at which the degrees were obtained. Only in the recent past have requirements such as the National Eligibility Test, coupled with the level of pass at the post-graduate level, been introduced as essential criteria for the appointment of lecturers in colleges and universities.
The Indian economy is a mixed economy that provides equal opportunities to the public and the private sector. The hard reality is that the majority of employment opportunities are now in the private sector, and this sector is very choosy, to say the least. The private sector employs only those who possess the skills and competencies it requires, and its requirements keep changing because it has to keep pace with its global competitors. Education now has to be tailor-made to the requirements of the private sector. Also, institutions of international reputation are making inroads into the higher education sector by providing alternative learning opportunities that lead to degrees from their own universities. At the same time, the traditional structure of higher education in state-funded institutions has continued to revolve around teacher-student contact, which is no longer adequate for meeting demand or remaining relevant. There are not enough lecturers, library books or rooms, and there is not enough time. New organizational structures are therefore required to support new learning processes.
In such a scenario, an obvious conclusion is that higher education seems to have reached a dead end or, in other words, is at a crossroads. The need of the hour is to take a fresh look at higher education and introduce changes that will restore confidence in the ability of state universities and colleges to provide cost-effective education relevant to the present-day world of work. Data mining can help institutions sustain themselves in such a competitive world.
6. Application Of Data Mining In Higher Education From Indian Perspective
There are numerous areas in which data mining can be applied. For example:
• Performance management: Data mining techniques can be used efficiently in selecting courses, managing students’ performance, improving attendance (or reducing dropouts), providing supplementary classes where necessary and allocating instructors in a better-managed way, thus improving the overall stature of the institute or university. A data mining model can monitor each student’s progress by capturing variables such as previous semester grade, test mark, assignment grade and attendance. Students’ performance can also be analyzed based on features of interpersonal peer groups such as intellectual self-confidence, scoring pattern and time spent with peer groups. The model can also identify students who are likely to drop out, so that action can be taken by providing appropriate counseling in a timely manner. The model can find similar patterns in the data gathered, and predictions can be made about students’ performance; for example, it may show that students with good assignment grades tend to score well in the examination. This can help teachers identify students at risk of failure and provide additional support, in the form of extra classes and academic counseling, to help them achieve better semester grades.
• Grants and funds management: Using data mining techniques, institutions can compete for and manage a variety of departmental and sponsored grant programs, research awards and alumni funding. Alumni can be classified on the basis of various parameters, and associations among these parameters can be used to predict the likelihood of a funding pattern. For example, the analysis may show that alumni who maintained a good rapport with faculty members during their college days, and who are now well placed, are potential candidates for funding.
• Student life-cycle management: All aspects of the student life cycle (e.g. the academic cycle) can be managed, and the student experience enhanced, using various data mining techniques; the whole life cycle improves as the efficiency of each department increases. Data mining can be used to find the influence of friendship groups on students’ education. Attendance marks also play an important role as part of the internal marks. Attendance may be classified as regular or irregular based on the attendance percentage, and the requirement can be relaxed up to a certain percentage on payment of a penalty fee. Attendance may be associated with test marks and assignment grades to predict the semester result.
• Procurement: Using data mining techniques, costs can be minimized, accurate spend analysis can be conducted and supplier relationships can be enhanced. Increasing electronic data storage and transmission has given investigators more opportunities to track and retrieve data; an e-mail message, for example, leaves a detectable ‘footprint’ on any server through which it passes. In today’s business environment, data mining techniques are frequently employed against procurement fraud because of the increasing amount of data available for examination, be it the supplier master file, the invoice history file or even access control data. Data mining techniques can also be used to find which admission inquiries are most likely to turn into actual admissions.
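The performance-management and student life-cycle points above mention classifying attendance as regular or irregular and combining it with test marks and assignment grades to spot students at risk. A minimal rule-based sketch of that idea follows. The 75% attendance cut-off, the pass mark of 40, the grade scale, the names `StudentRecord`, `attendance_class` and `at_risk`, and the rule itself are hypothetical placeholders: in practice the thresholds would come from institutional policy and the rule from a model learned on past data.

```python
from dataclasses import dataclass

# Hypothetical policy values; each institution would set its own.
REGULAR_THRESHOLD = 75.0   # minimum attendance % to count as "regular"
PASS_MARK = 40.0           # minimum acceptable test mark
WEAK_GRADES = {"D", "E"}   # assignment grades treated as weak


@dataclass
class StudentRecord:
    name: str
    attendance_pct: float
    test_mark: float
    assignment_grade: str


def attendance_class(pct):
    """Classify attendance as 'regular' or 'irregular' by percentage."""
    return "regular" if pct >= REGULAR_THRESHOLD else "irregular"


def at_risk(student):
    """Hand-written rule: irregular attendance plus a weak test mark or assignment grade."""
    weak_academics = (student.test_mark < PASS_MARK
                      or student.assignment_grade in WEAK_GRADES)
    return attendance_class(student.attendance_pct) == "irregular" and weak_academics


if __name__ == "__main__":
    # Invented cohort for illustration.
    cohort = [
        StudentRecord("S1", 92.0, 68.0, "B"),
        StudentRecord("S2", 61.0, 35.0, "C"),
        StudentRecord("S3", 70.0, 72.0, "A"),
        StudentRecord("S4", 88.0, 38.0, "D"),
    ]
    for s in cohort:
        print(s.name, attendance_class(s.attendance_pct),
              "AT RISK" if at_risk(s) else "ok")
```

A fixed rule like this is easy to explain to teachers and counselors; a classifier trained on previous semesters (such as a decision tree) can replace it once enough historical records are available.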
Data mining techniques can predict enrollment in specific courses to help determine a program’s success rate (as well as its failure rate). At Birla Institute of Technology and Science, Pilani, for example, students in a graduate course on data mining were asked to wade through years of raw data on incoming students and pick out factors linked to retention, using analytics software from SAS Institute Inc., a company that helped design the course. They found that freshmen who lived off campus were more likely to drop out. University officials took the findings seriously and adopted a few policy changes as a result; for instance, the university began requiring first-year students to live on campus.
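The retention finding above comes down to comparing dropout rates across groups of students. A minimal sketch of that comparison is shown below. The records are invented for illustration and do not reproduce the data or the SAS analysis described in the text; the function name `dropout_rate_by` and the field names `residence` and `dropped_out` are hypothetical.

```python
from collections import defaultdict


def dropout_rate_by(records, attribute):
    """Return {value of `attribute`: fraction of students in that group who dropped out}."""
    totals = defaultdict(int)
    dropouts = defaultdict(int)
    for record in records:
        group = record[attribute]
        totals[group] += 1
        dropouts[group] += int(record["dropped_out"])
    return {group: dropouts[group] / totals[group] for group in totals}


if __name__ == "__main__":
    # Invented records for illustration only.
    records = (
        [{"residence": "on_campus", "dropped_out": False}] * 9
        + [{"residence": "on_campus", "dropped_out": True}] * 1
        + [{"residence": "off_campus", "dropped_out": True}] * 2
        + [{"residence": "off_campus", "dropped_out": False}] * 3
    )
    for group, rate in dropout_rate_by(records, "residence").items():
        print(f"{group}: {rate:.0%} dropped out")
```

With these invented records the sketch reports 10% for on-campus and 40% for off-campus students. A real analysis would also need to check that such a difference is not due to chance or to other factors correlated with where students live before it justifies a policy change.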
Data mining can also help in analyzing the retention of faculty members. Raw data on faculty members can be analyzed on parameters such as salary, allowances, medical support, group insurance, transport facilities and gratuity. Other parameters may also be important, such as academic load, interference of management in day-to-day affairs and distance from residence, among many other variables. Data mining can also help find the main clusters in student and faculty satisfaction surveys; such surveys can help improve canteen facilities, overall cleanliness, staff behavior and so on.
Table 2: Data Mining Techniques That Can Be Used to Find Certain Patterns

Major Data Mining Technique | Patterns
Clustering | Students having similar characteristics; grouping of top performers; groups of students most likely to drop out
Classification & Prediction | Predicting students’ learning outcomes in an institute; predicting the percentage accuracy in students’ performance; classifying the admission process; predicting what type of students are most likely to drop out; predicting students’ behavior and attitude; predicting performance progress throughout the semester; identifying the best profile for different students; predicting what factors will attract meritorious students; scoring students in the risk category who are predicted to leave voluntarily
Association | Association of training undertaken with various types of students and performance scores, individually and in teams; association of students’ work profiles with the most appropriate project; association of students’ team-building and leadership approaches; association of students’ attitude with performance
Data Mining Using Other Interdisciplinary Methods | Standardizing teaching methods; performance monitoring in career management; using historical data to build models of students’ indecent behavior and using data mining to help identify similar instances
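The association patterns in Table 2 are usually measured with support (the share of students showing both attributes) and confidence (the share of students with the first attribute who also show the second). The sketch below mines single-attribute rules of the form A → B from hypothetical student attribute sets. It is a simplified special case of Apriori-style mining; the function name `mine_pair_rules`, the attribute names and the thresholds are assumptions.

```python
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Find rules {a} -> {b} between single attributes.

    Hypothetical helper. Returns (antecedent, consequent, support, confidence) tuples.
    """
    baskets = [set(t) for t in transactions]
    n = len(baskets)

    def support(items):
        # Fraction of students whose attribute set contains all of `items`.
        return sum(1 for basket in baskets if items <= basket) / n

    # Apriori pruning: a pair can only be frequent if both single items are.
    frequent_items = [i for i in sorted(set().union(*baskets)) if support({i}) >= min_support]
    rules = []
    for a, b in combinations(frequent_items, 2):
        pair_support = support({a, b})
        if pair_support < min_support:
            continue
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = pair_support / support({antecedent})
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, pair_support, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical attribute sets, one per student.
    students = [
        {"positive_attitude", "high_score"},
        {"positive_attitude", "high_score", "team_leader"},
        {"positive_attitude", "high_score"},
        {"negative_attitude", "low_score"},
        {"positive_attitude", "low_score"},
        {"negative_attitude", "low_score", "team_leader"},
    ]
    for lhs, rhs, sup, conf in mine_pair_rules(students):
        print(f"{lhs} -> {rhs}  support={sup:.2f}  confidence={conf:.2f}")
```

Confidence is not symmetric: in this invented data every high scorer has a positive attitude, but only three of the four students with a positive attitude are high scorers. A rule also indicates co-occurrence, not that one attribute causes the other.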
All these examples and the literature suggest that data mining techniques can be used in the field of higher education in India.
7. Conclusion
Among the many recent technological innovations, data mining is bringing comprehensive changes to the field of higher education. It has tremendous applications in higher educational institutions in India.
8. References
Barros, B. and Verdejo, M. F. (2000), ‘Analyzing Student Interaction Processes In Order To Improve Collaboration: The Degree Approach’, International Journal of Artificial Intelligence in Education, Vol. 11, pp. 221-241.
Berson, A., Smith, S. and Thearling, K. (1999), ‘Building Data Mining Applications for CRM’, McGraw-Hill Professional.
‘Data Mining’ (2008) retrieved on 12 Sep 2008, from http://en.wikipedia.org/wiki/Data_mining.
Delavari, N., Beikzadeh, M. R. and Shirazi, M. R. A. (2004), ‘A New Model for Using Data Mining in Higher Educational System’, Proceedings of the 5th International Conference on Information Technology Based Higher Education and Training: ITHET ’04, Istanbul, Turkey.
Delavari, N., Beikzadeh, M. R. and Amnuaisuk, S. K. (2005), ‘Application of Enhanced Analysis Model for Data Mining Processes in Higher Educational System’, Proceedings of ITHET 6th Annual International Conference, Juan Dolio, Dominican Republic.
Delmater, R. and Handcock, M. (2001), ‘Data Mining Explained: A Manager’s Guide to Customer-Centric Business Intelligence’, Digital Press.
Feelders, A., Daniels, H. and Holsheimer, M. (2000), ‘Methodological and Practical Aspects of Data Mining’, Information and Management, Vol. 37, No. 5, pp. 271-281.
Frawley, W., Piatetsky-Shapiro, G. and Matheus, C. (1992), ‘Knowledge Discovery in Databases: An Overview’, AI Magazine, Fall 1992, pp. 213-228.
Gartner Group (2007), ‘Magic Quadrant for Customer Data Mining’, retrieved 18 September 2008 from http://dml.cs.byu.edu/~cgc/docs/mldm_tools/Reading/Gartner2Q07.pdf.
Han, J. and Kamber, M. (2006), Data Mining: Concepts and Techniques, Morgan Kaufmann, pp. 4-27.
Luan, J. (2001), ‘Data Mining Applications in Higher Education’, a chapter in the upcoming New Directions for Institutional Research, 1st Ed., Jossey-Bass.
Luan, J. (2002), ‘Data Mining and Its Applications in Higher Education’, in A. Serban and J. Luan (eds.), Knowledge Management: Building a Competitive Advantage for Higher Education, New Directions for Institutional Research, No. 113.
Ma, Y., Liu, B., Wong, C. K., Yu, P. S. and Lee, S. M. (2000), ‘Targeting the Right Students Using Data Mining’, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, pp. 457-464.
Ranjan, J. and Malik, K. (2007), ‘Effective Educational Process: A Data Mining Approach’, VINE, Vol. 37, Issue 4, pp. 502-515.
Ranjan, J. (2008), ‘Impact of Information Technology in Academia’, International Journal of Educational Management, Vol. 22, Issue 5, pp. 442-455.
Rubenking, N. (2001), ‘Hidden Messages’, PC Magazine, May 22, 2001, retrieved 25 October 2008 from http://www.pcmag.com/article2/0,2817,8637,00.asp.
Sargenti, P., Lightfoot, W. and Kehal, M. (2006), ‘Diffusion of Knowledge in and through Higher Education Organizations’, Issues in Information Systems, Vol. 3, No. 2, pp. 3-8.
Shyamala, K. and Rajagopalan, S. P. (2006), ‘Data Mining Model for a Better Higher Educational System’, Information Technology Journal, Vol. 5, No. 3, pp. 560-564.
Talavera, L. and Gaudioso, E. (2004), ‘Mining Student Data to Characterize Similar Behavior Groups In Unstructured Collaboration Spaces’, presented at the Workshop on Artificial Intelligence in Computer Supported Collaborative Learning at the European Conference on Artificial Intelligence, Valencia, Spain, pp. 17-23.
Waiyamai K. (2003), ‘Improving Quality of Graduate Students by Data Mining’, Dept. of Computer Engineering, Faculty of Engineering, Kasetsart University, Bangkok, Thailand.
Contact the Authors:
Dr. Jayanthi Ranjan, Information Management Area, Institute of Management Technology, Raj Nagar, Ghaziabad, Uttar Pradesh, India; Mobile: 09811443110; Email: jranjan@imt.edu
Raju Ranjan, Department of Information Technology, Ideal Institute of Technology, Govindpuram, Ghaziabad, Uttar Pradesh, India; Mobile: 09868215863; Email: ranjanr.cs@gmail.com