ABSTRACT: The research presented in this paper investigates the use of a text mining approach for automatic taxonomy generation and text categorisation for the content management system of Alergoclínica, a private dermatology and allergy clinic in Brazil.
Keywords: Text mining, Classification systems, Taxonomies, Knowledge management, Content management
1. Introduction
One would have expected that when technologies move out of the controlled environment of labs into the competitive software market, they would be ready for adoption. Nonetheless, this does not seem to be the case for the text mining tools that have been launched and gained prominence since the end of the 1990s. Text mining, defined as the process of extracting information from unstructured text, has emerged as a hybrid discipline which draws on information retrieval, statistics, computational linguistics and, more often than not, the topic area to which text mining is applied. The results of tests conducted to evaluate text mining techniques at Alergoclínica - a Brazilian dermatology and allergies clinic - were hardly encouraging.
Alavi & Leidner (2001) state that the basis for achieving competitive advantage through knowledge is more related to the ability of a company to effectively employ existing knowledge than to the knowledge itself. However, different views on the meaning and nature of knowledge and knowledge management (KM) entail the choice and adoption of different knowledge management systems and, therefore, different technologies.
In this paper we assume the perspective of knowledge as access to information. From this standpoint, KM focuses on securing access to and retrieval of information, while the role of IT is to provide effective search and retrieval mechanisms (Alavi & Leidner, 2001). Among them, Content Management Systems (CMS) offer a powerful software solution that benefits users by making it easier to manage learning content and digital assets in an organisational environment (Martis, 2005).
At the same time, a successful KM system should be aligned with the organisational culture. In the case of Alergoclínica, where information sharing is already established practice, the human aspect is very important, and so is the use of the internal common jargon. Any system that intends to help with the management of knowledge in this environment should first be concerned with delivering access to information. It is worth mentioning that Alergoclínica already has a shared knowledge base, built upon the current mode of operation of the clinic, but requires an improved system along with the appropriate technology to facilitate the exchange.
Alavi & Leidner (2001) propose a framework of four processes in knowledge management in organisations - (a) creation, (b) storage/retrieval, (c) transfer and (d) application – and they then discuss the role of IT in each of these. Regarding the second process, which is our current focus, they state that advanced storage and retrieval technologies can be used to improve organisational memory. Furthermore, any repository that aims to help an organisation to remember should be indexed in a way that facilitates knowledge transfer between the knowledge base and the individual and promotes appropriation. It is thus reasonable to say that the vocabulary employed to retrieve content is a major concern.
This is the point at which text mining techniques become useful. By working through vast amounts of text with powerful algorithms, they open up many possibilities. One of these is to derive a common vocabulary from the organisation's own texts.
Based on the above, the aim of this paper is to investigate the use of a text mining approach for automatic taxonomy generation and text categorisation in the design of an intranet-based CMS.
The rest of the paper is structured as follows: section 2 provides the background and rationale for the research; section 3 investigates the suitability of the prevalent methodologies, evaluates the available software tools and selects appropriate text mining software; section 4 presents the text analysis, clustering and automatic categorisation processes and assesses the results; finally, section 5 draws conclusions and outlines future work.
2. Background
The main function of a CMS is to organise the documents of an organisation in a way that makes them easily accessible to users. However, according to a survey conducted by Forrester Research (Tilak, 2005), “only 44 per cent of users surveyed feel that it is easy to find what they’re looking for on the intranet”, whereas 61 per cent of respondents rank improved search capabilities as the area that needs the most improvement.
Taxonomies provide a framework for the categorisation of content in a system (i.e. a controlled vocabulary) and are believed to improve retrieval by allowing users to broaden or narrow a search within the relevant subject category or topic, instead of relying on the user’s ability to build effective search queries (Cisco & Jackson, 2005). However, building and maintaining a human-generated thesaurus as the controlled vocabulary tool is time-consuming, expensive and intellectually demanding (Shearer, 2004). Automatic text categorisation has therefore been a topic of interest in advanced information retrieval since the early 1960s and, more recently, in text mining. It is seen as essential when a great volume of documents and scarcity of time make the manual approach impractical, and as a way of improving productivity in cases where human judgement is necessary (Sebastiani, 2002).
The value of text mining is also becoming evident in another related area – ontologies – especially since their role in the Semantic Web infrastructure has proven indispensable (Doan et al., 2004). Ontologies impose machine-readable constraints on the hierarchical relationships defined by a taxonomy, which enables scientific knowledge to be interpreted and represented from unstructured text (Stevens et al., 2000). They are particularly suited to any type of concept-oriented search (Dotsika & Watkins, 2004) and as such are potentially crucial for the future of text mining.
Moreover, the acknowledged fact that at least 80% of the information in a company is in the form of text (Dörre, 1999; Tan, 1999) has served as both motivation and advertisement for text mining. The following are popular text mining techniques:
• Information or feature extraction identifies key phrases and relationships by looking for predefined sequences (pattern matching).
• Clustering groups similar documents on the fly rather than through predefined topics, and documents can appear in multiple topic lists. A basic clustering algorithm creates a vector of topics for each document and measures the weights indicating how well the document fits into each cluster (see the sketch after this list).
• Categorisation detects the main topics of a document by placing it into a pre-defined set of topics, which have been identified by counting the words that appear in the text.
• Topic tracking works by storing user profiles and, based on the documents the user views, predicting other documents of interest to the user.
• Text summarisation helps the user identify whether lengthy documents meet the user’s needs.
• Concept linkage connects related documents by identifying their commonly shared concepts.
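To make the clustering description above concrete, the sketch below builds a weighted term vector for each document and assigns every document to the cluster it fits best. This is an illustration only: the corpus, the cluster count and the use of Python with scikit-learn are our assumptions, not the setup evaluated in this study.

```python
# A minimal sketch of vector-based document clustering, assuming scikit-learn.
# The documents and cluster count are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

docs = [
    "patient complained about long waiting times at reception",
    "customer praised the quick and efficient telephone service",
    "suggestion to offer coffee and water in the waiting room",
    "complaint about parking and access for wheelchair users",
]

vectors = TfidfVectorizer().fit_transform(docs)  # one weighted term vector per document
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(vectors)

for doc, label in zip(docs, model.labels_):
    print(label, doc)
```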
The performance of the major text categorisation algorithms is currently around 80% effectiveness (Sebastiani, 2002; Yang & Liu, 1999). According to Sebastiani (2002), this is comparable to the effectiveness of trained human coders. In discussing the issues of a bottom-up, automatic approach, Cisco & Jackson (2005) mention:
• little control over the meaning of high-level concepts
• refinement required before it makes sense to users
• high cost
• human intervention required to add and delete categories as appropriate and/or to judge whether the final taxonomy corresponds with human understanding.
By and large, current research in the field focuses on the lab-based technical aspects of the technology, whereas the business literature on the topic expresses high expectations along with issues and concerns. These anxieties are rooted in open human-versus-machine questions, mainly whether the software performs as well as humans and the extent to which human intervention is required. Different authors have different views on this matter and no definite answer seems to have been found. Thus, there is still a huge gap between research and practice, which shows the need for studies to unite these worlds.
3. Methodology And Data Collection
Most studies of text mining techniques adopt traditional experimental methods. This type of approach usually requires experimental and control groups, in addition to heavy quantitative analysis, in order to suggest causal relationships between variables. However, it was not our concern to investigate the performance of any particular algorithm or setup. Therefore, a purely experimental design did not seem appropriate.
Ultimately, the adequacy of the taxonomy and the effectiveness of categorisation can be better determined by people with expertise relating to the contents of the documents, which in the case of a corporate intranet-based system are the producers and users of information.
Moreover, the use of the commercially available technology does not yet appear to be well mapped from an academic perspective. Therefore, an in-depth qualitative inquiry seemed more appropriate to elicit effective practices as well as limitations of the technique. Evaluation research includes the application of scientific procedures to the collection and analysis of information about the content, structure and outcomes of programmes, projects and planned interventions (Clarke, 1999). Although it has been commonly used in the social sciences, it was particularly suitable in this case because it (a) provides the rigour required to test the technology and to replicate the particular procedures undertaken and (b) facilitates the quest for the value and meaning of the outcomes.
The aim of the study was to investigate the use of a text mining approach for automatic taxonomy generation and text categorisation in the design of an intranet-based Content Management System (CMS). The selected organisation, Alergoclínica, was a dermatology and allergies clinic with six branches in Brazil.
Thus, a selection of evaluation methods was employed to select the tools, collect the data and run the text mining analysis. The criteria used to select the text mining tools were as follows.
Essential:
• availability – either as a free or a demonstration version
• language – support for Brazilian Portuguese, because the clinic operates in Brazil
• specific features – taxonomy generation and automatic categorisation of documents.
Desirable:
• support for different document formats, including .doc, .pdf and .htm
• ease of use – friendly GUI, documentation, etc.
• visual aids – e.g. a cognitive map for viewing main concepts.
In order to find out which text mining tools were available at the time of the initiation of this project, in June 2005, the following independent web sites were consulted:
• KDnuggets Text Analysis, Text Mining, and Information Retrieval Software [http://www.kdnuggets.com/software/text.html]
• Text Analysis Info Page [http://www.textanalysis.info]
• Text-mining.org [http://www.text-mining.org]
These sites provided a list of more than ten different text mining packages that would be suitable for this evaluation. However, only five of them had a free or demo version available. Since none of the tools evaluated completely satisfied the pre-defined criteria, two tools were used in the evaluation: Megaputer TextAnalyst 2.3, particularly for the taxonomy creation, and the Provalis Research QDA Miner/WordStat 5.0 suite, for the automatic categorisation tests.
A purposeful stratified sample of Alergoclínica’s documents was collected in order to ensure that all relevant areas and themes were represented for each department and appeared appropriate to a team of content experts. The sample from each area was split into two sets of similar documents, the odd and even sets, in order to run reliability tests at the analysis phase.
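A minimal sketch of such an odd/even split, assuming the documents are simply assigned alternately after sorting; the file names below are hypothetical:

```python
# Hypothetical file names; alternate assignment yields two sets of similar composition.
docs = sorted(["mkt_001.txt", "mkt_002.txt", "mkt_003.txt", "mkt_004.txt", "mkt_005.txt"])

odd_set = docs[0::2]   # 1st, 3rd, 5th ... documents
even_set = docs[1::2]  # 2nd, 4th, 6th ... documents

print("odd:", odd_set)
print("even:", even_set)
```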
A team of three key members of staff, the content experts, was formed to help in the selection of documents according to the sampling strategy and to provide overall assistance in the contact with the rest of the organisation. These were the Head of the Marketing Department, the Head of the Human Resources Department and the Executive Director. The number of documents gathered from each department was as follows:
• Marketing – 120 documents
• Human Resources – 42 documents
• Scientific – 16 documents
4. The Tools At Work
The text mining tests included three main tasks: text analysis, clustering and categorisation.
4.1. Text Analysis
TextAnalyst was used for the taxonomy generation. Before any analysis could be conducted, a pre-processing phase was required to exclude common words. The analysis was run separately on the two sets of sample documents – odd and even – and, surprisingly, produced only two terms in common. This substantial difference in results may indicate that the vocabulary varies too much across the documents, or that the selection of terms is not stable enough, thus affecting the reliability of the procedure. The extracted terms do not seem appropriate for discriminating between the documents, though they might be a sign of high occurrence. Further manipulation of the stoplist could improve, and thus modify, the entire results. However, in this case, more contact with the human content experts would be necessary to establish criteria for such a selection.
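As a rough illustration of the pre-processing and reliability check described above, the sketch below strips stoplist words, extracts the most frequent terms from each document set and measures the overlap between the two term lists. The stoplist, the texts and the frequency-based selection are stand-in assumptions; TextAnalyst's actual term-weighting algorithm is proprietary and considerably more elaborate.

```python
# A stand-in for the stoplist filtering and term extraction; not TextAnalyst's algorithm.
from collections import Counter
import re

STOPLIST = {"a", "o", "e", "de", "da", "do", "em", "que", "para"}  # placeholder stoplist

def top_terms(texts, n=20):
    """Return the n most frequent non-stoplist words across the given texts."""
    words = []
    for text in texts:
        words += [w for w in re.findall(r"\w+", text.lower()) if w not in STOPLIST]
    return {term for term, _ in Counter(words).most_common(n)}

odd_terms = top_terms(["texto de exemplo sobre recepção e satisfação"])
even_terms = top_terms(["outro texto de exemplo sobre espera e reclamação"])
print("terms in common:", odd_terms & even_terms)
```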
In view of this, a supplementary exercise with the Marketing content expert was conducted to help evaluate the validity of the results. Twelve documents (10% of the Marketing sample) were randomly selected and sent back to their creator. First, he was asked to underline the most relevant words in each document and to provide any words outside the document that named a more appropriate topic or category, if any. Then the topic structure was sent to him, and he was asked whether any of the words in the list could be used to classify the same documents. Finally, a general opinion of the taxonomy was obtained.
Table 1 below compares the topics selected in the text by the content expert with the ones extracted by the tool for the same documents: only 2 topics coincide – satisfação (“satisfaction”) and recepção (“reception”).
Doc. | Human topics in the text (not in the text) | Machine topics
1 | atraso, reclamação | none
2 | não foi atendida, demorou (espera) | situações
3 | agradeço, elogio, satisfação | objetivo, satisfação, padrão
4 | indico, idoso, estacionamento | convênio, função, apresentação, participação
5 | ligo, ocupado, esperando, sem resposta, não tenho tempo, melhor atendida (espera) | atendimento telefônico da Unidade, alterações
6 | sugiro, café, chá, água (sugestão) | objetivo, satisfação, contato, participação
7 | esperar (espera) | situações, relação, contato, participação
8 | atendido primeiro, esperar, esperei muito, não fui atendida, reclamam, favor (espera, reclamação) | recepção, sugestão, satisfação
9 | cadeira de rodas, indignada, deficiente, queixa (reclamação) | sugestão
10 | organização, estacionamento, horário correto | situações
11 | rápido, eficiente, recepção, atendimento médico | recepção, objetivo, satisfação, padrão
12 | decepcionada, esperando, atraso, recomendarei (espera) | recepção, atendimento da recepção, situações
Table 1. Human v machine topic extraction
Most of the terms were obtained from a single document, the Nursing Manual. On further investigation, it was found that the Nursing and Integration manuals were much longer than the other sources (in fact, ten times longer than the average of the others). Although this may mean that longer documents contribute disproportionately to the terms in the topic structure, it does not necessarily follow that these documents need more terms for classification. Even if it were indeed proven beneficial to use more terms in order to cater for the sub-topics in these larger documents, it would be more sensible to place all those terms under a single major category such as “Manuals”, as suggested by the content expert. However, the opaqueness and inflexibility of the algorithm offer no aid in dealing with this issue.
The overall taxonomy generated by the software could be used as a starting point for the final version to be generated manually by the steering group, but certainly much human intervention would be needed to provide a consistent and comprehensive terminology.
4.2. Clustering
Another way to try to understand the similarity of documents, and therefore the possible categories within a domain, is to use clustering. This was performed using WordStat. The output is a dendrogram graph which shows the file names grouped according to the degree of similarity between the words used in each document. The number of clusters must be defined by the user, and the documents are automatically split into the specified number of groups. There is no control over the number of documents included in each cluster, nor any possibility of moving a particular document from one cluster to another.
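The sketch below shows the same idea with open-source components, assuming SciPy's hierarchical clustering in place of WordStat's proprietary routine: documents are vectorised, linked by similarity into a dendrogram, and cut into a user-chosen number of groups. The texts and the cluster count are illustrative only.

```python
# Hierarchical (dendrogram-based) clustering sketch, assuming scikit-learn and SciPy.
from sklearn.feature_extraction.text import TfidfVectorizer
from scipy.cluster.hierarchy import linkage, fcluster

docs = [
    "complaint about waiting time at reception",
    "complaint about waiting on the telephone",
    "praise for the reception staff",
    "suggestion to serve coffee and water",
]

X = TfidfVectorizer().fit_transform(docs).toarray()
Z = linkage(X, method="average", metric="cosine")  # builds the dendrogram
labels = fcluster(Z, t=3, criterion="maxclust")    # cut into at most 3 clusters
print(labels)
```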
Clustering was first performed on the Human Resources set of 42 documents. In this case, after several trials, a 3-group solution was chosen because it seemed to produce the most consistent groupings. Human intervention was required to examine the documents allocated to each group and to manually assign labels to the groups, since the software does not indicate the textual reason for each cluster.
Clustering was also performed on the Marketing Department documents, particularly on the subset of 116 letters answering customer queries. In this case, after trials, 10 clusters were chosen. Since these letters had been named according to an internal Alergoclínica classification system, as indicated by the content expert interviewed, it was possible to run further analysis comparing the clustering generated by the software with the human categorisation. This scheme was used to code the set of documents and to cross-tabulate the clusters given by the tool against the human categories.
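A minimal sketch of this cross-tabulation, assuming pandas; the cluster numbers and type labels below are invented placeholders, not the study's data:

```python
# Cross-tab of machine clusters against human-assigned letter types (invented data).
import pandas as pd

clusters = [1, 1, 3, 6, 1, 3, 6]                          # machine cluster per letter
types = ["Reclamação", "Reclamação", "Sugestão",
         "Elogio", "Reclamação", "Reclamação", "Elogio"]  # human coding per letter

print(pd.crosstab(pd.Series(clusters, name="cluster"),
                  pd.Series(types, name="type")))
```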
The bar chart in Figure 1 shows how each software-generated cluster is divided in terms of the type assigned by the content expert.
Figure 1. Human (type) vs. Machine Cluster
Cluster 1 is the largest group, incorporating the majority of the overall complaints (56 out of 82). Groups with only one document occurred even when fewer clusters were chosen. This suggests that these documents are considered very different from all others and would not belong to any other group. Although the software offers the option to eliminate clusters with only one item, this is not really useful, considering that every document must belong to a category, even if it is something as general as “others”.
Cluster 3 incorporates the majority of suggestions, but it is not very uniform in terms of type as it also contains complaints and one praise. Cluster 6 incorporates the majority of praises (13 out of 17).
Based on this comparison it was possible to verify that the groupings were fairly reasonable and could be labelled. However, some problems were encountered during the process, such as “outlier” documents falling into one-item clusters and documents with more than one topic being unable to belong to more than one group.
Based on the results of the cluster analysis and input from the content experts, a secondary taxonomy outline was devised, which could be used as the basis for Yahoo-like directories in the CMS (Figure 2 below).
Figure 2. Taxonomy for Yahoo-like navigation
4.3. Automatic Categorisation
Automatic categorisation requires a previously categorised set in order to train and test the model; only then can it be used with uncategorised documents. Therefore, the choice was made to employ the clinic's classification system of customer letters once more to test this technique. The software used was again WordStat, and the sample used to train and test the model consisted of the customer letters of the year 2004 (a subset of the Marketing sample). A categorical variable was created, manually assigned as either Complaint, Praise or Suggestion, according to the content expert's file nomenclature (R – Reclamação, E – Elogio, S – Sugestão), and used as the independent variable. The predictors were the keywords in the text.
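The sketch below illustrates the general shape of such a test, assuming a naive Bayes model over word counts as a stand-in for WordStat's undocumented internals; the training letters and labels are invented placeholders.

```python
# Train on pre-categorised letters, then predict the category of a new one.
# Naive Bayes over word counts is an assumption, not WordStat's actual algorithm.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = ["esperei muito na recepção",
               "agradeço o excelente atendimento",
               "sugiro café na sala de espera"]
train_labels = ["Reclamação", "Elogio", "Sugestão"]  # from the R/E/S file nomenclature

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["esperei demais para ser atendida"]))
```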
The classification model generated was very successful according to the levels indicated by the literature, making some concession for comparability issues. In total, 48 of the 59 documents were assigned correctly and 11 were missed, giving an overall accuracy of 81%. When the model was applied to documents not used in its creation, namely the customer letters of the year 2003, performance reached about the same level. Additional analysis of the text categorisation technique was hindered by the lack of other pre-categorised sets and the infeasibility of running real-time tests at this stage.
The predicted v actual precision and predicted v actual recall for each individual category are shown in Tables 2a and 2b below.
Actual \ Predicted | Elogio (Praise) | Reclamação (Complaint) | Sugestão (Suggestion) | TOTAL
Elogio (Praise) | 8 (100%) | 0 (0%) | 1 (6.25%) | 9
Reclamação (Complaint) | 0 (0%) | 30 (85.71%) | 5 (31.25%) | 35
Sugestão (Suggestion) | 0 (0%) | 5 (14.29%) | 10 (62.5%) | 15
TOTAL | 8 | 35 | 16 | 59
Table 2a. Predicted v actual, showing precision (frequency with column percentage in parentheses)
Actual \ Predicted | Elogio (Praise) | Reclamação (Complaint) | Sugestão (Suggestion) | TOTAL
Elogio (Praise) | 8 (88.89%) | 0 (0%) | 1 (11.11%) | 9
Reclamação (Complaint) | 0 (0%) | 30 (85.71%) | 5 (14.29%) | 35
Sugestão (Suggestion) | 0 (0%) | 5 (33.33%) | 10 (66.67%) | 15
TOTAL | 8 | 35 | 16 | 59
Table 2b. Predicted v actual, showing recall (frequency with row percentage in parentheses)
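As a worked check of the figures in Tables 2a and 2b: precision for each category is the diagonal count divided by its column total, recall is the diagonal count divided by its row total, and overall accuracy is the diagonal sum over the grand total (48/59 ≈ 81%). A short verification, assuming NumPy:

```python
# Confusion matrix from Tables 2a/2b: rows = actual, columns = predicted
# (order: Elogio, Reclamação, Sugestão).
import numpy as np

cm = np.array([[8,  0,  1],
               [0, 30,  5],
               [0,  5, 10]])

precision = cm.diagonal() / cm.sum(axis=0)  # [1.00, 0.857, 0.625] -> Table 2a
recall = cm.diagonal() / cm.sum(axis=1)     # [0.889, 0.857, 0.667] -> Table 2b
accuracy = cm.diagonal().sum() / cm.sum()   # 48/59 ≈ 0.81

print(precision, recall, accuracy)
```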
It is noteworthy that while the software performed excellently with the Elogio documents, and reasonably well with the Reclamação documents, it was much less adequate in allocating the more conceptually difficult Sugestão documents to the appropriate category.
It appears that the automatic categorisation technique was more successful than the automatic generation of the taxonomy, even if fewer tests were possible. It is thus reasonable to believe that it would be applicable as a stand-alone technology in other scenarios in which a taxonomy already exists, or where some documents are already categorised and another set must be categorised in the same way. It is a “do-it-like-this” kind of approach which can only fit specific purposes. Nonetheless, if the particular fit is found, it might be successful.
4.4. Manual Versus Automatic
Table 3 summarises some perceptions of how the automatic approach evaluated in this study compares with the manual one. Human input in the automatic approach is marked with asterisks for emphasis. Overall, it seems that there are not enough advantages in the automatic process to justify substitution. In fact, it might be more problematic, as additional factors need to be considered. However, in some specific circumstances, the nature of the application or of the organisation may justify adoption of the automatic approach.
Aspect | Manual | Automatic
Method | Top-down and bottom-up | Bottom-up
Taxonomy generation process | All terms and their relationships are determined by people. Tools can be used to store and facilitate maintenance of the structure. | Tools are used to extract the relevant terms from the text and to generate the taxonomy and its relations. *Structure needs to be revised by content experts to include/exclude terms and relationships as appropriate.*
Categorisation process | Each document is reviewed and manually tagged before publishing. Inconsistency might arise when many different coders work on the same documents. | Documents are automatically categorised based on a pre-categorised sample set. *Revision might be necessary, or the machine re-trained, if categories change.* Requires integration of the generated model into the CMS.
Time consumption | High, because it requires intense involvement of human resources. | Also high, because the analysis phase is long and *also requires human intervention*.
Staff requirements | Content analyst with librarianship skills and knowledge of the topic area; content experts to help create a suitable model; editors to manually tag content. | Text mining analyst with some librarianship skills and trained in text mining activities (data mining experience might help); *content experts to create higher-level topics and to validate categories*; programmer, if components are to be integrated into the CMS.
Organisation profile | Best suited to small companies or small quantities of documents. | Best suited to companies in which the categorisation of large quantities of documents is strategic, or where the cost of an existing manual approach has become a serious concern.
Cost | Internal, staff-related. | Cost of the tool, probably the cost of third-party consultants, and staff costs.
Table 3. Comparison of manual vs. automatic approaches
5. Conclusions And Future Work
Our evaluation showed that the tools were not flexible enough and provided little help for the automatic generation of a taxonomy. Human intervention was required, as anticipated by the literature, but to a much greater extent than expected. The text categorisation was more successful in terms of the performance of the algorithm, but several issues were found; for instance, the need for a pre-coded set of documents was an obstacle to further tests. Overall, the approach was found to be only conditionally viable, and consequently the technology was not recommended to Alergoclínica at this moment. Full details of the investigation may be found in Nara Pais's MSc dissertation (Pais, 2006).
Likewise, we believe that the applicability of these techniques for the automatic creation of ontologies is also limited. Apart from the demonstrated weakness of the taxonomy component, further studies would be necessary to determine whether the technology will ever be capable of aiding other aspects, such as the different types of relationships and axioms that are essential for a fully fledged ontology.
This study was important in helping to bridge the gap between technical experiments with algorithms and the real business application of the technology. Allowing for limitations of scope, it shows that the technology neither delivers the benefits advertised by vendors and supporters nor addresses the issues and concerns voiced in the literature. It is undeniable that the field has advanced since its start more than 40 years ago, yet commercial products are less than 10 years old and still need much development before they can be valuable in real practice.
Text mining is bound to gain prominence as tools evolve and text repositories grow. Therefore, further investigation is needed not only into taxonomy (or, possibly, ontology) generation and text categorisation, but also into the more exploratory aspects of knowledge discovery. In the meantime, it is advisable to adjust one's level of expectation, bearing in mind the emergent state of the tools and the limitations of their effectiveness at this stage.
6. References
Alavi, M. & Leidner, D., 2001. Knowledge management and knowledge management systems: conceptual foundations and research issues. MIS Quarterly, vol. 25, no. 1, pp. 107-136.
Cisco, S. & Jackson, W., 2005. Creating order out of chaos with taxonomies. Information Management Journal, May/June, vol. 39, issue 3, pp. 44-50.
Clarke, A., 1999. Evaluation research: an introduction to principles, methods and practice. London: Sage.
Doan, A., Madhavan, J. & Domingos, P., 2004. Learning to match ontologies on the semantic web. VLDB Journal, vol. 12, pp. 303-319.
Dotsika, F. & Watkins, A., 2004. Can conceptual modelling save the day: a unified approach for modelling information systems, ontologies and knowledge bases. In: Khosrow-Pour, M. (ed.), Innovation through information technology: proceedings of the 2004 IRMA conference.
Dörre, J., 1999. Text mining: finding nuggets in mountains of textual data. In: Proceedings of the fifth ACM SIGKDD international conference on knowledge discovery and data mining.
Martis, M.S., 2005. Content management system as an effective knowledge management enabler. International Journal of Applied Knowledge Management, vol. 1, issue 2.
Pais, N., 2006. An investigation of the text mining approach. MSc dissertation.
Sebastiani, F., 2002. Machine learning in automated text categorization. ACM Computing Surveys (CSUR), vol. 34, issue 1, pp. 1-47.
Shearer, J., 2004. A practical exercise in building a thesaurus. Cataloging & Classification Quarterly, vol. 37, issue 3/4, pp. 35-56.
Stevens, R., Goble, C.A. & Bechhofer, S., 2000. Ontology-based knowledge representation for bioinformatics. Briefings in Bioinformatics, vol. 1, issue 4, pp. 398-414.
Tan, A.H., 1999. Text mining: the state of the art and the challenges. In: Proceedings of the PAKDD 1999 workshop on knowledge discovery from advanced databases.
Tilak, J., 2005. Desktop technologies most important in corporate IT – survey. http://www.dmeurope.com/default.asp?ArticleID=7994 (accessed 16/3/2006).
Yang, Y. & Liu, X., 1999. A re-examination of text categorization methods. In: Proceedings of SIGIR-99, 22nd ACM international conference on research and development in information retrieval.
Contact the Authors:
Nara Naomi Nishitani Pais is Solutions Architect at SPSS MR Latin America Inc. and can be reached at: R. Nova York, 871 ap 61, Brooklin Paulista, São Paulo - SP, BRAZIL; Phone: +5511 55321251; E-mail: narapais@uol.com.br
Fefie Dotsika is a Senior Lecturer in the Business School of the University of Westminster and can be reached at: University of Westminster, 35 Marylebone Road, London NW1 5LS, UK; Phone: +44 (0)20 79115000 ext. 3027; E-mail: F.E.Dotsika@westminster.ac.uk
James Shearer is a Senior Lecturer in the