Drivers for analysis: factor or cluster.

Factor analysis and cluster analysis are both powerful multivariate statistical tools that help researchers explore the homogeneity of, and relationships among, variables or subjects (Gorman and Primavera, 1983). Yet they are distinctly different. Only by understanding their driver for analysis, as well as the strengths and limitations of the two methods, can researchers choose the most appropriate tool for the analysis at hand. To gain further insight into these two methods, the author first defines the concepts.
The comparison between factor analysis and cluster analysis is about approaching a set of data from two different perspectives. Factor analysis, according to Bryman and Cramer (2006), is a statistical approach that analyses the interrelationships among a large number of quantitative variables and interprets them in terms of their common underlying dimensions, which are called factors. Bryman and Bell (2011) stress that factor analysis should be seen as a data reduction and summarisation technique, which aims to reduce the number of variables the researcher needs to deal with to a more manageable number. A cluster, on the other hand, is a group of similar objects. Correspondingly, cluster analysis, or clustering, is according to Gorman and Primavera (1983) a multivariate technique that groups objects based on the proximity and similarity of their attributes. Nevertheless, some clustering procedures use variables as the basis for classification and, conversely, some factor-analytic procedures use objects as the basis for factoring (Krebs et al., 2000). It has therefore been argued that cluster analysis ought not to be identified exclusively with the object-oriented approach to the data matrix (ibid.). Moreover, Castro (2002) emphasises the ambiguity of the notion of a “cluster”: there is no precise definition of the term. As a consequence of this ambiguity, many cluster models and clustering algorithms have been developed, including hierarchical and non-hierarchical cluster analysis, agglomerative and divisive algorithms, sequential and parallel threshold methods, and optimising procedures. These vary considerably depending on the notion of a cluster adopted (Gorman and Primavera, 1983). As the aim of this article is to investigate the drivers for choosing between factor and cluster analysis, the author will not elaborate further on the details of cluster models and algorithms.
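To make the idea of data reduction concrete, here is a minimal sketch in Python, assuming NumPy is available and using invented data. It extracts principal components from the correlation matrix and keeps those with eigenvalues greater than 1 (the Kaiser rule). This is a common first step rather than a full common-factor model: no communalities are estimated and no rotation is applied. The function names `principal_loadings` and `kaiser_count` are hypothetical.

```python
import numpy as np

def principal_loadings(data, n_factors):
    """Loadings of each variable on the leading components of the
    correlation matrix (columns of `data` are variables)."""
    corr = np.corrcoef(data, rowvar=False)
    values, vectors = np.linalg.eigh(corr)          # eigenvalues in ascending order
    order = np.argsort(values)[::-1][:n_factors]    # largest eigenvalues first
    return vectors[:, order] * np.sqrt(values[order])

def kaiser_count(data):
    """Number of correlation-matrix eigenvalues greater than 1."""
    values = np.linalg.eigvalsh(np.corrcoef(data, rowvar=False))
    return int(np.sum(values > 1.0))

# Invented data: 2000 research units, six observed variables driven by
# two latent dimensions (three variables each) plus measurement noise.
rng = np.random.default_rng(0)
n = 2000
f1, f2 = rng.normal(size=n), rng.normal(size=n)
data = np.column_stack([
    f1 + 0.3 * rng.normal(size=n), f1 + 0.3 * rng.normal(size=n), f1 + 0.3 * rng.normal(size=n),
    f2 + 0.8 * rng.normal(size=n), f2 + 0.8 * rng.normal(size=n), f2 + 0.8 * rng.normal(size=n),
])

k = kaiser_count(data)
loadings = principal_loadings(data, k)
print(k)                      # 2 for this construction
print(np.round(loadings, 2))
```

Six variables collapse into two dimensions: the first three variables load highly on one component and the last three on the other. The signs of the loadings are arbitrary, and a full factor analysis would usually go on to rotate the solution.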
To serve the purpose of this article, “objects” will be referred to as “research units” from here on. When plotted geometrically, research units within a given cluster appear close to one another, while different clusters lie further apart, provided the classification has been done successfully (Stamatis, 2003).
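This geometric picture can be checked numerically. The sketch below, assuming NumPy and using invented two-dimensional data, compares the mean distance of research units to their own cluster centroid with the distance between cluster centroids; for a successful classification the second should be much larger than the first. The function `within_between` is hypothetical, and centroid distances are only one of several ways to express this idea.

```python
import numpy as np

def within_between(points, labels):
    """Return (mean distance of units to their own centroid,
    mean pairwise distance between centroids)."""
    clusters = np.unique(labels)
    centroids = np.array([points[labels == c].mean(axis=0) for c in clusters])
    # Average spread of each cluster around its own centroid, averaged over clusters.
    within = np.mean([
        np.linalg.norm(points[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(clusters)
    ])
    # Distances between every pair of centroids (upper triangle only).
    gaps = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    between = gaps[np.triu_indices(len(clusters), k=1)].mean()
    return within, between

# Invented data: two tight groups of 50 research units each.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[2.0, 8.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[8.0, 2.0], scale=0.5, size=(50, 2))
points = np.vstack([group_a, group_b])
labels = np.repeat([0, 1], 50)

within, between = within_between(points, labels)
print(f"within: {within:.2f}, between: {between:.2f}")
```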
As mentioned above, one can hardly choose the most appropriate tool without understanding the drivers behind each analysis. To investigate these drivers, the author sets out the purpose and goal underlying each method. Despite their explicit differences, both techniques share the same underlying logic: classification built on homogeneity (Krebs et al., 2000). Because they rest on different fundamental bases, however, cluster analysis and factor analysis yield different information about the data. While factor analysis groups variables, cluster analysis classifies research units according to how similar their values on the variables are. It emphasises homogeneity within groups of research units and heterogeneity between them (Chambliss and Schutt, 2015). In short, cluster analysis is based on proximity, whereas factor analysis is based on correlation.
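The difference between a correlation-based and a proximity-based notion of similarity shows up clearly with two invented research units whose scores rise in exactly the same pattern but at different levels. A minimal NumPy sketch (the data are hypothetical):

```python
import numpy as np

# Two hypothetical research units scored on the same three variables.
unit_a = np.array([1.0, 2.0, 3.0])
unit_b = np.array([11.0, 12.0, 13.0])

# Correlation ignores level: both profiles rise in the same pattern.
r = np.corrcoef(unit_a, unit_b)[0, 1]
# Euclidean distance, a common proximity measure, is sensitive to level.
d = np.linalg.norm(unit_a - unit_b)
print(round(r, 3), round(d, 3))   # 1.0 17.321
```

Correlation treats the two profiles as identical in shape, whereas Euclidean distance treats them as far apart; which notion of similarity is appropriate depends on the research question.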
Turning to the purposes and objectives of the two techniques: although both can serve as useful tools for data reduction and segmentation, factor analysis aims to reduce the number of variables and to identify the underlying interrelationships among variables, and sometimes unobservable or latent constructs, in a data set (Rogerson, 2001). It is widely used for theory development, as it “implies the aspiration of establishing a theoretically based causal relationship between indicators (items) and a latent variable (the factor or dimension)” (Gorman and Primavera, 1983), especially in fields such as marketing, genomics and the social sciences (Howitt and Cramer, 2014). Cluster analysis, on the other hand, has taxonomy description and relationship identification as its overarching goals, besides data simplification. Furthermore, cluster analysis can be extremely useful when researchers wish to develop hypotheses about the nature of the data or to examine previously developed hypotheses (Chambliss and Schutt, 2015). For example, suppose a hotel company believes that its customers fall into two groups with respect to room comfort and room price per night. Cluster analysis could then separate the customers who prefer comfort over price from those who prefer price over comfort, and the resulting clusters, if any, could be profiled for demographic similarities and differences. A common criticism of cluster analysis is that it will always impose a cluster structure on a data set, even when well-separated clusters are not warranted (Gorman and Primavera, 1983; Saunders et al., 2009; Clark et al., 2010). As for factor analysis, a limitation is that researchers must make vital decisions in advance, such as the choice of factor rotation strategy, which strongly affect the eventual outcome (Gorman and Primavera, 1983).
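The hotel example can be sketched with k-means, one common non-hierarchical clustering method, chosen here for illustration only. The ratings are invented (hypothetical importance scores from 1 to 10 for room comfort and for a low price per night), NumPy is assumed, and `k_means` is a hypothetical helper rather than the routine of any particular package. The last lines also illustrate the criticism above: given structureless random data, the algorithm still returns two groups.

```python
import numpy as np

def k_means(points, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first start."""
    # Start from the first unit, then repeatedly add the unit farthest
    # from all centroids chosen so far.
    centroids = [points[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[nearest.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        # Assign every unit to its nearest centroid (proximity, not correlation).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its units (keep it if the cluster is empty).
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

rng = np.random.default_rng(7)
# Invented ratings: columns = (importance of comfort, importance of low price).
comfort_first = rng.normal([8.5, 3.0], 0.8, size=(40, 2))
price_first = rng.normal([3.0, 8.5], 0.8, size=(40, 2))
ratings = np.clip(np.vstack([comfort_first, price_first]), 1, 10)

labels, centroids = k_means(ratings, k=2)
print(np.round(centroids, 1))   # one centroid per customer segment

# Criticism: structureless data still yields two "clusters".
noise = rng.uniform(1, 10, size=(80, 2))
noise_labels, _ = k_means(noise, k=2)
print(np.bincount(noise_labels))
```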
In summary, the choice of technique is closely tied to the researcher’s intention. Depending on the purpose of the research, Bacher (1996, cited in Rogerson, 2001) suggests that researchers employ cluster analysis when they aim to classify entities and factor analysis when they aim to gain insight into the underlying correlations among variables (Sangren, 1999). Nonetheless, Gorman and Primavera (1983) propose that the two techniques are not mutually exclusive: used together, factor analysis and cluster analysis approach a data set from two complementary perspectives. In SPSS, one of the most widely used statistical software packages, factor analysis and cluster analysis can be used in a complementary fashion, which can enhance the interpretation of the results found using each technique individually.
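One way the two techniques can be combined is to summarise many items with a few dimensions first and then cluster research units on those dimensions. The sketch below is a minimal NumPy illustration under stated assumptions: principal-component scores stand in for factor scores, k-means stands in for cluster analysis, the six survey items and two customer segments are invented, and the function names are hypothetical. It is not a reproduction of the SPSS procedures or of Gorman and Primavera's (1983) method.

```python
import numpy as np

def component_scores(data, n_components):
    """Standardise the variables, eigen-decompose their correlation matrix
    and project each research unit onto the leading components."""
    z = (data - data.mean(axis=0)) / data.std(axis=0)
    values, vectors = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    order = np.argsort(values)[::-1][:n_components]   # largest eigenvalues first
    return z @ vectors[:, order]

def k_means(points, k, n_iter=100):
    """Lloyd's k-means with a deterministic farthest-first start."""
    centroids = [points[0]]
    for _ in range(1, k):
        nearest = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[nearest.argmax()])
    centroids = np.array(centroids)
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([points[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

# Invented survey: 120 customers from two hidden segments answer six items,
# three driven by comfort orientation and three by price orientation.
rng = np.random.default_rng(1)
segment = np.repeat([0, 1], 60)
comfort = np.where(segment == 0, 1.5, -1.5) + 0.5 * rng.normal(size=120)
price = np.where(segment == 0, -1.5, 1.5) + 0.5 * rng.normal(size=120)
items = np.column_stack([comfort] * 3 + [price] * 3) + 0.5 * rng.normal(size=(120, 6))

# Step 1: reduce six items to two component scores (factor-analysis stand-in).
scores = component_scores(items, 2)
# Step 2: cluster the research units on those scores.
labels = k_means(scores, 2)

# Cluster numbering is arbitrary, so compare with the hidden segments both ways.
agreement = max(np.mean(labels == segment), np.mean(labels != segment))
print(f"agreement with hidden segments: {agreement:.2f}")
```

Reducing the items first means the clustering works on a few interpretable dimensions rather than on every raw item, which is the kind of interpretive gain the complementary approach is meant to offer.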

Reference list:

Bryman, A. (2012) Social Research Methods. 4th ed. Oxford: Oxford University Press.

Bryman, A. and Bell, E. (2011) Business Research Methods. 3rd ed. Oxford: Oxford University Press.

Bryman, A. and Cramer, D. (2006) Quantitative Data Analysis with SPSS 12 and 13: A Guide for Social Scientists. New York: Routledge.

Castro, V. E. (2002) Why so many clustering algorithms: a position paper. ACM SIGKDD Explorations Newsletter, 4(1), 65–75. Available from: http://dl.acm.org/ [Accessed 30 November 2015]

Chambliss, D. F. and Schutt, R. K. (2015) Making Sense of the Social World: Methods of Investigation. 3rd ed. Available from: https://uk.sagepub.com [Accessed 30 November 2015]

Clark, M., Riley, M., Wilkie, E. and Wood, C. (2010) Researching and Writing Dissertations in Hospitality and Tourism. UK: Thomson

Gorman, B. S. and Primavera, L. H. (1983) The Complementary Use of Cluster and Factor Analysis Methods. The Journal of Experimental Education, 51(4), 165–168. Available from: http://www.jstor.org/ [Accessed 29 November 2015]

Howitt, D. and Cramer, D. (2014) Introduction to Research Methods in Psychology. 4th ed. Available from: http://pearsoned.co.uk/

Krebs, D., Berger, M. and Ferligoj, A. (2000) Approaching Achievement Motivation: Comparing Factor Analysis and Cluster Analysis. New Approaches in Applied Statistics, 148–171. Available from: http://www.stat-d.si/ [Accessed 29 November 2015]

Rogerson, P. A. (2001) Statistical Methods for Geography. Available from: https://srmo.sagepub.com [Accessed 30 November 2015]

Sangren, S. (1999) A survey of multivariate methods useful for market research. Available from: http://www.quirks.com [Accessed 29 November 2015]

Stamatis, D. H. (2002) Six Sigma and Beyond: Statistics and Probability, Volume III. Six Sigma and Beyond Series. Available from: https://books.google.ch [Accessed 03 November 2015]