
Statistics By Jim

Making statistics intuitive

Factor Analysis Guide with an Example

By Jim Frost

What is Factor Analysis?

Factor analysis uses the correlation structure amongst observed variables to model a smaller number of unobserved, latent variables known as factors. Researchers use this statistical method when subject-area knowledge suggests that latent factors cause observable variables to covary. Use factor analysis to identify the hidden variables.

Analysts often refer to the observed variables as indicators because they literally indicate information about the factor. Factor analysis treats these indicators as linear combinations of the factors in the analysis plus an error. The procedure assesses how much of the variance each factor explains within the indicators. The idea is that the latent factors create commonalities in some of the observed variables.

For example, socioeconomic status (SES) is a factor you can’t measure directly. However, you can assess occupation, income, and education levels. These variables all relate to socioeconomic status. People with a particular socioeconomic status tend to have similar values for the observable variables. If the factor (SES) has a strong relationship with these indicators, then it accounts for a large portion of the variance in the indicators.

The illustration below shows how the four hidden factors in blue drive the measurable values in the yellow indicator tags.

Factor analysis illustration.

Researchers frequently use factor analysis in psychology, sociology, marketing, and machine learning.

Let’s dig deeper into the goals of factor analysis, critical methodology choices, and an example. This guide provides practical advice for performing factor analysis.

Analysis Goals

Factor analysis simplifies a complex dataset by taking a larger number of observed variables and reducing them to a smaller set of unobserved factors. Anytime you simplify something, you’re trading off exactness with ease of understanding. Ideally, you obtain a result where the simplification helps you better understand the underlying reality of the subject area. However, this process involves several methodological and interpretative judgment calls. Indeed, while the analysis identifies factors, it’s up to the researchers to name them! Consequently, analysts debate factor analysis results more often than other statistical analyses.

While all factor analysis aims to find latent factors, researchers use it for two primary goals. They either want to explore and discover the structure within a dataset or confirm the validity of existing hypotheses and measurement instruments.

Exploratory Factor Analysis (EFA)

Researchers use exploratory factor analysis (EFA) when they do not already have a good understanding of the factors present in a dataset. In this scenario, they use factor analysis to find the factors within a dataset containing many variables. Use this approach before forming hypotheses about the patterns in your dataset. In exploratory factor analysis, researchers are likely to use statistical output and graphs to help determine the number of factors to extract.

Exploratory factor analysis is most effective when multiple variables are related to each factor. During EFA, the researchers must decide how to conduct the analysis (e.g., number of factors, extraction method, and rotation) because there are no hypotheses or assessment instruments to guide them. Use the methodology that makes sense for your research.

For example, researchers can use EFA to create a scale, a set of questions measuring one factor. Exploratory factor analysis can find the survey items that load on certain constructs.

Confirmatory Factor Analysis (CFA)

Confirmatory factor analysis (CFA) is a more rigid process than EFA. Using this method, the researchers seek to confirm existing hypotheses developed by themselves or others. This process aims to confirm previous ideas, research, and measurement and assessment instruments. Consequently, the nature of what they want to verify will impose constraints on the analysis.

Before the factor analysis, the researchers must state their methodology including extraction method, number of factors, and type of rotation. They base these decisions on the nature of what they’re confirming. Afterwards, the researchers will determine whether the model’s goodness-of-fit and pattern of factor loadings match those predicted by the theory or assessment instruments.

In this vein, confirmatory factor analysis can help assess construct validity. The underlying constructs are the latent factors, while the items in the assessment instrument are the indicators. Similarly, it can also evaluate the validity of measurement systems. Does the tool measure the construct it claims to measure?

For example, researchers might want to confirm factors underlying the items in a personality inventory. Matching the inventory and its theories will impose methodological choices on the researchers, such as the number of factors.

We’ll get to an example factor analysis in short order, but first, let’s cover some key concepts and methodology choices you’ll need to know for the example.

Learn more about Validity and Construct Validity.

In this context, factors are broader concepts or constructs that researchers can’t measure directly. These deeper factors drive other observable variables. Consequently, researchers infer the properties of unobserved factors by measuring variables that correlate with the factor. In this manner, factor analysis lets researchers identify factors they can’t evaluate directly.

Psychologists frequently use factor analysis because many of the factors they study are inherently unobservable; they exist inside the human mind.

For example, depression is a condition inside the mind that researchers can’t directly observe. However, they can ask questions and make observations about different behaviors and attitudes. Depression is an invisible driver that affects many outcomes we can measure. Consequently, people with depression will tend to have more similar responses to those outcomes than those who are not depressed.

For similar reasons, factor analysis in psychology often identifies and evaluates other mental characteristics, such as intelligence, perseverance, and self-esteem. The researchers can see how a set of measurements load on these factors and others.

Method of Factor Extraction

The first methodology choice for factor analysis is the mathematical approach for extracting the factors from your dataset. The most common choices are maximum likelihood (ML), principal axis factoring (PAF), and principal components analysis (PCA).

You should use either ML or PAF most of the time.

Use ML when your data follow a normal distribution. In addition to extracting factor loadings, it also can perform hypothesis tests, construct confidence intervals, and calculate goodness-of-fit statistics .

Use PAF when your data violate multivariate normality. PAF doesn’t assume that your data follow any particular distribution, so you can also use it when they are normally distributed. However, this method can’t provide all the statistical measures that ML can.

PCA is the default method for factor analysis in some statistical software packages, but it isn’t a factor extraction method. It is a data reduction technique to find components. There are technical differences, but in a nutshell, factor analysis aims to reveal latent factors while PCA is only for data reduction. While calculating the components, PCA doesn’t assess the underlying commonalities that unobserved factors cause.

PCA gained popularity because it was a faster algorithm during a time of slower, more expensive computers. If you’re using PCA for factor analysis, do some research to be sure it’s the correct method for your study. Learn more about PCA in Principal Component Analysis Guide and Example.

There are other methods of factor extraction, but the factor analysis literature has not strongly shown that any of them are better than maximum likelihood or principal axis factoring.
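To make the choice concrete, here’s a minimal sketch of how you might specify the extraction method in Python. It assumes the third-party factor_analyzer package and a hypothetical CSV file of indicator variables; the "minres" option isn’t identical to PAF, but it’s a common distribution-free stand-in.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("indicators.csv")  # hypothetical file containing only the observed indicators

# Maximum likelihood: appropriate when the indicators are roughly multivariate normal.
fa_ml = FactorAnalyzer(n_factors=5, method="ml", rotation=None)
fa_ml.fit(df)

# Minimum residual ("minres"): a distribution-free alternative, roughly comparable to PAF.
fa_minres = FactorAnalyzer(n_factors=5, method="minres", rotation=None)
fa_minres.fit(df)

print(fa_ml.loadings_.round(3))  # unrotated loadings: one row per indicator, one column per factor
```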

Number of Factors to Extract

You need to specify the number of factors to extract from your data except when using principal components analysis. The method for determining that number depends on whether you’re performing exploratory or confirmatory factor analysis.

Exploratory Factor Analysis

In EFA, researchers must specify the number of factors to retain. The maximum number of factors you can extract equals the number of variables in your dataset. However, you typically want to reduce the number of factors as much as possible while maximizing the total amount of variance the factors explain.

That’s the notion of a parsimonious model in statistics. When adding factors, there are diminishing returns. At some point, you’ll find that an additional factor doesn’t substantially increase the explained variance. That’s when adding factors needlessly complicates the model. Go with the simplest model that explains most of the variance.

Fortunately, a simple statistical tool known as a scree plot helps you manage this tradeoff.

Use your statistical software to produce a scree plot. Then look for the bend in the data where the curve flattens. The number of points before the bend is often the correct number of factors to extract.

The scree plot below relates to the factor analysis example later in this post. The graph displays the Eigenvalues by the number of factors. Eigenvalues relate to the amount of explained variance.

Scree plot that helps us decide the number of factors to extract.

The scree plot shows the bend in the curve occurring at factor 6. Consequently, we need to extract five factors. Those five explain most of the variance. Additional factors do not explain much more.

Some analysts and software use Eigenvalues > 1 to retain a factor. However, simulation studies have found that this tends to extract too many factors and that the scree plot method is better. (Costello & Osborne, 2005).
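If your software doesn’t produce a scree plot for you, you can build one from the eigenvalues of the correlation matrix. Here’s a short sketch assuming pandas, NumPy, matplotlib, and the dataset linked later in this post (the file path is an assumption).

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("FactorAnalysis.csv")                         # assumed path to the example dataset
eigenvalues = np.linalg.eigvalsh(df.corr().to_numpy())[::-1]   # sorted largest to smallest

plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.axhline(1, linestyle="--")   # the eigenvalue > 1 rule, shown only for comparison
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()
```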

Of course, as you explore your data and evaluate the results, you can use theory and subject-area knowledge to adjust the number of factors. The factors and their interpretations must fit the context of your study.

Confirmatory Factor Analysis

In CFA, researchers specify the number of factors to retain using existing theory or measurement instruments before performing the analysis. For example, if a measurement instrument purports to assess three constructs, then the factor analysis should extract three factors and see if the results match theory.

Factor Loadings

In factor analysis, the loadings describe the relationships between the factors and the observed variables. By evaluating the factor loadings, you can understand the strength of the relationship between each variable and the factor. Additionally, you can identify the observed variables corresponding to a specific factor.

Interpret loadings like correlation coefficients. Values range from -1 to +1. The sign indicates the direction of the relationship (positive or negative), while the absolute value indicates the strength. Stronger relationships have factor loadings closer to -1 or +1. Weaker relationships have loadings close to zero.

Stronger relationships in the factor analysis context indicate that the factors explain much of the variance in the observed variables.

Related post: Correlation Coefficients

Factor Rotations

In factor analysis, the initial set of loadings is only one of an infinite number of possible solutions that describe the data equally well. Unfortunately, the initial answer is frequently difficult to interpret because each factor can contain middling loadings for many indicators. That makes it hard to label the factors. You want to say that particular variables correlate strongly with a factor while most others do not correlate at all. A sharp contrast between high and low loadings makes that easier.

Rotating the factors addresses this problem by maximizing and minimizing the entire set of factor loadings. The goal is to produce a limited number of high loadings and many low loadings for each factor.

This combination lets you identify the relatively few indicators that strongly correlate with a factor and the larger number of variables that do not correlate with it. You can more easily determine what relates to a factor and what does not. This condition is what statisticians mean by simplifying factor analysis results and making them easier to interpret.

Graphical illustration

Let me show you how factor rotations work graphically using scatterplots .

Factor analysis starts by calculating the pattern of factor loadings. However, it picks an arbitrary set of axes by which to report them. Rotating the axes while leaving the data points unaltered keeps the original model and data pattern in place while producing more interpretable results.

To make this graphable in two dimensions, we’ll use two factors represented by the X and Y axes. On the scatterplot below, the six data points represent the observed variables, and the X and Y coordinates indicate their loadings for the two factors. Ideally, the dots fall right on an axis because that shows a high loading for that factor and a zero loading for the other.

Scatterplot of the initial factor loadings.

For the initial factor analysis solution on the scatterplot, the points contain a mixture of both X and Y coordinates and aren’t close to either factor’s axis. That makes the results difficult to interpret because the variables have middling loadings on both factors. Visually, they’re not clumped near the axes, making it difficult to assign the variables to one factor.

Rotating the axes around the scatterplot increases or decreases the X and Y values while retaining the original pattern of data points. At the blue rotation on the graph below, you maximize one factor loading while minimizing the other for all data points. The result is that each variable loads highly on one factor and has a low loading on the other.

Scatterplot of rotated loadings in a factor analysis.

On the graph, all data points cluster close to one of the two factors on the blue rotated axes, making it easy to associate the observed variables with one factor.

Types of Rotations

Throughout these rotations, you work with the same data points and factor analysis model. The model fits the data for the rotated loadings equally as well as the initial loadings, but they’re easier to interpret. You’re using a different coordinate system to gain a different perspective of the same pattern of points.

There are two fundamental types of rotation in factor analysis, oblique and orthogonal.

Oblique rotations allow correlation amongst the factors, while orthogonal rotations assume they are entirely uncorrelated.

Graphically, orthogonal rotations enforce a 90° separation between axes, as shown in the example above, where the rotated axes form right angles.

Oblique rotations are not required to have axes forming right angles, as shown below for a different dataset.

Oblique rotation for a factor analysis.

Notice how the freedom for each axis to take any orientation allows them to fit the data more closely than when enforcing the 90° constraint. Consequently, oblique rotations can produce simpler structures than orthogonal rotations in some cases. However, these results can contain correlated factors.

Oblique rotations: Promax, Oblimin, Direct Quartimin
Orthogonal rotations: Varimax, Equimax, Quartimax

In practice, oblique rotations produce similar results to orthogonal rotations when the factors are uncorrelated in the real world. However, if you impose an orthogonal rotation on genuinely correlated factors, it can adversely affect the results. Despite the benefits of oblique rotations, analysts tend to use orthogonal rotations more frequently, which might be a mistake in some cases.

When choosing a rotation method in factor analysis, be sure it matches your underlying assumptions and subject-area knowledge about whether the factors are correlated.
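If you want to see how much the rotation choice matters for your data, a quick comparison is easy to sketch. This example assumes the factor_analyzer package and the example dataset from this post; varimax is orthogonal and promax is oblique.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("FactorAnalysis.csv")   # assumed path to the example dataset

for rotation in ("varimax", "promax"):   # orthogonal vs. oblique
    fa = FactorAnalyzer(n_factors=5, method="ml", rotation=rotation)
    fa.fit(df)
    print(rotation)
    print(pd.DataFrame(fa.loadings_, index=df.columns).round(2))

# If the oblique (promax) solution is much cleaner, the factors are probably correlated
# and an orthogonal rotation is too restrictive.
```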

Factor Analysis Example

Imagine that we are human resources researchers who want to understand the underlying factors for job candidates. We measured 12 variables and will perform factor analysis to identify the latent factors. Download the CSV dataset: FactorAnalysis

The first step is to determine the number of factors to extract. Earlier in this post, I displayed the scree plot, which indicated we should extract five factors. If necessary, we can perform the analysis with a different number of factors later.

For the factor analysis, we’ll assume normality and use Maximum Likelihood to extract the factors. I’d prefer to use an oblique rotation, but my software only has orthogonal rotations. So, we’ll use Varimax. Let’s perform the analysis!
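Before looking at the output, here’s roughly what this analysis looks like in Python, assuming the factor_analyzer package and the downloadable dataset above (the file path is an assumption). The sketch mirrors the choices in this example: maximum likelihood extraction, five factors, and a Varimax rotation.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer

df = pd.read_csv("FactorAnalysis.csv")   # assumed path to the dataset linked above

fa = FactorAnalyzer(n_factors=5, method="ml", rotation="varimax")
fa.fit(df)

loadings = pd.DataFrame(fa.loadings_, index=df.columns).round(3)
print(loadings)                                 # rotated factor loadings
print(fa.get_communalities().round(3))          # variance explained per variable (Communality)
print(fa.get_factor_variance()[1].round(3))     # proportion of variance per factor (%Var)
```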

Interpreting the Results

Statistical output for the factor analysis example.

In the bottom right of the output, we see that the five factors account for 81.8% of the variance. The %Var row along the bottom shows how much of the variance each factor explains. The five factors are roughly equal, explaining between 13.5% and 19% of the variance. Learn about Variance.

The Communality column displays the proportion of the variance the five factors explain for each variable. Values closer to 1 are better. The five factors explain the most variance for Resume (0.989) and the least for Appearance (0.643).

In the factor analysis output, the circled loadings show which variables have high loadings for each factor. As shown in the table below, we can assign labels encompassing the properties of the highly loading variables for each factor.

Factor 1 (Relevant Background): Academic record, Potential, Experience
Factor 2 (Personal Characteristics): Confidence, Likeability, Appearance
Factor 3 (General Work Skills): Organization, Communication
Factor 4 (Writing Skills): Letter, Resume
Factor 5 (Overall Fit): Company Fit, Job Fit

In summary, these five factors explain a large proportion of the variance, and we can devise reasonable labels for each. These five latent factors drive the values of the 12 variables we measured.

References

Abdi, Hervé (2003), “Factor Rotations in Factor Analyses,” in Lewis-Beck, M., Bryman, A., and Futing, T. (Eds.), Encyclopedia of Social Sciences Research Methods, Thousand Oaks, CA: Sage.

Browne, Michael W. (2001), “An Overview of Analytic Rotation in Exploratory Factor Analysis,” Multivariate Behavioral Research, 36(1), 111-150.

Costello, Anna B. and Osborne, Jason (2005), “Best practices in exploratory factor analysis: four recommendations for getting the most from your analysis,” Practical Assessment, Research, and Evaluation, Vol. 10, Article 7.


Reader Interactions


May 26, 2024 at 8:51 am

Good day Jim, I am running into trouble with the item analysis for the 5-point Likert scale that I am trying to create. The thing is that my CFI is around 0.9 and TLI is around 0.8, which is good, but my RMSEA and SRMR are awful: the RMSEA is around 0.1 and the SRMR is 0.2. This is a roadblock for me. How can I improve my RMSEA and SRMR so that they reach the cutoff?

I hope this message reaches you, and thank you for taking the time to read and respond to my question.


May 15, 2024 at 11:27 am

Good day, Sir Jim. I am currently trying to create a 5-Likert scale that tries to measure National Identity Conformity in three ways: (1) Origin – (e.g., Americans are born in/from America), (2) Culture (e.g., Americans are patriotic) and (3) Belief (e.g., Americans embrace being Americans).

In the process of establishing the scale’s validity, I was told to use Exploratory Factor Analysis, and I would like to ask which methods of extraction and rotation are best for ensuring that the inter-item validity of my scale is good. I would also like to understand how I can avoid or limit cross-loading.


May 15, 2024 at 3:13 pm

I discuss those issues in this post. I’d recommend PAF as the method of extraction because your data, being on a Likert scale, won’t be normally distributed. Read the Method of Factor Extraction section for more information.

As for cross-loading, the method of rotation can help with that. The choice depends largely on subject-area knowledge and what works best for your data, so I can’t provide a suggested method. Read the Factor Rotations section for more information about that. For instance, if you get cross-loadings with orthogonal rotations, using an oblique rotation might help.

If factor rotation doesn’t sufficiently reduce cross-loading, you might need to rework your questions so they’re more distinct, remove problematic items, or increase your sample size (a larger sample can provide more stable factor solutions and clearer patterns of loadings). In this scenario, where changing rotations doesn’t help, you’ll need to determine whether the underlying issue is with your questions or with too small a sample size.

I hope that helps!


March 6, 2024 at 10:20 pm

What do negative loadings mean? How should I proceed with these loadings?

March 6, 2024 at 10:44 pm

Loadings are like correlation coefficients and range from -1 to +1. More extreme positive and negative values indicate stronger relationships. Negative loadings indicate a negative relationship between the latent factors and observed variables. Highly negative values are as good as highly positive values. I discuss this in detail in the Factor Loadings section of this post.


March 6, 2024 at 10:10 am

Good day Jim,

The methodology seems loaded with opportunities for errors. So often we are being asked to translate a nebulous English word into some sort of mathematical descriptor. As an example, in the section labelled ‘Interpreting the Results’, what are we to make of the words ‘likeability’ or ‘self-confidence’? How can we possibly evaluate those things…and to three significant decimal places?

You Jim, understand and use statistical methods correctly. Yet, too often people who apply statistics fail to examine the language of their initial questions and end up doing poor analysis. Worse, many don’t understand the software they use.

On a more cheery note, keep up the great work. The world needs a thousand more of you.

March 6, 2024 at 5:08 pm

Thanks for the thoughtful comment. I agree with your concerns.

Ideally, all of those attributes are measured using validated measurement scales. The field of psychology is pretty good about that for terms that seem kind of squishy. For instance, they usually have thorough validation processes for personality traits, etc. However, your point is well taken, you need to be able to trust your data.

All statistical analyses depend on thorough subject-area knowledge, and that’s very true for factor analysis. You must have a solid theoretical understanding of these latent factors from extensive research before considering FA. Then FA can see if there’s evidence that they actually exist. But, I do agree with you that between the rotations and having to derive names to associate with the loadings, it can be a fairly subjective process.

Thanks so much for your kind words! I appreciate them because I do strive for accuracy.


March 2, 2024 at 8:44 pm

Sir, I want to know: after successfully identifying my 3 factors with the method given above, I now want to run a regression on the data. How do I get a single value for each factor rather than this set of values?


February 28, 2024 at 7:48 am

Hello, Thanks for your effort on this post, it really helped me a lot. I want your recommendation for my case if you don’t mind.

I’m working on my research and I have 5 independent variables and 1 dependent variable. I want to use a factor analysis method to determine which variable contributes the most to the dependent variable.

Also, what kind of data checks and preparations should I make before starting the analysis?

Thanks in advance for your consideration.

February 28, 2024 at 1:46 pm

Based on the information you provided, I don’t believe factor analysis is the correct analysis for you.

Factor analysis is primarily used for understanding the structure of a set of variables and for reducing data dimensions by identifying underlying latent factors. It’s particularly useful when you have a large number of observed variables and believe that they are influenced by a smaller number of unobserved factors.

Instead, it sounds like you have the IVs and DV and want to understand the relationships between them. For that, I recommend multiple regression. Learn more in my post about When to Use Regression . After you settle on a model, there are several ways to Identify the Most Important Variables in the Model .

In terms of checking assumptions, familiarize yourself with the Ordinary Least Squares Regression Assumptions . Least squares regression is the most common and is a good place to start.

Best of luck with your analysis!


December 1, 2023 at 1:01 pm

What would be the eigenvalue in EFA?


November 1, 2023 at 4:42 am

Hi Jim, this is an excellent yet succinct article on the topic. A very basic question, though: the dataset contains ordinal data. Is this ok? I’m a student in a Multivariate Statistics course, and as far as I’m aware, both PCA and common factor analysis dictate metric data. Or is it assumed that since the ordinal data has been coded into a range of 0-10, then the data is considered numeric and can be applied with PCA or CFA?

Sorry for the dumb question, and thank you.

November 1, 2023 at 8:00 pm

That’s a great question.

For the example in this post, we’re dealing with data on a 10 point scale where the differences between all points are equal. Consequently, we can treat discrete data as continuous data.

Now, to your question about ordinal data. You can use ordinal data with factor analysis; however, you might need to use specific methods.

For ordinal data, it’s often recommended to use polychoric correlations instead of Pearson correlations. Polychoric correlations estimate the correlation between two latent continuous variables that underlie the observed ordinal variables. This provides a more accurate correlation matrix for factor analysis of ordinal data.

I’ve also heard about categorical PCA and nonlinear factor analysis that use a monotonic transformation of ordinal data.

I hope that helps clarify it for you!


September 2, 2023 at 4:14 pm

Once we’ve identified how much variability each factor contributes, what steps could we take from here to make predictions about the variables?

September 2, 2023 at 6:53 pm

Hi Brittany,

Thanks for the great question! And thanks for your kind words in your other comment! 🙂

What you can do is calculate all the factor scores for each observation. Some software will do this for you as an option. Or, you can input values into the regression equations for the factor scores that are included in the output.

Then use these scores as the independent variables in regression analysis. From there, you can use the regression model to make predictions .

Ideally, you’d evaluate the regression model before making predictions and use cross validation to be sure that the model works for observations outside the dataset you used to fit the model.

September 2, 2023 at 4:13 pm

Wow! This was really helpful and structured very well for interpretation. Thank you!


October 6, 2022 at 10:55 am

I can imagine that Prof will have further explanations on this down the line at some point in future. I’m waiting… Thanks Prof Jim for your usual intuitive manner of explaining concepts. Funsho


September 26, 2022 at 8:08 am

Thanks for a very comprehensive guide. I learnt a lot. In PCA, we usually extract the components and use it for predictive modeling. Is this the case with Factor Analysis as well? Can we use factors as predictors?

September 26, 2022 at 8:27 pm

I have not used factors as predictors, but I think it would be possible. However, PCA’s goal is to maximize data reduction. This process is particularly valuable when you have many variables, a low sample size, and/or collinearity between the predictors. Factor analysis also reduces the data, but that’s not its primary goal. Consequently, my sense is that PCA is better for predictive modeling, while factor analysis is better when you’re trying to understand the underlying factors (which you aren’t with PCA). But, again, I haven’t tried using factors in that way, nor have I compared the results to PCA. So, take that with a grain of salt!



Lesson 12: Factor Analysis

Factor Analysis is a method for modeling observed variables, and their covariance structure, in terms of a smaller number of underlying unobservable (latent) “factors.” The factors typically are viewed as broad concepts or ideas that may describe an observed phenomenon. For example, a basic desire of obtaining a certain social level might explain most consumption behavior. These unobserved factors are more interesting to the social scientist than the observed quantitative measurements.

Factor analysis is generally an exploratory/descriptive method that requires many subjective judgments. It is a widely used tool and often controversial because the models, methods, and subjectivity are so flexible that debates about interpretations can occur.

The method is similar to principal components although, as the textbook points out, factor analysis is more elaborate. In one sense, factor analysis is an inversion of principal components. In factor analysis, we model the observed variables as linear functions of the “factors.” In principal components, we create new variables that are linear combinations of the observed variables.  In both PCA and FA, the dimension of the data is reduced. Recall that in PCA, the interpretation of the principal components is often not very clean. A particular variable may, on occasion, contribute significantly to more than one of the components. Ideally, we like each variable to contribute significantly to only one component. A technique called factor rotation is employed toward that goal. Examples of fields where factor analysis is involved include physiology, health, intelligence, sociology, and sometimes ecology among others.

Upon completing this lesson, you should be able to:

  • Understand the terminology of factor analysis, including the interpretation of factor loadings, specific variances, and communalities;
  • Understand how to apply both principal component and maximum likelihood methods for estimating the parameters of a factor model;
  • Understand factor rotation, and interpret rotated factor loadings.

12.1 - Notations and Terminology

Collect all of the variables \(X_i\) into a vector \(\mathbf{X}\) for each individual subject. Let \(X_i\) denote observable trait \(i\). These are the data from each subject and are collected into a vector of traits.

\(\textbf{X} = \left(\begin{array}{c}X_1\\X_2\\\vdots\\X_p\end{array}\right) = \text{vector of traits}\)

This is a random vector, with a population mean. Assume that vector of traits \(\mathbf{X}\) is sampled from a population with population mean vector:

\(\boldsymbol{\mu} = \left(\begin{array}{c}\mu_1\\\mu_2\\\vdots\\\mu_p\end{array}\right) = \text{population mean vector}\)

Here, \(\mathrm{E}(X_i) = \mu_i\) denotes the population mean of variable \(i\).

Consider \(m\) unobservable common factors \(f_1, f_2, \dots, f_m\). The \(i^{th}\) common factor is \(f_i\). Generally, \(m\) is going to be substantially less than \(p\).

The common factors are also collected into a vector,

\(\mathbf{f} = \left(\begin{array}{c}f_1\\f_2\\\vdots\\f_m\end{array}\right) = \text{vector of common factors}\)

Our factor model can be thought of as a series of multiple regressions, predicting each of the observable variables \(X_{i}\) from the values of the unobservable common factors \(f_{i}\) :

\begin{align} X_1 & =  \mu_1 + l_{11}f_1 + l_{12}f_2 + \dots + l_{1m}f_m + \epsilon_1\\ X_2 & =  \mu_2 + l_{21}f_1 + l_{22}f_2 + \dots + l_{2m}f_m + \epsilon_2 \\ &  \vdots \\ X_p & =  \mu_p + l_{p1}f_1 + l_{p2}f_2 + \dots + l_{pm}f_m + \epsilon_p \end{align}

Here, the variable means \(\mu_{1}\) through \(\mu_{p}\) can be regarded as the intercept terms for the multiple regression models.

The regression coefficients \(l_{ij}\) (the partial slopes) for all of these multiple regressions are called factor loadings. Here, \(l_{ij}\) = loading of the \(i^{th}\) variable on the \(j^{th}\) factor. These are collected into a matrix as shown here:

\(\mathbf{L} = \left(\begin{array}{cccc}l_{11}& l_{12}& \dots & l_{1m}\\l_{21} & l_{22} & \dots & l_{2m}\\ \vdots & \vdots & & \vdots \\l_{p1} & l_{p2} & \dots & l_{pm}\end{array}\right) = \text{matrix of factor loadings}\)

And finally, the errors \(\epsilon_i\) are called the specific factors. Here, \(\epsilon_i\) = specific factor for variable \(i\). The specific factors are also collected into a vector:

\(\boldsymbol{\epsilon} = \left(\begin{array}{c}\epsilon_1\\\epsilon_2\\\vdots\\\epsilon_p\end{array}\right) = \text{vector of specific factors}\)

In summary, the basic model is like a regression model. Each of our response variables X is predicted as a linear function of the unobserved common factors \(f_{1}\), \(f_{2}\) through \(f_{m}\). Thus, our explanatory variables are \(f_{1}\) , \(f_{2}\) through \(f_{m}\). We have m unobserved factors that control the variation in our data.

We will generally reduce this into matrix notation as shown in this form here:

\(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf}+\boldsymbol{\epsilon}\)
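As a small illustration of this model, the sketch below (using NumPy, with hypothetical loadings) simulates data from \(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf}+\boldsymbol{\epsilon}\). Under the assumptions listed in the next section, the sample covariance matrix of the simulated data should be close to \(\mathbf{LL'} + \boldsymbol{\Psi}\).

```python
import numpy as np

rng = np.random.default_rng(0)
p, m, n = 6, 2, 5000
L = np.array([[0.8, 0.0], [0.7, 0.1], [0.6, 0.0],
              [0.0, 0.9], [0.1, 0.7], [0.0, 0.6]])   # hypothetical p x m loading matrix
mu = np.zeros(p)
psi = 1 - (L ** 2).sum(axis=1)                       # specific variances chosen so var(X_i) = 1

f = rng.standard_normal((n, m))                      # common factors: mean 0, variance 1, uncorrelated
eps = rng.standard_normal((n, p)) * np.sqrt(psi)     # specific factors with variances psi_i
X = mu + f @ L.T + eps                               # the factor model, one row per subject

# The sample covariance matrix should approximate LL' + Psi.
print(np.round(np.cov(X, rowvar=False) - (L @ L.T + np.diag(psi)), 2))
```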

12.2 - Model Assumptions

The specific factors or random errors all have mean zero: \(E(\epsilon_i) = 0\); i = 1, 2, ... , p

The common factors, the f 's, also have mean zero: \(E(f_i) = 0\); i = 1, 2, ... , m

A consequence of these assumptions is that the mean response of the i th trait is \(\mu_i\). That is,

\(E(X_i) = \mu_i\)

The common factors have variance one: \(\text{var}(f_i) = 1\); i = 1, 2, ... , m  

Correlation

The common factors are uncorrelated with one another: \(\text{cov}(f_i, f_j) = 0\)   for i ≠ j

The specific factors are uncorrelated with one another: \(\text{cov}(\epsilon_i, \epsilon_j) = 0\)  for i ≠ j  

The specific factors are uncorrelated with the common factors: \(\text{cov}(\epsilon_i, f_j) = 0\);   i = 1, 2, ... , p; j = 1, 2, ... , m  

These assumptions are necessary to estimate the parameters uniquely. An infinite number of equally well-fitting models with different parameter values may be obtained unless these assumptions are made.

Under this model the variance for the i th observed variable is equal to the sum of the squared loadings for that variable and the specific variance:

The variance of trait i is: \(\sigma^2_i = \text{var}(X_i) = \sum_{j=1}^{m}l^2_{ij}+\psi_i\) 

This derivation is based on the previous assumptions. \(\sum_{j=1}^{m}l^2_{ij}\) is called the Communality for variable \(i\). Later on, we will see how this is a measure of how well the model performs for that particular variable. The larger the communality, the better the model performance for the \(i^{th}\) variable.

The covariance between pairs of traits i and j is: \(\sigma_{ij}= \text{cov}(X_i, X_j) = \sum_{k=1}^{m}l_{ik}l_{jk}\) 

The covariance between trait i and factor j is: \(\text{cov}(X_i, f_j) = l_{ij}\)

In matrix notation, our model for the variance-covariance matrix is expressed as shown below:

\(\Sigma = \mathbf{LL'} + \boldsymbol{\Psi}\)

This is the matrix of factor loadings times its transpose, plus a diagonal matrix containing the specific variances.

Here \(\boldsymbol{\Psi}\) equals:

\(\boldsymbol{\Psi} = \left(\begin{array}{cccc}\psi_1 & 0 & \dots & 0 \\ 0 & \psi_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \dots & \psi_p \end{array}\right)\)

A parsimonious (simplified) model for the variance-covariance matrix is obtained and used for estimation.

  • The model assumes that the data is a linear function of the common factors. However, because the common factors are not observable, we cannot check for linearity.

The variance-covariance matrix \(\Sigma\) has \(p(p+1)/2\) unique elements, which are approximated by:

  • mp factor loadings in the matrix \(\mathbf{L}\), and
  • p specific variances

This means the factor model uses \(mp + p\) parameters to approximate the variance-covariance matrix. Ideally, \(mp + p\) is substantially smaller than \(p(p+1)/2\). However, if \(m\) is too small, the \(mp + p\) parameters may not be adequate to describe \(\Sigma\). It may also be the case that this is not the right model and you cannot reduce the data to a linear combination of factors.

Consider an arbitrary orthogonal matrix \(\mathbf{T}\), that is, a matrix satisfying

\(\mathbf{T'T = TT' = I}\)

We can write our factor model in matrix notation:

\(\textbf{X} = \boldsymbol{\mu} + \textbf{Lf}+ \boldsymbol{\epsilon} = \boldsymbol{\mu} + \mathbf{LTT'f}+ \boldsymbol{\epsilon} = \boldsymbol{\mu} + \mathbf{L^*f^*}+\boldsymbol{\epsilon}\)

Note that this does not change the model because \(\mathbf{TT'} = \mathbf{I}\), and the identity matrix times any matrix is the original matrix. This results in an alternative factor model, where the relationship between the new factor loadings and the original factor loadings is:

\(\mathbf{L^*} = \textbf{LT}\)

and the relationship between the new common factors and the original common factors is:

\(\mathbf{f^*} = \textbf{T'f}\)

This gives a model that fits equally well. Moreover, because there are infinitely many orthogonal matrices, there are infinitely many alternative models. This model, as it turns out, satisfies all of the assumptions discussed earlier.

\(E(\mathbf{f^*}) = E(\textbf{T'f}) = \textbf{T'}E(\textbf{f}) = \mathbf{T'0} =\mathbf{0}\),

\(\text{var}(\mathbf{f^*}) = \text{var}(\mathbf{T'f}) = \mathbf{T'}\text{var}(\mathbf{f})\mathbf{T} = \mathbf{T'IT} = \mathbf{T'T} = \mathbf{I}\)

\(\text{cov}(\mathbf{f^*, \boldsymbol{\epsilon}}) = \text{cov}(\mathbf{T'f, \boldsymbol{\epsilon}}) = \mathbf{T'}\text{cov}(\mathbf{f, \boldsymbol{\epsilon}}) = \mathbf{T'0} = \mathbf{0}\)

So f* satisfies all of the assumptions, and hence f* is an equally valid collection of common factors.  There is a certain apparent ambiguity to these models. This ambiguity is later used to justify a factor rotation to obtain a more parsimonious description of the data.
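A quick numerical check of this rotational indeterminacy, using NumPy and SciPy with hypothetical loadings: multiplying \(\mathbf{L}\) by any orthogonal matrix \(\mathbf{T}\) leaves the implied covariance structure \(\mathbf{LL'} + \boldsymbol{\Psi}\) unchanged.

```python
import numpy as np
from scipy.stats import ortho_group

rng = np.random.default_rng(1)
p, m = 5, 2
L = rng.standard_normal((p, m))                 # hypothetical loading matrix
T = ortho_group.rvs(m, random_state=1)          # a random orthogonal matrix, T'T = TT' = I

L_star = L @ T                                  # rotated loadings
print(np.allclose(L @ L.T, L_star @ L_star.T))  # True: LL' is unchanged by the rotation
```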

12.3 - Principal Component Method

We consider two different methods to estimate the parameters of a factor model:

  • Principal Component Method
  • Maximum Likelihood Estimation

A third method, the principal factor method, is also available but not considered in this class.

Let \(X_i\) be a vector of observations for the \(i^{th}\) subject:

\(\mathbf{X_i} = \left(\begin{array}{c}X_{i1}\\ X_{i2}\\ \vdots \\ X_{ip}\end{array}\right)\)

\(\mathbf{S}\) denotes our sample variance-covariance matrix and is expressed as:

\(\textbf{S} = \dfrac{1}{n-1}\sum\limits_{i=1}^{n}\mathbf{(X_i - \bar{x})(X_i - \bar{x})'}\)

We have p eigenvalues for this variance-covariance matrix as well as corresponding eigenvectors for this matrix.

 Eigenvalues of \(\mathbf{S}\):

\(\hat{\lambda}_1, \hat{\lambda}_2, \dots, \hat{\lambda}_p\)

Eigenvectors of \(\mathbf{S}\):

\(\hat{\mathbf{e}}_1, \hat{\mathbf{e}}_2, \dots, \hat{\mathbf{e}}_p\)

Recall that the variance-covariance matrix can be re-expressed in the following form as a function of the eigenvalues and the eigenvectors:

Spectral Decomposition of \(\Sigma\)

\(\Sigma = \sum_{i=1}^{p}\lambda_i \mathbf{e}_i\mathbf{e}'_i \cong \sum_{i=1}^{m}\lambda_i \mathbf{e}_i\mathbf{e}'_i = \left(\begin{array}{cccc}\sqrt{\lambda_1}\mathbf{e}_1 & \sqrt{\lambda_2}\mathbf{e}_2 &  \dots &  \sqrt{\lambda_m}\mathbf{e}_m\end{array}\right)  \left(\begin{array}{c}\sqrt{\lambda_1}\mathbf{e}'_1\\ \sqrt{\lambda_2}\mathbf{e}'_2\\ \vdots\\ \sqrt{\lambda_m}\mathbf{e}'_m\end{array}\right) = \mathbf{LL'}\)

The idea behind the principal component method is to approximate this expression. Instead of summing from 1 to p , we now sum from 1 to m , ignoring the last p - m terms in the sum, and obtain the third expression. We can rewrite this as shown in the fourth expression, which is used to define the matrix of factor loadings \(\mathbf{L}\), yielding the final expression in matrix notation.

This yields the following estimator for the factor loadings:

\(\hat{l}_{ij} = \hat{e}_{ji}\sqrt{\hat{\lambda}_j}\)

These estimates form the matrix \(\mathbf{L}\) of factor loadings used in the factor analysis, and \(\mathbf{L}\) is multiplied by its transpose in the model below. To estimate the specific variances, recall that our factor model for the variance-covariance matrix is

\(\boldsymbol{\Sigma} = \mathbf{LL'} + \boldsymbol{\Psi}\)

in matrix notation. \(\Psi\) is now going to be equal to the variance-covariance matrix minus \(\mathbf{LL'}\).

\( \boldsymbol{\Psi} = \boldsymbol{\Sigma} - \mathbf{LL'}\)

This in turn suggests that the specific variances, the diagonal elements of \(\Psi\), are estimated with this expression:

\(\hat{\Psi}_i = s^2_i - \sum\limits_{j=1}^{m}\hat{\lambda}_j \hat{e}^2_{ji}\)

We take the sample variance for the \(i^{th}\) variable and subtract the sum of the squared factor loadings (i.e., the communality).
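The estimation steps above translate directly into NumPy. Here is a sketch, assuming we factor the sample correlation matrix \(\mathbf{R}\) of standardized data (eigenvector signs are arbitrary, so loadings may differ from software output by a sign flip).

```python
import numpy as np

def pc_factor_estimates(R, m):
    """Principal component method: loadings, communalities, specific variances, residuals."""
    eigvals, eigvecs = np.linalg.eigh(R)              # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]

    L = eigvecs[:, :m] * np.sqrt(eigvals[:m])         # l_ij = e_ji * sqrt(lambda_j)
    communalities = (L ** 2).sum(axis=1)              # h_i^2
    psi = np.diag(R) - communalities                  # specific variances
    residual = R - (L @ L.T + np.diag(psi))           # off-diagonal entries assess fit
    return L, communalities, psi, residual
```

Applied to the Places Rated correlation matrix with m = 3, this should reproduce the loadings and communalities reported in the following sections (up to the signs of the eigenvectors).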

12.4 - Example: Places Rated Data - Principal Component Method

Example 12-1: Places Rated

Let's revisit the Places Rated Example from Lesson 11 .  Recall that the Places Rated Almanac (Boyer and Savageau) rates 329 communities according to nine criteria:

  • Climate and Terrain
  • Housing
  • Health Care & Environment
  • Crime
  • Transportation
  • Education
  • The Arts
  • Recreation
  • Economics

Except for housing and crime, the higher the score the better. For housing and crime, the lower the score the better.

Our objective here is to describe the relationships among the variables.

Before carrying out a factor analysis we need to determine m . How many common factors should be included in the model? This requires a determination of how many parameters will be involved.

For p = 9, the variance-covariance matrix \(\Sigma\) contains

\(\dfrac{p(p+1)}{2} = \dfrac{9 \times 10}{2} = 45\)

unique elements or entries. For a factor analysis with m factors, the number of parameters in the factor model is equal to

\(p(m+1) = 9(m+1)\)

Taking m = 4 gives 9(4+1) = 45 parameters in the factor model, which equals the number of original parameters and would result in no dimension reduction. So in this case, we will select m = 3, yielding 9(3+1) = 36 parameters in the factor model and thus a dimension reduction in our analysis.

It is also common to look at the results of the principal components analysis. The output from Lesson 11.6 is below. The first three components explain 62% of the variation. We consider this to be sufficient for the current example and will base future analyses on three components.

Component Eigenvalue Proportion Cumulative
1 3.2978 0.3664 0.3664
2 1.2136 0.1348 0.5013
3 1.1055 0.1228 0.6241
4 0.9073 0.1008 0.7249
5 0.8606 0.0956 0.8205
6 0.5622 0.0625 0.8830
7 0.4838 0.0538 0.9368
8 0.3181 0.0353 0.9721
9 0.2511 0.0279 1.0000

We need to select m so that a sufficient amount of variation in the data is explained. What is sufficient is, of course, subjective and depends on the example at hand.

Alternatively, often in social sciences, the underlying theory within the field of study indicates how many factors to expect. In psychology, for example, a circumplex model suggests that mood has two factors: positive affect and arousal. So a two-factor model may be considered for questionnaire data regarding the subjects' moods. In many respects, this is a better approach because then you are letting the science drive the statistics rather than the statistics drive the science! If you can, use your or a field expert's scientific understanding to determine how many factors should be included in your model.


The factor analysis is carried out using the program as shown below:

Download the SAS Program here: places2.sas


Performing factor analysis (principal components extraction)

To perform factor analysis and obtain the communalities:

  1. Open the ‘ places_tf.csv ’ data set in a new worksheet.
  2. Calc > Calculator
  3. Highlight and select ‘climate’ to move it to the Store result window.
  4. In the Expression window, enter LOGTEN( 'climate') to apply the (base 10) log transformation to the climate variable.
  5. Choose OK. The transformed values replace the originals in the worksheet under ‘climate’.
  6. Repeat steps 2 through 5 above for all variables housing through econ.
  7. Stat > Multivariate > Factor Analysis
  8. Highlight and select climate through econ to move all 9 variables to the Variables window.
  9. Choose 3 for the number of factors to extract.
  10. Choose Principal Components for the Method of Extraction.
  11. Under Options, select Correlation as the Matrix to Factor.
  12. Under Graphs, select Scree Plot.
  13. Choose OK and OK again. The numeric results are shown in the results area, along with the scree plot. The last column has the communality values.

Initially, we will look at the factor loadings. The factor loadings are obtained by using this expression

\(\hat{l}_{ij} = \hat{e}_{ji}\sqrt{\hat{\lambda}_{j}}\)

These are summarized in the table below. The factor loadings are only recorded for the first three factors because we set m =3. We should also note that the factor loadings are the correlations between the factors and the variables. For example, the correlation between the Arts and the first factor is about 0.86. Similarly, the correlation between climate and that factor is only about 0.28.

Variable         Factor 1   Factor 2   Factor 3
Climate            0.286      0.076      0.841
Housing            0.698      0.153      0.084
Health             0.744     -0.410     -0.020
Crime              0.471      0.522      0.135
Transportation     0.681     -0.156     -0.148
Education          0.498     -0.498     -0.253
Arts               0.861     -0.115      0.011
Recreation         0.642      0.322      0.044
Economics          0.298      0.595     -0.533

Interpreting factor loadings is similar to interpreting the coefficients for principal component analysis. We want to determine some inclusion criteria, which in many instances may be somewhat arbitrary. In the above table, the values that we consider large are those with an absolute value of roughly 0.5 or more. The following statements are based on this criterion:

Factor 1 is correlated most strongly with Arts (0.861) and also correlated with Health, Housing, Recreation, and to a lesser extent Crime and Education. You can say that the first factor is primarily a measure of these variables.

Similarly, Factor 2 is correlated most strongly with Crime, Education, and Economics. You can say that the second factor is primarily a measure of these variables.

Likewise, Factor 3 is correlated most strongly with Climate and Economics. You can say that the third factor is primarily a measure of these variables.

The interpretation above is very similar to that obtained in the standardized principal component analysis.

12.5 - Communalities

Example 12-1: Continued

The communalities for the \(i^{th}\) variable are computed by taking the sum of the squared loadings for that variable. This is expressed below:

\(\hat{h}^2_i = \sum\limits_{j=1}^{m}\hat{l}^2_{ij}\)

To understand the computation of communalities, recall the table of factor loadings:

Variable         Factor 1   Factor 2   Factor 3
Climate            0.286      0.076      0.841
Housing            0.698      0.153      0.084
Health             0.744     -0.410     -0.020
Crime              0.471      0.522      0.135
Transportation     0.681     -0.156     -0.148
Education          0.498     -0.498     -0.253
Arts               0.861     -0.115      0.011
Recreation         0.642      0.322      0.044
Economics          0.298      0.595     -0.533

Let's compute the communality for Climate, the first variable. We square the factor loadings for climate (the values in the Climate row of the table above), then add the results:

\(\hat{h}^2_1 = 0.28682^2 + 0.07560^2 + 0.84085^2 = 0.7950\)
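A quick check of this arithmetic (NumPy assumed):

```python
import numpy as np

climate_loadings = np.array([0.28682, 0.07560, 0.84085])
h2 = (climate_loadings ** 2).sum()   # communality for Climate
print(round(h2, 4))                  # 0.795
print(round(1 - h2, 4))              # specific variance, used later: 0.205
```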

The communalities of the 9 variables can be obtained from page 4 of the SAS output as shown below:

Final Communality Estimates: Total = 5.616885

climate      housing      health       crime        trans        educate      arts         recreate     econ
0.79500707   0.51783185   0.72230182   0.51244913   0.50977159   0.56073895   0.75382091   0.51725940   0.72770402

The total, 5.616885 (located just above the individual communalities in the SAS output), is the "Total Communality".


In summary, the communalities are placed into a table:

Variable Communality
Climate 0.795
Housing 0.518
Health 0.722
Crime 0.512
Transportation 0.510
Education 0.561
Arts 0.754
Recreation 0.517
Economics 0.728

You can think of these values as multiple \(R^{2}\) values for regression models predicting the variables of interest from the 3 factors. The communality for a given variable can be interpreted as the proportion of variation in that variable explained by the three factors. In other words, if we perform multiple regression of climate against the three common factors, we obtain an \(R^{2} = 0.795\), indicating that about 79% of the variation in climate is explained by the factor model. The results suggest that the factor analysis does the best job of explaining variations in climate, the arts, economics, and health.

One assessment of how well this model performs can be obtained from the communalities.  We want to see values that are close to one. This indicates that the model explains most of the variation for those variables. In this case, the model does better for some variables than it does for others. The model explains Climate the best and is not bad for other variables such as Economics, Health, and the Arts. However, for other variables such as Crime, Recreation, Transportation, and Housing the model does not do a good job, explaining only about half of the variation.

The sum of all communality values is the total communality value:

\(\sum\limits_{i=1}^{p}\hat{h}^2_i = \sum\limits_{i=1}^{m}\hat{\lambda}_i\)

Here, the total communality is 5.617. The proportion of the total variation explained by the three factors is

\(\dfrac{5.617}{9} = 0.624\)

This is the percentage of variation explained in our model. This could be considered an overall assessment of the performance of the model. However, this percentage is the same as the proportion of variation explained by the first three eigenvalues, obtained earlier. The individual communalities tell how well the model is working for the individual variables, and the total communality gives an overall assessment of performance. These are two different assessments.

Because the data are standardized, the variance for the standardized data is equal to one. The specific variances are computed by subtracting the communality from the variance as expressed below:

\(\hat{\Psi}_i = 1-\hat{h}^2_i\)

Recall that the data were standardized before analysis, so the variances of the standardized variables are all equal to one. For example, the specific variance for Climate is computed as follows:

\(\hat{\Psi}_1 = 1-0.795 = 0.205\)

The specific variances are found in the SAS output as the diagonal elements in the table on page 5 as seen below:

Residual Correlation with Uniqueness on the Diagonal

  Climate Housing Health Crime Trans Educate Arts Recreate Econ
Climate 0.20499 -0.00924 -0.01476 -0.06027 -0.03720 0.18537 -0.07518 -0.12475 0.21735
Housing -0.00924 0.48217 -0.02317 -0.28063 -0.12119 -0.04803 -0.07518 -0.04032 0.04249
Health -0.01476 -0.02317 0.27770 0.05007 -0.15480 -0.11537 -0.00929 -0.09108 0.06527
Crime -0.06027 -0.28063 0.05007 0.48755 0.05497 0.11562 0.00009 -0.18377 -0.10288
Trans -0.03720 -0.12119 -0.15480 0.05497 0.49023 -0.14318 -0.05439 0.01041 -0.12641
Educate 0.18537 -0.04803 -0.11537 0.11562 -0.14318 0.43926 -0.13515 -0.05531 0.14197
Arts -0.07518 -0.07552 -0.00929 0.00009 -0.05439 -0.13515 0.24618 -0.01926 -0.04687
Recreate -0.12475 -0.04032 -0.09108 -0.18377 0.01041 -0.05531 -0.01926 0.48274 -0.18326
Econ 0.21735 0.04249 0.06527 -0.10288 -0.12641 0.14197 -0.04687 -0.18326 0.27230

For example, the specific variance for housing is 0.482.

This model provides an approximation to the correlation matrix.  We can assess the model's appropriateness with the residuals obtained from the following calculation:

\(s_{ij}- \sum\limits_{k=1}^{m}l_{ik}l_{jk}; i \ne j = 1, 2, \dots, p\)

This is basically the difference between R and LL' , or the correlation between variables i and j minus the expected value under the model. Generally, these residuals should be as close to zero as possible. For example, the residual between Housing and Climate is -0.00924 which is pretty close to zero. However, there are some that are not very good. The residual between Climate and Economy is 0.217.  These values give an indication of how well the factor model fits the data.

One disadvantage of the principal component method is that it does not provide a test for lack of fit. We can examine these numbers and determine if we think they are small or close to zero, but we really do not have a test for this.  Such a test is available for the maximum likelihood method.

12.6 - Final Notes about the Principal Component Method

Unlike the competing methods, the estimated factor loadings under the principal component method do not change as the number of factors is increased. This is not true of the remaining methods (e.g., maximum likelihood). However, the communalities and the specific variances will depend on the number of factors in the model. In general, as you increase the number of factors, the communalities increase toward one and the specific variances will decrease toward zero.

The diagonal elements of the variance-covariance matrix \(\mathbf{S}\) (or \(\mathbf{R}\)) are equal to the diagonal elements of the model:

\(\mathbf{\hat{L}\hat{L}' + \mathbf{\hat{\Psi}}}\)

The off-diagonal elements are not exactly reproduced. This is in part due to variability in the data - just random chance. Therefore, we want to select the number of factors to make the off-diagonal elements of the residual matrix small:

\(\mathbf{S - (\hat{L}\hat{L}' + \hat{\Psi})}\)

Here, we have a trade-off between two conflicting desires. For a parsimonious model, we wish to select the number of factors m to be as small as possible, but for such a model, the residuals could be large. Conversely, by selecting m to be large, we may reduce the sizes of the residuals but at the cost of producing a more complex and less interpretable model (there are more factors to interpret).

Another useful result is that the sum of the squared elements of the residual matrix is no larger than the sum of the squared eigenvalues corresponding to the factors that were left out of the model:

\(\sum\limits_{j=m+1}^{p}\hat{\lambda}^2_j\)

General Methods Used in Determining the Number of Factors

Below are three common techniques used to determine the number of factors to extract; a short SAS sketch for obtaining the relevant output follows the list:

  • Cumulative proportion of at least 0.80 (or 80% explained variance)
  • Eigenvalues of at least one
  • Scree plot is based on the "elbow" of the plot; that is, where the plot turns and begins to flatten out
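As a sketch of how these diagnostics can be obtained in SAS (same assumed data set and variable names as above): the eigenvalue table that PROC FACTOR prints by default supplies the eigenvalues and the cumulative proportion of variance explained, and the scree option adds the scree plot.

  proc factor data=places method=principal scree;
    /* default output: eigenvalues, proportions, and cumulative proportions;
       scree adds the scree plot used to look for the "elbow" */
    var climate housing health crime trans educate arts recreate econ;
  run;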

12.7 - Maximum Likelihood Estimation Method

Maximum likelihood estimation requires that the data are sampled from a multivariate normal distribution, which is a drawback of this method. In the social sciences, data are often collected on a Likert scale; because a Likert scale is discrete and bounded, such data cannot be exactly normally distributed.

Using the Maximum Likelihood Estimation Method, we must assume that the data are independently sampled from a multivariate normal distribution with mean vector \(\mu\) and variance-covariance matrix of the form:

\(\boldsymbol{\Sigma} = \mathbf{LL' +\boldsymbol{\Psi}}\)

where \(\mathbf{L}\) is the matrix of factor loadings and \(\Psi\) is the diagonal matrix of specific variances.

We define additional notation: As usual, the data vectors for n subjects are represented as shown:

\(\mathbf{X_1},\mathbf{X_2}, \dots, \mathbf{X_n}\)

Maximum likelihood estimation involves estimating the mean, the matrix of factor loadings, and the specific variance.

The maximum likelihood estimators of the mean vector \(\mu\), the factor loadings \(\mathbf{L}\), and the specific variances \(\Psi\) are the values \(\hat{\mathbf{\mu}}\), \(\hat{\mathbf{L}}\), and \(\hat{\mathbf{\Psi}}\) that maximize the log-likelihood given by the following expression:

\(l(\mathbf{\mu, L, \Psi}) = - \dfrac{np}{2}\log{2\pi}- \dfrac{n}{2}\log{|\mathbf{LL' + \Psi}|} - \dfrac{1}{2}\sum_{i=1}^{n}\mathbf{(X_i-\mu)'(LL'+\Psi)^{-1}(X_i-\mu)}\)

The log of the joint probability distribution of the data is maximized. We want to find the values of the parameters, (\(\mu\), \(\mathbf{L}\), and \(\Psi\)), that are most compatible with what we see in the data. As was noted earlier the solutions for these factor models are not unique. Equivalent models can be obtained by rotation. If \(\mathbf{L'\Psi^{-1}L}\) is a diagonal matrix, then we may obtain a unique solution.

Computationally this process is complex. In general, there is no closed-form solution to this maximization problem so iterative methods are applied. Implementation of iterative methods can run into problems as we will see later.

12.8 - Example: Places Rated Data

Example 12-2: Places Rated

This method of factor analysis is carried out using the SAS program below:

Download the SAS Program here: places3.sas

Here we have specified the Maximum Likelihood Method by setting method=ml. Again, we need to specify the number of factors.

You will notice that this program produces errors and does not complete the factor analysis. We will start out without the Heywood or priors options discussed below to see the error that occurs and how to remedy it.

For m = 3 factors, maximum likelihood estimation fails to converge.  An examination of the records of each iteration reveals that the communality of the first variable (Climate) exceeds one during the first iteration.  Because the communality must lie between 0 and 1, this is the cause of the failure.

SAS provides a number of different fixes for this kind of error.  Most fixes adjust the initial guess, or starting value, for the communalities.

  • priors=smc: Sets the prior communality of each variable proportional to the \(R^2\) of that variable with all other variables as an initial guess.
  • priors=asmc: As above, but adjusted so that the sum of the communalities equals the sum of the maximum absolute correlations.
  • priors=max: Sets the prior communality of each variable to its maximum absolute correlation with any other variable.
  • priors=random: Sets the prior communality of each variable to a random number between 0 and 1.

These options are added within the proc factor statement (e.g., proc factor method=ml nfactors=3 priors=smc;).  If we begin with better starting values, then we might have better luck at convergence. Unfortunately, in trying each of these options (including running the random option multiple times), we find that they are ineffective for the Places Rated data. A second remedy must be considered:

  • Attempt adding the Heywood option to the procedure (proc factor method=ml nfactors=3 heywood;). This sets communalities greater than one back to one, allowing the iterations to proceed. In other words, if a communality value falls out of bounds, it is replaced by the value one. This will always yield a solution, but frequently the solution will not adequately fit the data. A sketch of the full call is given below.
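The following is a minimal sketch of such a call, assuming the data are in a SAS data set named places with the variable names used in the output tables (the posted places3.sas may differ in details):

  proc factor data=places method=ml nfactors=3 heywood;
    /* heywood resets any communality that exceeds one back to one
       so that the maximum likelihood iterations can proceed */
    var climate housing health crime trans educate arts recreate econ;
  run;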

We start with the same initial values for the communalities and then, at each iteration, we obtain new values. The criterion is a value that we are trying to minimize in order to obtain our estimates. We can see that the convergence criterion decreases with each iteration of the algorithm.

Iteration Criterion Ridge Change Communalities
1 0.3291161 0.0000 0.2734 0.47254 0.40913 0.73500 0.22107 0.38516 0.26178 0.75125 0.46384 0.15271
2 0.2946707 0.0000 0.5275 1.00000 0.37872 0.75101 0.20469 0.36111 0.26155 0.75298 0.48979 0.11995
3 0.2877116 0.0000 0.0577 1.00000 0.41243 0.80868 0.22168 0.38551 0.26263 0.74546 0.53277 0.11601
4 0.2876330 0.0000 0.0055 1.00000 0.41336 0.81414 0.21647 0.38365 0.26471 0.74493 0.53724 0.11496
5 0.2876314 0.0000 0.0007 1.00000 0.41392 0.81466 0.21595 0.38346 0.26475 0.74458 0.53794 0.11442

You can see that in the second iteration, rather than report a communality greater than one, SAS replaces it with the value one and then proceeds as usual through the iterations.

After five iterations the algorithm converges, as indicated by the statement on the second page of the output.  The algorithm converged to a solution in which the communality for Climate is equal to one.

To perform factor analysis using maximum likelihood

  • Choose Maximum Likelihood for the Method of Extraction.
  • Under Results, select All and MLE iterations, and choose OK.
  • Choose OK again. The numeric results are shown in the results area.

12.9 - Goodness-of-Fit

Before we proceed, we would like to determine if the model adequately fits the data. The goodness-of-fit test in this case compares the variance-covariance matrix under a parsimonious model to the variance-covariance matrix without any restriction, i.e. under the assumption that the variances and covariances can take any values. The variance-covariance matrix under the assumed model can be expressed as:

\(\mathbf{\Sigma = LL' + \Psi}\)

\(\mathbf{L}\) is the matrix of factor loadings, and the diagonal elements of \(\boldsymbol{\Psi}\) are equal to the specific variances. This is a very specific structure for the variance-covariance matrix. A more general structure would allow those elements to take any value. To assess goodness-of-fit, we use the Bartlett-corrected likelihood ratio test statistic:

\(X^2 = \left(n-1-\frac{2p+4m+5}{6}\right)\log \frac{|\mathbf{\hat{L}\hat{L}'}+\mathbf{\hat{\Psi}}|}{|\hat{\mathbf{\Sigma}}|}\)

The test is a likelihood ratio test, where two likelihoods are compared, one under the parsimonious model and the other without any restrictions. The constant in the statistic is called the Bartlett correction. The log is the natural log. In the numerator, we have the determinant of the fitted factor model for the variance-covariance matrix, and below, we have a sample estimate of the variance-covariance matrix assuming no structure where:

\(\hat{\boldsymbol{\Sigma}} = \frac{n-1}{n}\mathbf{S}\)

and \(\mathbf{S}\) is the sample variance-covariance matrix. This is just another estimate of the variance-covariance matrix, one that includes a small bias. If the factor model fits well, then these two determinants should be about the same and you will get a small value for \(X^2\). However, if the model does not fit well, then the determinants will be different and \(X^2\) will be large.

Under the null hypothesis that the factor model adequately describes the relationships among the variables,

\(\mathbf{X}^2 \sim \chi^2_{\frac{(p-m)^2-p-m}{2}} \)

That is, under the null hypothesis the test statistic has a chi-square distribution with the somewhat unusual degrees of freedom shown above; the degrees of freedom equal the difference in the number of unique parameters between the two models. We reject the null hypothesis that the factor model adequately describes the data if \(X^2\) exceeds the critical value from the chi-square table.
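As a quick check for the Places Rated fits, with \(p = 9\) variables and \(m = 3\) factors the degrees of freedom are

\[\frac{(9-3)^2 - 9 - 3}{2} = \frac{36 - 12}{2} = 12,\]

matching the SAS output below, and the later fit with \(m = 4\) factors has \(\frac{(9-4)^2 - 9 - 4}{2} = 6\) degrees of freedom.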

Back to the Output...

Looking just past the iteration results, we have....

Significance Tests based on 329 Observations

Test DF Chi-Square Pr > ChiSq
\(H_{o}\colon\) No common factors 36 839.4268 < 0.0001
\(H_{A}\colon\) At least one common factor      
\(H_{o}\colon\) 3 Factors are sufficient 12 92.6652 < 0.0001
\(H_{A}\colon\) More Factors are needed      

For our Places Rated dataset, we find a significant lack of fit: \(X^2 = 92.67\), d.f. = 12, \(p < 0.0001\). We conclude that the relationships among the variables are not adequately described by the three-factor model. This suggests that we do not have the correct model.

The only remedy that we can apply in this case is to increase the number m of factors until an adequate fit is achieved. Note, however, that m must satisfy

\(p(m+1) \le \frac{p(p+1)}{2}\)

In the present example, \(p = 9\), so the constraint becomes \(9(m+1) \le 45\), which means m ≤ 4.

Let's return to the SAS program and change the "nfactors" value from 3 to 4:
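A sketch of the modified call, under the same assumptions as before and keeping the heywood option:

  proc factor data=places method=ml nfactors=4 heywood;
    var climate housing health crime trans educate arts recreate econ;
  run;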

Significance Tests based on 329 Observations

Test DF Chi-Square Pr > ChiSq
\(H_{o}\colon\) No common factors 36 839.4268 < 0.0001
\(H_{A}\colon\) At least one common factor      
\(H_{o}\colon\) 4 Factors are sufficient 6 41.6867 < 0.0001
\(H_{A}\colon\) More Factors are needed      

We find that the factor model with m = 4 does not fit the data adequately either: \(X^2 = 41.69\), d.f. = 6, \(p < 0.0001\). We cannot obtain an adequate factor model for this dataset. There is something else going on here, perhaps some non-linearity. Whatever the case, it does not look like this yields a good-fitting factor model. The next step could be to drop variables from the data set to obtain a better-fitting model.

12.10 - Factor Rotations

From our experience with the Places Rated data, it does not look like the factor model works well. There is no guarantee that any model will fit the data well.

The first motivation of factor analysis was to try to discern some underlying factors describing the data. The Maximum Likelihood Method failed to find such a model to describe the Places Rated data. The second motivation is still valid, which is to try to obtain a better interpretation of the data. In order to do this, let's take a look at the factor loadings obtained before from the principal component method.

  Factor
Variable 1 2 3
Climate 0.286 0.076
Housing 0.153 0.084
Health -0.410 -0.020
Crime 0.135
Transportation -0.156 -0.148
Education -0.253
Arts -0.115 0.011
Recreation 0.322 0.044
Economics 0.298

The problem with this analysis is that some of the variables have large loadings in more than one column. For instance, Education appears significant for Factor 1 AND Factor 2. The same is true for Economics on Factors 2 AND 3. This does not provide a very clean, simple interpretation of the data. Ideally, each variable would appear as a significant contributor to only one factor.

In fact, the above table may indicate contradictory results. Looking at some of the observations, it is conceivable that we will find an observation that takes a high value on both Factors 1 and 2. If this occurs, a high value for Factor 1 suggests that the community has quality education, whereas a high value for Factor 2 suggests the opposite, that the community has poor education.

Factor rotation is motivated by the fact that factor models are not unique. Recall that the factor model for the data vector, \(\mathbf{X = \boldsymbol{\mu} + LF + \boldsymbol{\epsilon}}\), is a function of the mean \(\boldsymbol{\mu}\), plus a matrix of factor loadings times a vector of common factors, plus a vector of specific factors.

Moreover, we should note that this is equivalent to a rotated factor model, \(\mathbf{X = \boldsymbol{\mu} + L^*F^* + \boldsymbol{\epsilon}}\), where we set \(\mathbf{L^* = LT}\) and \(\mathbf{F^* = T'F}\) for some orthogonal matrix \(\mathbf{T}\) with \(\mathbf{T'T = TT' = I}\). Note that there are infinitely many possible orthogonal matrices, each corresponding to a particular factor rotation.
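To see why a rotated model fits exactly as well as the original, note that the orthogonality of \(\mathbf{T}\) leaves the factor part of the model unchanged:

\[\mathbf{L^*F^*} = \mathbf{(LT)(T'F)} = \mathbf{L(TT')F} = \mathbf{LF}, \qquad \mathbf{L^*L^{*'}} = \mathbf{LTT'L'} = \mathbf{LL'},\]

so the fitted variance-covariance matrix \(\mathbf{LL' + \Psi}\) is identical for every choice of \(\mathbf{T}\).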

We plan to find an appropriate rotation, defined through an orthogonal matrix \(\mathbf{T}\) , that yields the most easily interpretable factors.

To understand this, consider a scatter plot of factor loadings. The orthogonal matrix \(\mathbf{T}\) rotates the axes of this plot. We wish to find a rotation such that each of the p variables has a high loading on only one factor.

We will return to the program below to obtain a plot.  Looking at the program, there are a number of options (marked in blue under proc factor) that we have not yet explained.

Download the SAS program here: places2.sas
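The following is a hypothetical sketch of the kind of call places2.sas makes; the actual program may differ. It assumes the Places Rated data are in a SAS data set named places with the variable names used in the output tables:

  proc factor data=places method=principal nfactors=3
              rotate=varimax preplot plot scree;
    /* preplot: plots of the unrotated factor loadings;
       rotate=varimax and plot: varimax rotation and plots of the rotated loadings */
    var climate housing health crime trans educate arts recreate econ;
  run;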

One of the options above is labeled 'preplot'. We will use this to plot the values for factor 1 against factor 2.

In the output, these values are plotted with the loadings for factor 1 on the y-axis and the loadings for factor 2 on the x-axis.

For example, the second variable, labeled with the letter B, has a factor 1 loading of about 0.7 and a factor 2 loading of about 0.15.  Each letter on the plot corresponds to a single variable. SAS provides plots of the other combinations of factors: factor 1 against factor 3, as well as factor 2 against factor 3.

Three factors appear in this model so we might consider a three-dimensional plot of all three factors together.

Obtaining a scree plot and loading plot

To perform factor analysis with scree and loading plots:

  • Transform variables. This step is optional but used in the steps below.  
  • Choose OK. The transformed values replace the originals in the worksheet under ‘climate’.
  • Stat > Multivariate > Factor Analysis
  • Under Graphs, select Scree plot and Loading plot for first two factors.
  • Choose OK and OK again. The numeric results are shown in the results area, along with both the scree plot and the loading plot.

The selection of the orthogonal matrix \(\mathbf{T}\) corresponds to a rotation of these axes. Think of rotating the axes about the origin; each rotation corresponds to an orthogonal matrix \(\mathbf{T}\). We want to rotate the axes to obtain a cleaner interpretation of the data. We would really like to define new coordinate systems so that, after rotation, the points fall close to the vertices (endpoints) of the new axes.

If we were only looking at two factors, then we would like to find each of the plotted points at the four tips (corresponding to all four directions) of the rotated axes. This is what rotation is about, taking the factor pattern plot and rotating the axes in such a way that the points fall close to the axes.

12.11 - Varimax Rotation

Varimax rotation chooses the orthogonal matrix \(\mathbf{T}\) that maximizes the (normalized) varimax criterion

\(V = \dfrac{1}{p}\sum\limits_{j=1}^{m}\left\{\sum\limits_{i=1}^{p}\tilde{l}^4_{ij} - \dfrac{1}{p}\left(\sum\limits_{i=1}^{p}\tilde{l}^2_{ij}\right)^2\right\}\)

where \(\tilde{l}_{ij}\) is the rotated loading of variable i on factor j scaled by the square root of the communality of variable i. This criterion is the sum, over the m factors, of the sample variances of the squared standardized loadings; it is large when each factor has a few large loadings and the rest are near zero.

Returning to the options of the factoring procedure (marked in blue):

The "rotate" option asks for factor rotation; here we specified the Varimax rotation of our factor loadings.

The "plot" option asks for the same kind of plot that we just looked at, now for the rotated factors. The result of the rotation is a new factor pattern (page 11 of the SAS output), given further below. Page 10 of the SAS output displays, at its top, the orthogonal rotation matrix \(\mathbf{T}\) used to produce it.

Using Varimax Rotation

To perform factor analysis with varimax rotation:

  • Choose Varimax for the Type of Rotation.
  • Under Graphs, select Loading plot for the first two factors.
  • Choose OK and OK again. The numeric results are shown in the results area, along with the loading plot.

The values of the rotated factor loadings are:

  Factor
Variable 1 2 3
Climate 0.021 0.239
Housing 0.438 0.166
Health 0.127 0.137
Crime 0.031 0.139
Transportation 0.289 -0.028
Education -0.094 -0.117
Arts 0.432 0.150
Recreation 0.301 0.099
Economics -0.022 -0.551

Let us now interpret the data based on the rotation. Looking at the loadings that are largest in magnitude in each column, we make the following interpretation.

  • Factor 1: primarily a measure of Health, but also increases with increasing scores for Transportation, Education, and the Arts.
  • Factor 2: primarily a measure of Crime, Recreation, the Economy, and Housing.
  • Factor 3: primarily a measure of Climate alone.

This is just the pattern that exists in the data and no causal inferences should be made from this interpretation. It does not tell us why this pattern exists. It could very well be that there are other essential factors that are not seen at work here.

Let us look at the amount of variation explained by our factors under the rotated model and compare it to the original model. Consider the variance explained by each factor under the original analysis and the rotated factors:

  Analysis
Factor Original Rotated
1 3.2978 2.4798
2 1.2136 1.9835
3 1.1055 1.1536
Total 5.6169 5.6169

The total amount of variation explained by the 3 factors remains the same. Rotations, among a fixed number of factors, do not change how much of the variation is explained by the model. The fit is equally good regardless of what rotation is used.
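You can verify this directly from the table: the two columns sum to the same total,

\[3.2978 + 1.2136 + 1.1055 = 5.6169 = 2.4798 + 1.9835 + 1.1536.\]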

However, notice what happened to the first factor. We see a fairly large decrease in the amount of variation explained by the first factor. We obtained a cleaner interpretation of the data but it costs us something somewhere. The cost is that the variation explained by the first factor is distributed among the latter two factors, in this case mostly to the second factor.

The total amount of variation explained by the rotated factor model is the same, but the contributions are not the same from the individual factors. We gain a cleaner interpretation, but the first factor does not explain as much of the variation. However, this would not be considered a particularly large cost if we are still interested in these three factors.

Rotation cleans up the interpretation. Ideally, we should find that the numbers in each column are either far away from zero or close to zero. Numbers close to +1 or -1 or 0 in each column give the ideal or cleanest interpretation. If a rotation can achieve this goal, then that is wonderful. However, observed data are seldom this cooperative!

Nevertheless, recall that the objective is data interpretation. The success of the analysis can be judged by how well it helps you make sense of your data. If the result gives you some insight into the pattern of variability in the data, even without being perfect, then the analysis was successful.

12.12 - Estimation of Factor Scores

Factor scores are similar to the principal components in the previous lesson. Just as we plotted principal components against each other, a similar scatter plot of factor scores is also helpful. We also might use factor scores as explanatory variables in future analyses. It may even be of interest to use the factor score as the dependent variable in a future analysis.

The methods for estimating factor scores depend on the method used to estimate the factor loadings (principal component or maximum likelihood). The vectors of common factors f are of interest. There are m unobserved factors in our model and we would like to estimate them. Therefore, given the factor model:

\(\mathbf{Y_i = \boldsymbol{\mu} + Lf_i + \boldsymbol{\epsilon_i}}; i = 1,2,\dots, n,\)

we may wish to estimate the vectors of factor scores

\(\mathbf{f_1, f_2, \dots, f_n}\)

for each observation.

There are a number of different methods for estimating factor scores from the data. These include:

  • Ordinary Least Squares
  • Weighted Least Squares
  • Regression method

Ordinary Least Squares

By default, this is the method that SAS uses if you use the principal component method. The difference between the \(j^{th}\) variable on the \(i^{th}\) subject and its value under the factor model is computed. The \(l_{jk}\)'s are the factor loadings and the \(f_k\)'s are the unobserved common factors. The vector of common factors for subject i, \( \hat{\mathbf{f}}_i \), is found by minimizing the sum of the squared residuals:

\[\sum_{j=1}^{p}\epsilon^2_{ij} = \sum_{j=1}^{p}(y_{ij}-\mu_j-l_{j1}f_1 - l_{j2}f_2 - \dots - l_{jm}f_m)^2 = (\mathbf{Y_i - \boldsymbol{\mu} - Lf_i})'(\mathbf{Y_i - \boldsymbol{\mu} - Lf_i})\]

This is like a least squares regression, except in this case we already have estimates of the parameters (the factor loadings), but wish to estimate the explanatory common factors. In matrix notation the solution is expressed as:

\(\mathbf{\hat{f}_i = (L'L)^{-1}L'(Y_i-\boldsymbol{\mu})}\)

In practice, we substitute our estimated factor loadings into this expression as well as the sample mean for the data:

\(\mathbf{\hat{f}_i = \left(\hat{L}'\hat{L}\right)^{-1}\hat{L}'(Y_i-\bar{y})}\)

Using the principal component method with the unrotated factor loadings, this yields:

\[\mathbf{\hat{f}_i} = \left(\begin{array}{c} \frac{1}{\sqrt{\hat{\lambda}_1}}\mathbf{\hat{e}'_1(Y_i-\bar{y})}\\  \frac{1}{\sqrt{\hat{\lambda}_2}}\mathbf{\hat{e}'_2(Y_i-\bar{y})}\\ \vdots \\  \frac{1}{\sqrt{\hat{\lambda}_m}}\mathbf{\hat{e}'_m(Y_i-\bar{y})}\end{array}\right)\]

\(\hat{\mathbf{e}}_1\) through \(\hat{\mathbf{e}}_m\) are the first m eigenvectors.
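This simplification is a consequence of how the principal component method constructs the loading matrix. Briefly, because the \(j^{th}\) column of \(\hat{\mathbf{L}}\) is \(\sqrt{\hat{\lambda}_j}\hat{\mathbf{e}}_j\) and the eigenvectors are orthonormal,

\[\hat{\mathbf{L}}'\hat{\mathbf{L}} = \text{diag}(\hat{\lambda}_1, \hat{\lambda}_2, \dots, \hat{\lambda}_m),\]

so the \(j^{th}\) row of \(\left(\hat{\mathbf{L}}'\hat{\mathbf{L}}\right)^{-1}\hat{\mathbf{L}}'\) is \(\hat{\mathbf{e}}'_j/\sqrt{\hat{\lambda}_j}\), which gives the entries displayed above.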

Weighted Least Squares (Bartlett)

The difference between WLS and OLS is that the squared residuals are divided by the specific variances, as shown below. This gives more weight in the estimation to variables with low specific variances.  The factor model fits the data best for variables with low specific variances, so those variables should give us more information about the true values of the common factors.

Therefore, for the factor model:

\(\mathbf{Y_i = \boldsymbol{\mu} + Lf_i + \boldsymbol{\epsilon_i}}\)

we want to find \(\boldsymbol{f_i}\) that minimizes

\( \sum\limits_{j=1}^{p}\frac{\epsilon^2_{ij}}{\Psi_j} = \sum\limits_{j=1}^{p}\frac{(y_{ij}-\mu_j - l_{j1}f_1 - l_{j2}f_2 -\dots - l_{jm}f_m)^2}{\Psi_j} = \mathbf{(Y_i-\boldsymbol{\mu}-Lf_i)'\Psi^{-1}(Y_i-\boldsymbol{\mu}-Lf_i)}\)

The solution is given by this expression where \(\mathbf{\Psi}\) is the diagonal matrix whose diagonal elements are equal to the specific variances:

\(\mathbf{\hat{f}_i = (L'\Psi^{-1}L)^{-1}L'\Psi^{-1}(Y_i-\boldsymbol{\mu})}\)

and can be estimated by substituting the following:

\(\mathbf{\hat{f}_i = (\hat{L}'\hat{\Psi}^{-1}\hat{L})^{-1}\hat{L}'\hat{\Psi}^{-1}(Y_i-\bar{y})}\)

Regression Method

This method is used with maximum likelihood estimates of the factor loadings. We consider the vector of observed data supplemented by the vector of common factor scores for the \(i^{th}\) subject.

The joint distribution of the data \(\boldsymbol{Y}_i\) and the factor \(\boldsymbol{f}_i\) is

\(\left(\begin{array}{c}\mathbf{Y_i} \\ \mathbf{f_i}\end{array}\right) \sim N \left[\left(\begin{array}{c}\mathbf{\boldsymbol{\mu}} \\ 0 \end{array}\right), \left(\begin{array}{cc}\mathbf{LL'+\Psi} & \mathbf{L} \\ \mathbf{L'} & \mathbf{I}\end{array}\right)\right]\)

Using this we can calculate the conditional expectation of the common factor score \(\boldsymbol{f}_i\) given the data \(\boldsymbol{Y}_i\) as expressed here:

\(E(\mathbf{f_i|Y_i}) = \mathbf{L'(LL'+\Psi)^{-1}(Y_i-\boldsymbol{\mu})}\)
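This is the usual conditional-mean formula for a partitioned multivariate normal vector, applied to the joint distribution above:

\[E(\mathbf{f}_i \mid \mathbf{Y}_i) = \boldsymbol{\mu}_f + \boldsymbol{\Sigma}_{fY}\,\boldsymbol{\Sigma}^{-1}_{YY}(\mathbf{Y}_i - \boldsymbol{\mu}_Y) = \mathbf{0} + \mathbf{L}'(\mathbf{LL' + \Psi})^{-1}(\mathbf{Y}_i - \boldsymbol{\mu}).\]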

This suggests the following estimator by substituting in the estimates for L and \(\mathbf{\Psi}\):

\(\mathbf{\hat{f}_i = \hat{L}'\left(\hat{L}\hat{L}'+\hat{\Psi}\right)^{-1}(Y_i-\bar{y})}\)

There is a small modification that is often made to reduce the effect of possibly misspecifying the number of factors; it tends to give more stable results. The fitted matrix \(\hat{\mathbf{L}}\hat{\mathbf{L}}' + \hat{\boldsymbol{\Psi}}\) is replaced by the sample variance-covariance matrix \(\mathbf{S}\):

\(\mathbf{\tilde{f}_i = \hat{L}'S^{-1}(Y_i-\bar{y})}\)

12.13 - Summary

In this lesson we learned about:

  • The interpretation of factor loadings;
  • The principal component and maximum likelihood methods for estimating factor loadings and specific variances
  • How communalities can be used to assess the adequacy of a factor model
  • A likelihood ratio test for the goodness-of-fit of a factor model
  • Factor rotation
  • Methods for estimating common factors


Factor Analysis: A Brief Synopsis of Factor Analytic Methods With an Emphasis on Nonmathematical Aspects. Timothy D. Kruse, M.S.Ed., Texas A&M University-Commerce.

Factor Analytic Methods

  • Factor analysis is a set of mathematical techniques used to identify dimensions underlying a set of empirical measurements.
  • It is a data reduction method in which several sets of scores (units) and the correlations between them are mathematically considered.
  • It is an extremely complex procedure that contains numerous inherent nuances and a variety of correlational analyses designed to examine interrelationships among variables; a basic understanding of geometry, algebra, trigonometry, and matrix algebra is required.

Fundamental Purposes

  • Factor analytic methods can help scientists to define their variables more precisely and decide which variables they should study and relate to each other in the attempt to develop their science to a higher level (Comrey & Lee, 1992).
  • …the aim is to summarize the interrelationships among the variables in a concise but accurate manner as an aid in conceptualization (Gorsuch, 1983).
  • …a statistical technique applied to a single set of variables when the researcher is interested in discovering which variables in the set form coherent subsets that are relatively independent of one another; reducing numerous variables down to a few factors (Tabachnick & Fidell, 2001).
  • …to account for the intercorrelations among n variables by postulating a set of common factors, fewer in number than the number, n, of these variables (Cureton & D’Agostino, 1983).
  • All scientists attempt to identify the basic underlying dimensions that can be used to account for the phenomena they study; they analyze the relationships among a set of variables, where these relationships are evaluated across a set of individuals under specific conditions.
  • In other words, factor analytic methods assist the researcher in gaining a more comprehensive understanding and conceptualization of the complex and poorly defined interrelationships that exist among a large number of imprecisely measured variables.

Goals and Objectives

  • To summarize patterns of correlations (in a matrix) among observed variables.
  • To reduce a large number of observed variables to a smaller number of factors.
  • To provide an operational definition (a regression equation) for a process underlying observed variables.
  • To test a theory of underlying processes.

Required Parlance / Lexicon

  • Variables – the characteristics being measured; anything that can be objectively measured or scored.
  • Individuals – the units that provide the data by which the relationships among the variables are evaluated (subjects, cases, etc.).
  • Conditions – that which pertains to all the data collected and sets the study apart from other similar studies (time, space, treatments, scoring variations, etc.).
  • Observations – a specific variable score of a specific individual under the designated conditions.
  • Factors – hypothesized, unmeasured, underlying ("latent") variables presumed to influence or explain a phenomenon such as test performance (Kim & Mueller, 1978); hypothetical constructs that help interpret the consistency in a data set (Tinsley & Tinsley, 1987); a dimension or construct that is a condensed statement of the relationship between a set of variables (Kline, 1994).
  • Common factors – represent the dimensions that all the variables have in common.
  • Specific factors – are related to a specific variable but are not common to any other variables.
  • Error factors – represent the error of measurement or unreliability of a variable.
  • Factor loading – the farther the loading on a factor is from zero, the more one can generalize from that factor to the variable; reflects the extent to which a variable is related to the hypothetical factor and may be thought of as the correlation between the variable and the factor (sometimes referred to as "saturation").
  • Observed correlation matrix – the matrix of correlations among the observed variables (e.g., standardized test scores).
  • Reproduced correlation matrix – the matrix produced by the factor model.
  • Residual correlation matrix – the matrix of differences between the observed and reproduced matrices.
  • Rotation – a process by which the solution is made more interpretable without changing its underlying mathematical properties. With orthogonal rotation, all factors are uncorrelated with each other and the output includes loading and factor-score matrices; with oblique rotation, the factors are correlated and the output includes structure, pattern, and factor-score matrices.

Uses of Factor Analysis

  • Finding underlying factors of ability tests; identifying personality dimensions and clinical syndromes; finding dimensions of satisfaction and of social behaviors.
  • In psychology, the development of objective tests and assessments for the measurement of personality and intelligence.
  • Explaining inter-correlations, testing theories about factor constructs, determining the effects of variation or changes, and verifying previous findings.

Types of Factor Analysis

  • Exploratory (EFA) – the researcher attempts to describe and summarize data by grouping together variables that are correlated. The variables may or may not have been chosen with potential underlying processes in mind. Used in the early stages of research to consolidate variables and generate hypotheses about possible underlying processes or constructs.
  • Confirmatory (CFA) – used later in research (advanced stages) to test a theory regarding latent underlying processes or constructs. Variables are specifically chosen to reveal or confirm underlying processes or constructs. A much more sophisticated technique than EFA.

The Fundamental Equation of Factor Analysis

\(z_{jk} = a_{j1}F_{1k} + a_{j2}F_{2k} + \dots + a_{jm}F_{mk} + a_{js}S_{jk} + a_{je}E_{jk}\)

Factor Analytic Steps and Procedures

  • 1 – Select and measure a set of variables.
  • 2 – Compute the matrix of correlations among the variables.
  • 3 – Extract a set of unrotated factors.
  • 4 – Determine the number of factors.
  • 5 – Rotate the factors if needed to increase interpretability.


Institute for Digital Research and Education

A Practical Introduction to Factor Analysis: Exploratory Factor Analysis

This seminar is the first part of a two-part seminar that introduces central concepts in factor analysis. Part 1 focuses on exploratory factor analysis (EFA). Although the implementation is in SPSS, the ideas carry over to any software program. Part 2 introduces confirmatory factor analysis (CFA). Please refer to A Practical Introduction to Factor Analysis: Confirmatory Factor Analysis .

I. Exploratory Factor Analysis

  • Motivating example: The SAQ
  • Pearson correlation formula

Partitioning the variance in factor analysis

  • principal components analysis
  • principal axis factoring
  • maximum likelihood

Simple Structure

  • Orthogonal rotation (Varimax)
  • Oblique (Direct Oblimin)
  • Generating factor scores


Introduction

Suppose you are conducting a survey and you want to know whether the items in the survey have similar patterns of responses: do these items “hang together” to create a construct? The basic assumption of factor analysis is that for a collection of observed variables there is a set of underlying variables called factors (fewer in number than the observed variables) that can explain the interrelationships among those variables. Let’s say you conduct a survey and collect responses about people’s anxiety about using SPSS. Do all these items actually measure what we call “SPSS Anxiety”?


Motivating Example: The SAQ (SPSS Anxiety Questionnaire)

Let’s proceed with our hypothetical example of the survey, which Andy Field terms the SPSS Anxiety Questionnaire. For simplicity, we will use the so-called “SAQ-8,” which consists of the first eight items in the SAQ. Click on the preceding hyperlinks to download the SPSS version of both files. The SAQ-8 consists of the following questions:

  • Statistics makes me cry
  • My friends will think I’m stupid for not being able to cope with SPSS
  • Standard deviations excite me
  • I dream that Pearson is attacking me with correlation coefficients
  • I don’t understand statistics
  • I have little experience of computers
  • All computers hate me
  • I have never been good at mathematics

Pearson Correlation of the SAQ-8

Let’s get the table of correlations in SPSS Analyze – Correlate – Bivariate:

Correlations
  1 2 3 4 5 6 7 8
1 1
2 -.099 1
3 -.337 .318 1
4 .436 -.112 -.380 1
5 .402 -.119 -.310 .401 1
6 .217 -.074 -.227 .278 .257 1
7 .305 -.159 -.382 .409 .339 .514 1
8 .331 -.050 -.259 .349 .269 .223 .297 1
**. Correlation is significant at the 0.01 level (2-tailed).
*. Correlation is significant at the 0.05 level (2-tailed).

From this table we can see that most items have some correlation with each other, ranging from \(r=-0.382\) for Items 3 and 7 to \(r=.514\) for Items 6 and 7. Due to the relatively high correlations among the items, this would be a good candidate for factor analysis. Recall that the goal of factor analysis is to model the interrelationships between items with fewer (latent) variables. These interrelationships can be broken up into multiple components.

Since the goal of factor analysis is to model the interrelationships among items, we focus primarily on the variance and covariance rather than the mean. Factor analysis assumes that variance can be partitioned into two types of variance: common and unique.

  • Communality (also called \(h^2\)) is a definition of common variance that ranges between \(0 \) and \(1\). Values closer to 1 suggest that extracted factors explain more of the variance of an individual item.
  • Specific variance – variance that is specific to a particular item (e.g., Item 7, “All computers hate me,” may have variance that is attributable to anxiety about computers in addition to anxiety about SPSS).
  • Error variance:  comes from errors of measurement and basically anything unexplained by common or specific variance (e.g., the person got a call from her babysitter that her two-year old son ate her favorite lipstick).

The figure below shows how these concepts are related:

[Figure: an item's total variance partitioned into common variance (the communality) and unique variance (specific plus error variance)]

Performing Factor Analysis

For the data analyst, the goal of a factor analysis is to reduce the number of variables needed to explain the data and to interpret the results. This can be accomplished in two steps:

  • factor extraction
  • factor rotation

Factor extraction involves making a choice about the type of model as well the number of factors to extract. Factor rotation comes after the factors are extracted, with the goal of achieving  simple structure  in order to improve interpretability.

Extracting Factors

There are two approaches to factor extraction, which stem from different approaches to variance partitioning: a) principal components analysis and b) common factor analysis.

Principal Components Analysis

Unlike factor analysis, principal components analysis (PCA) makes the assumption that there is no unique variance; the total variance is equal to the common variance. If there is no unique variance, then common variance takes up the total variance. Additionally, if the total variance is 1, then the common variance is equal to the communality.

Running a PCA with 8 components in SPSS

The goal of a PCA is to replicate the correlation matrix using a set of components that are fewer in number than, and linear combinations of, the original set of items. Although the following analysis defeats the purpose of doing a PCA, we will begin by extracting as many components as possible as a teaching exercise, so that we can decide on the optimal number of components to extract later.

First go to Analyze – Dimension Reduction – Factor. Move all the observed variables into the Variables: box to be analyzed.


Under Extraction – Method, pick Principal components and make sure to Analyze the Correlation matrix. We also request the Unrotated factor solution and the Scree plot. Under Extract, choose Fixed number of factors, and under Factors to extract enter 8. We also bumped up the Maximum Iterations of Convergence to 100.


The same analysis can be specified equivalently with SPSS FACTOR syntax.

Eigenvalues and Eigenvectors

Before we get into the SPSS output, let’s understand a few things about eigenvalues and eigenvectors.

Eigenvalues represent the total amount of variance that can be explained by a given principal component.  They can be positive or negative in theory, but in practice they explain variance which is always positive.

  • If eigenvalues are greater than zero, then it’s a good sign.
  • Since variance cannot be negative, negative eigenvalues imply the model is ill-conditioned.
  • Eigenvalues close to zero imply there is item multicollinearity, since all the variance can be taken up by the first component.

Eigenvalues are also the sum of squared component loadings across all items for each component, which represent the amount of variance in each item that can be explained by the principal component.

Eigenvectors represent a weight for each eigenvalue. The eigenvector times the square root of the eigenvalue gives the component loadings, which can be interpreted as the correlation of each item with the principal component. For this particular PCA of the SAQ-8, the eigenvector weight associated with Item 1 on the first component is \(0.377\), and the eigenvalue of the first component is \(3.057\). We can calculate the loading of Item 1 on the first component as

$$(0.377)\sqrt{3.057}= 0.659.$$

In this case, we can say that the correlation of the first item with the first component is \(0.659\). Let’s now move on to the component matrix.

Component Matrix

The components can be interpreted as the correlation of each item with the component. Each item has a loading corresponding to each of the 8 components. For example, Item 1 is correlated \(0.659\) with the first component, \(0.136\) with the second component and \(-0.398\) with the third, and so on.

The square of each loading represents the proportion of variance (think of it as an \(R^2\) statistic) explained by a particular component. For Item 1, \((0.659)^2=0.434\), or \(43.4\%\) of its variance, is explained by the first component. Subsequently, \((0.136)^2 = 0.018\), or \(1.8\%\), of the variance in Item 1 is explained by the second component. The total variance explained by both components is thus \(43.4\%+1.8\%=45.2\%\). If you keep adding the squared loadings cumulatively across all eight components, you find that they sum to 1, or 100%. This sum is the communality, and in a PCA that extracts all components the communality of each item equals its total variance, which is 1 for standardized items.

Component Matrix
Item Component
1 2 3 4 5 6 7 8
1 0.659 0.136 -0.398 0.160 -0.064 0.568 -0.177 0.068
2 -0.300 0.866 -0.025 0.092 -0.290 -0.170 -0.193 -0.001
3 -0.653 0.409 0.081 0.064 0.410 0.254 0.378 0.142
4 0.720 0.119 -0.192 0.064 -0.288 -0.089 0.563 -0.137
5 0.650 0.096 -0.215 0.460 0.443 -0.326 -0.092 -0.010
6 0.572 0.185 0.675 0.031 0.107 0.176 -0.058 -0.369
7 0.718 0.044 0.453 -0.006 -0.090 -0.051 0.025 0.516
8 0.568 0.267 -0.221 -0.694 0.258 -0.084 -0.043 -0.012
Extraction Method: Principal Component Analysis.
a. 8 components extracted.

Summing the squared component loadings across the components (columns) gives you the communality estimates for each item, and summing each squared loading down the items (rows) gives you the eigenvalue for each component. For example, to obtain the first eigenvalue we calculate:

$$(0.659)^2 + (-0.300)^2 + (-0.653)^2 + (0.720)^2 + (0.650)^2 + (0.572)^2 + (0.718)^2 + (0.568)^2 = 3.057$$

You will get eight eigenvalues for eight components, which leads us to the next table.

Total Variance Explained in the 8-component PCA

Recall that the eigenvalue represents the total amount of variance that can be explained by a given principal component. Starting from the first component, each subsequent component is obtained from partialling out the previous component. Therefore the first component explains the most variance, and the last component explains the least. Looking at the Total Variance Explained table, you will get the total variance explained by each component. For example, the eigenvalue of Component 1 is \(3.057\), and \(3.057/8 = 0.3821\), so Component 1 explains \(38.21\%\) of the total variance. Because we extracted the same number of components as the number of items, the Initial Eigenvalues column is the same as the Extraction Sums of Squared Loadings column.

Total Variance Explained
Component Initial Eigenvalues Extraction Sums of Squared Loadings
Total % of Variance Cumulative % Total % of Variance Cumulative %
1 3.057 38.206 38.206 3.057 38.206 38.206
2 1.067 13.336 51.543 1.067 13.336 51.543
3 0.958 11.980 63.523 0.958 11.980 63.523
4 0.736 9.205 72.728 0.736 9.205 72.728
5 0.622 7.770 80.498 0.622 7.770 80.498
6 0.571 7.135 87.632 0.571 7.135 87.632
7 0.543 6.788 94.420 0.543 6.788 94.420
8 0.446 5.580 100.000 0.446 5.580 100.000
Extraction Method: Principal Component Analysis.

Choosing the number of components to extract

Since the goal of running a PCA is to reduce our set of variables down, it would be useful to have a criterion for selecting the optimal number of components, which is of course smaller than the total number of items. One criterion is to choose components that have eigenvalues greater than 1. Under the Total Variance Explained table, we see that the first two components have an eigenvalue greater than 1. This can be confirmed by the Scree Plot, which plots the eigenvalue (total variance explained) by the component number. Recall that we checked the Scree Plot option under Extraction – Display, so the scree plot should be produced automatically.

[Scree plot: eigenvalues plotted against component number]

The first component will always have the highest total variance and the last component will always have the least, but where do we see the largest drop? If you look at Component 2, you will see an “elbow” joint. This is the marking point where it is perhaps not too beneficial to continue further component extraction. There are some conflicting definitions of how to interpret the scree plot, but some say to take the number of components to the left of the “elbow.” Following this criterion we would pick only one component. A more subjective interpretation of the scree plot suggests that any number of components between 1 and 4 would be plausible, and further corroborative evidence would be helpful.

Some criteria say that the total variance explained by all components should be between 70% and 80% of the variance, which in this case would mean about four to five components. The authors of the book say that this may be untenable for social science research, where extracted factors usually explain only 50% to 60% of the variance. Picking the number of components is a bit of an art and requires input from the whole research team. Let's suppose we talked to the principal investigator and she believes that the two-component solution makes sense for the study, so we will proceed with the analysis.
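
For illustration, here is a small NumPy sketch that applies the eigenvalue-greater-than-one rule and the cumulative-percent-of-variance rule to the eigenvalues in the Total Variance Explained table:

```python
import numpy as np

eigenvalues = np.array([3.057, 1.067, 0.958, 0.736, 0.622, 0.571, 0.543, 0.446])

# Eigenvalue-greater-than-1 rule: keep components whose eigenvalue exceeds 1
n_eigen_rule = int((eigenvalues > 1).sum())            # 2 components

# Cumulative percent of variance rule (e.g., at least 70% of total variance)
cum_pct = np.cumsum(eigenvalues) / eigenvalues.sum() * 100
n_70pct = int(np.argmax(cum_pct >= 70)) + 1            # 4 components (~72.7%)

print(n_eigen_rule, n_70pct)
```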

Running a PCA with 2 components in SPSS

Running the two component PCA is just as easy as running the 8 component solution. The only difference is under Fixed number of factors – Factors to extract you enter 2.


We will focus on the differences in the output between the eight- and two-component solutions. Under Total Variance Explained, the Extraction Sums of Squared Loadings columns are now filled in for only the two extracted components (the values themselves still match the Initial Eigenvalues), and the cumulative percent of variance extracted goes up to \(51.54\%\).

Total Variance Explained
Component Initial Eigenvalues Extraction Sums of Squared Loadings
Total % of Variance Cumulative % Total % of Variance Cumulative %
1 3.057 38.206 38.206 3.057 38.206 38.206
2 1.067 13.336 51.543 1.067 13.336 51.543
3 0.958 11.980 63.523
4 0.736 9.205 72.728
5 0.622 7.770 80.498
6 0.571 7.135 87.632
7 0.543 6.788 94.420
8 0.446 5.580 100.000
Extraction Method: Principal Component Analysis.

Similarly, you will see that the Component Matrix has the same loadings as the eight-component solution but instead of eight columns it’s now two columns.

Component Matrix
Item Component
1 2
1 0.659 0.136
2 -0.300 0.866
3 -0.653 0.409
4 0.720 0.119
5 0.650 0.096
6 0.572 0.185
7 0.718 0.044
8 0.568 0.267
Extraction Method: Principal Component Analysis.
a. 2 components extracted.

Again, we interpret Item 1 as having a correlation of 0.659 with Component 1. From glancing at the solution, we see that Item 4 has the highest correlation with Component 1 and Item 2 the lowest. Similarly, we see that Item 2 has the highest correlation with Component 2 and Item 7 the lowest.

Quick check:

True or False

  • The elements of the Component Matrix are correlations of the item with each component.
  • The sum of the squared eigenvalues is the proportion of variance under Total Variance Explained.
  • The Component Matrix can be thought of as correlations and the Total Variance Explained table can be thought of as \(R^2\).

1.T, 2.F (sum of squared loadings), 3. T

Communalities of the 2-component PCA

The communality is the sum of the squared component loadings up to the number of components you extract. In the SPSS output you will see a table of communalities.

Communalities
Item Initial Extraction
1 1.000 0.453
2 1.000 0.840
3 1.000 0.594
4 1.000 0.532
5 1.000 0.431
6 1.000 0.361
7 1.000 0.517
8 1.000 0.394
Extraction Method: Principal Component Analysis.

Since PCA is an iterative estimation process, it starts with 1 as an initial estimate of the communality (since this is the total variance across all 8 components), and then proceeds with the analysis until a final communality is extracted. Notice that the Extraction column is smaller than the Initial column because we only extracted two components. As an exercise, let's manually calculate the first communality from the Component Matrix. The first ordered pair is \((0.659,0.136)\), which represents the correlations of the first item with Component 1 and Component 2. Recall that squaring the loadings and summing across the components (columns) gives us the communality:

$$h^2_1 = (0.659)^2 + (0.136)^2 = 0.453$$

Going back to the Communalities table, if you sum down all 8 items (rows) of the Extraction column, you get \(4.123\). If you go back to the Total Variance Explained table and sum the first two eigenvalues, you also get \(3.057+1.067=4.124\). Is that surprising? Basically, it says that summing the communalities across all items gives the same total (up to rounding) as summing the eigenvalues across all extracted components.
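
The same bookkeeping can be verified with a few lines of NumPy, using the two-column Component Matrix above:

```python
import numpy as np

# Loadings from the two-component Component Matrix (8 items by 2 components)
L = np.array([[ 0.659, 0.136],
              [-0.300, 0.866],
              [-0.653, 0.409],
              [ 0.720, 0.119],
              [ 0.650, 0.096],
              [ 0.572, 0.185],
              [ 0.718, 0.044],
              [ 0.568, 0.267]])

communalities = (L ** 2).sum(axis=1)  # across components: one communality per item
eigenvalues   = (L ** 2).sum(axis=0)  # down the items: one eigenvalue per component

print(communalities.round(2))   # approximately the Extraction column of Communalities
print(eigenvalues.round(3))     # ~[3.057, 1.067]
print(round(communalities.sum(), 2), round(eigenvalues.sum(), 2))  # both ~4.12
```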

1. In a PCA, when would the communality for the Initial column be equal to the Extraction column?

Answer : When you run an 8-component PCA.

  • The eigenvalue represents the communality for each item.
  • For a single component, the sum of squared component loadings across all items represents the eigenvalue for that component.
  • The sum of eigenvalues for all the components is the total variance.
  • The sum of the communalities down the components is equal to the sum of eigenvalues down the items.

1. F, the eigenvalue is the total communality across all items for a single component, 2. T, 3. T, 4. F (you can only sum communalities across items, and sum eigenvalues across components, but if you do that they are equal).

Common Factor Analysis

The partitioning of variance differentiates a principal components analysis from what we call common factor analysis. Both methods try to reduce the dimensionality of the dataset down to fewer unobserved variables, but whereas PCA assumes that common variance takes up all of the total variance, common factor analysis assumes that total variance can be partitioned into common and unique variance. It is usually more reasonable to assume that you have not measured your set of items perfectly. The unobserved or latent variable that makes up common variance is called a factor, hence the name factor analysis. The other main difference between PCA and factor analysis lies in the goal of your analysis. If your goal is simply to reduce your variable list down into a linear combination of smaller components, then PCA is the way to go. However, if you believe there is some latent construct that defines the interrelationship among items, then factor analysis may be more appropriate. In this case, we assume that there is a construct called SPSS Anxiety that explains why you see a correlation among all the items on the SAQ-8; we acknowledge, however, that SPSS Anxiety cannot explain all the shared variance among items in the SAQ, so we model the unique variance as well. Based on the results of the PCA, we will start with a two-factor extraction.

Running a Common Factor Analysis with 2 factors in SPSS

To run a factor analysis, use the same steps as running a PCA (Analyze – Dimension Reduction – Factor) except under Method choose Principal axis factoring. Note that we continue to set Maximum Iterations for Convergence at 100 and we will see why later.


Pasting the syntax into the SPSS Syntax Editor, note that the main difference is that under /EXTRACTION we list PAF for Principal Axis Factoring instead of PC for Principal Components. We will get three tables of output: Communalities, Total Variance Explained, and Factor Matrix. Let's go over each of these and compare them to the PCA output.

Communalities of the 2-factor PAF

Communalities
Item Initial Extraction
1 0.293 0.437
2 0.106 0.052
3 0.298 0.319
4 0.344 0.460
5 0.263 0.344
6 0.277 0.309
7 0.393 0.851
8 0.192 0.236
Extraction Method: Principal Axis Factoring.

The most striking difference between this Communalities table and the one from the PCA is that the initial communality estimates are no longer 1. Recall that for a PCA, we assume the total variance is completely taken up by the common variance or communality, and therefore we pick 1 as our best initial guess. Instead of guessing 1 as the initial communality, principal axis factoring uses the squared multiple correlation coefficient \(R^2\) of each item regressed on all the other items. To see this in action for Item 1, run a linear regression where Item 1 is the dependent variable and Items 2 through 8 are the independent variables. Go to Analyze – Regression – Linear and enter q01 under Dependent and q02 to q08 under Independent(s).


Running this regression, the output we obtain is:

Model Summary
Model R R Square Adjusted R Square Std. Error of the Estimate
1 .541 0.293 0.291 0.697

Note that 0.293 matches the initial communality estimate for Item 1. We could run a regression like this for each of the eight items to get all eight communality estimates, but SPSS already does that for us. Like PCA, factor analysis also uses an iterative estimation process to obtain the final estimates under the Extraction column. Finally, summing all the rows of the Extraction column, we get 3.01. This represents the total common variance shared among all items for a two-factor solution.
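
If you would rather not run a separate regression for every item, the same squared multiple correlations can be read off the inverse of the item correlation matrix: for item \(i\), \(SMC_i = 1 - 1/(R^{-1})_{ii}\). A minimal sketch, assuming `R` holds the 8 x 8 correlation matrix of the items (not shown in the output above):

```python
import numpy as np

def initial_communalities(R):
    """Squared multiple correlation of each item regressed on all other items.

    R is the p x p correlation matrix of the items; these SMCs are the
    initial communality estimates used by principal axis factoring.
    """
    R_inv = np.linalg.inv(R)
    return 1 - 1 / np.diag(R_inv)

# For the SAQ-8, initial_communalities(R)[0] should reproduce the 0.293 for Item 1.
```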

Total Variance Explained (2-factor PAF)

The next table we will look at is Total Variance Explained. Comparing this to the table from the PCA, we notice that the Initial Eigenvalues are exactly the same and include 8 rows, one for each “factor”. In fact, SPSS simply borrows the information from the PCA analysis for use in the factor analysis, and the factors are actually components in the Initial Eigenvalues column. The main difference now is in the Extraction Sums of Squared Loadings. We notice that each corresponding row in the Extraction column is lower than in the Initial column. This is expected because we assume that total variance can be partitioned into common and unique variance, which means the common variance explained will be lower. Factor 1 explains 31.38% of the variance whereas Factor 2 explains 6.24% of the variance. Just as in PCA, the more factors you extract, the less variance is explained by each successive factor.

Total Variance Explained
Factor Initial Eigenvalues Extraction Sums of Squared Loadings
Total % of Variance Cumulative % Total % of Variance Cumulative %
1 3.057 38.206 38.206 2.511 31.382 31.382
2 1.067 13.336 51.543 0.499 6.238 37.621
3 0.958 11.980 63.523
4 0.736 9.205 72.728
5 0.622 7.770 80.498
6 0.571 7.135 87.632
7 0.543 6.788 94.420
8 0.446 5.580 100.000
Extraction Method: Principal Axis Factoring.

A subtle note that may be easily overlooked is that when SPSS plots the scree plot or applies the eigenvalues-greater-than-1 criterion (Analyze – Dimension Reduction – Factor – Extraction), it bases them on the Initial solution and not the Extraction solution. This is important because the criterion assumes no unique variance, as in PCA, which means this is the total variance explained, not accounting for specific or measurement error. Note that in the Extraction Sums of Squared Loadings column the second factor has a sum of squared loadings that is less than 1, but it is still retained because the Initial value is 1.067. If you want to apply this criterion to the common variance explained, you would need to apply it yourself to the Extraction column.


  • In theory, when would the percent of variance in the Initial column ever equal the Extraction column?
  • True or False, in SPSS when you use the Principal Axis Factor method the scree plot uses the final factor analysis solution to plot the eigenvalues.

Answers: 1. When there is no unique variance (PCA assumes this whereas common factor analysis does not, so this is in theory and not in practice), 2. F, it uses the initial PCA solution and the eigenvalues assume no unique variance.

Factor Matrix (2-factor PAF)

Factor Matrix
Item Factor
1 2
1 0.588 -0.303
2 -0.227 0.020
3 -0.557 0.094
4 0.652 -0.189
5 0.560 -0.174
6 0.498 0.247
7 0.771 0.506
8 0.470 -0.124
Extraction Method: Principal Axis Factoring.
a. 2 factors extracted. 79 iterations required.

First note the annotation that 79 iterations were required. If we had simply used the default 25 iterations in SPSS, we would not have obtained an optimal solution. This is why in practice it’s always good to increase the maximum number of iterations. Now let’s get into the table itself. The elements of the Factor Matrix table are called loadings and represent the correlation of each item with the corresponding factor. Just as in PCA, squaring each loading and summing down the items (rows) gives the total variance explained by each factor. Note that they are no longer called eigenvalues as in PCA. Let’s calculate this for Factor 1:

$$(0.588)^2 +  (-0.227)^2 + (-0.557)^2 + (0.652)^2 + (0.560)^2 + (0.498)^2 + (0.771)^2 + (0.470)^2 = 2.51$$

This number matches the first row under the Extraction column of the Total Variance Explained table. We can repeat this for Factor 2 and get matching results for the second row. Additionally, we can get the communality estimates by summing the squared loadings across the factors (columns) for each item. For example, for Item 1:

$$(0.588)^2 +  (-0.303)^2 = 0.437$$

Note that these results match the value of the Communalities table for Item 1 under the Extraction column. This means that the sum of squared loadings across factors represents the communality estimates for each item.

The relationship between the three tables

To see the relationships among the three tables let’s first start from the Factor Matrix (or Component Matrix in PCA). We will use the term factor to represent components in PCA as well. These elements represent the correlation of the item with each factor. Now, square each element to obtain squared loadings or the proportion of variance explained by each factor for each item. Summing the squared loadings across factors you get the proportion of variance explained by all factors in the model. This is known as common variance or communality, hence the result is the Communalities table. Going back to the Factor Matrix, if you square the loadings and sum down the items you get Sums of Squared Loadings (in PAF) or eigenvalues (in PCA) for each factor. These now become elements of the Total Variance Explained table. Summing down the rows (i.e., summing down the factors) under the Extraction column we get \(2.511 + 0.499 = 3.01\) or the total (common) variance explained. In words, this is the total (common) variance explained by the two factor solution for all eight items. Equivalently, since the Communalities table represents the total common variance explained by both factors for each item, summing down the items in the Communalities table also gives you the total (common) variance explained, in this case

$$0.437 + 0.052 + 0.319 + 0.460 + 0.344 + 0.309 + 0.851 + 0.236 = 3.01$$

which is the same result we obtained from the Total Variance Explained table. Here is a summary that may help clarify what we've talked about:

[Figure: summary of how the Factor Matrix, Communalities, and Total Variance Explained tables relate]

In summary:

  • Squaring the elements in the Factor Matrix gives you the squared loadings
  • Summing the squared loadings of the Factor Matrix across the factors gives you the communality estimates for each item in the Extraction column of the Communalities table.
  • Summing the squared loadings of the Factor Matrix down the items gives you the Sums of Squared Loadings (PAF) or eigenvalue (PCA) for each factor across all items.
  • Summing the eigenvalues or Sums of Squared Loadings in the Total Variance Explained table gives you the total common variance explained.
  • Summing down all items of the Communalities table is the same as summing the eigenvalues or Sums of Squared Loadings down all factors under the Extraction column of the Total Variance Explained table.
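
These relationships are easy to verify numerically. A short NumPy sketch using the two-factor PAF Factor Matrix above:

```python
import numpy as np

# Unrotated two-factor PAF loadings (the Factor Matrix above)
F = np.array([[ 0.588, -0.303],
              [-0.227,  0.020],
              [-0.557,  0.094],
              [ 0.652, -0.189],
              [ 0.560, -0.174],
              [ 0.498,  0.247],
              [ 0.771,  0.506],
              [ 0.470, -0.124]])

sq = F ** 2
communalities = sq.sum(axis=1)  # across factors -> Extraction column of Communalities
ssl           = sq.sum(axis=0)  # down the items -> Sums of Squared Loadings per factor

print(ssl.round(3))                   # ~[2.510, 0.499]
print(round(communalities.sum(), 2))  # ~3.01
print(round(ssl.sum(), 2))            # ~3.01 (same total common variance)
```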

True or False (the following assumes a two-factor Principal Axis Factor solution with 8 items)

  • The elements of the Factor Matrix represent correlations of each item with a factor.
  • Each squared element of Item 1 in the Factor Matrix represents the communality.
  • Summing the squared elements of the Factor Matrix down all 8 items within Factor 1 equals the first Sums of Squared Loading under the Extraction column of Total Variance Explained table.
  • Summing down all 8 items in the Extraction column of the Communalities table gives us the total common variance explained by both factors.
  • The total common variance explained is obtained by summing all Sums of Squared Loadings of the Initial column of the Total Variance Explained table
  • The total Sums of Squared Loadings in the Extraction column under the Total Variance Explained table represents the total variance which consists of total common variance plus unique variance.
  • In common factor analysis, the sum of squared loadings is the eigenvalue.

Answers: 1. T, 2. F, the sum of the squared elements across both factors, 3. T, 4. T, 5. F, sum the Sums of Squared Loadings from the Extraction column of the Total Variance Explained table, 6. F, the total Sums of Squared Loadings represents only the total common variance excluding unique variance, 7. F, eigenvalues are only applicable for PCA.

Maximum Likelihood Estimation (2-factor ML)

Since this is a non-technical introduction to factor analysis, we won't go into detail about the differences between Principal Axis Factoring (PAF) and Maximum Likelihood (ML). The main concept to know is that ML is also a common factor analysis method that uses \(R^2\) to obtain initial estimates of the communalities, but it uses a different iterative process to obtain the extraction solution. To run a factor analysis using maximum likelihood estimation, under Analyze – Dimension Reduction – Factor – Extraction – Method choose Maximum Likelihood.


Although the initial communalities are the same between PAF and ML, the final extraction loadings will be different, which means you will have different Communalities, Total Variance Explained, and Factor Matrix tables (although the Initial columns will overlap). The other main difference is that you will obtain a Goodness-of-fit Test table, which gives an absolute test of model fit. Non-significant values suggest a good fitting model. Here the p-value is less than 0.05, so we reject the two-factor model.

Goodness-of-fit Test
Chi-Square df Sig.
198.617 13 0.000

In practice, you would obtain chi-square values for multiple factor analysis runs, which we tabulate below from 1 to 8 factors. The table shows the number of factors extracted (or attempted to extract) as well as the chi-square, degrees of freedom, p-value, and iterations needed to converge. Note that as you increase the number of factors, the chi-square value and degrees of freedom decrease, but the iterations needed and the p-value increase. Practically, you want to make sure the number of iterations you specify exceeds the iterations needed. Additionally, NS means no solution and N/A means not applicable. In SPSS, no solution is obtained when you run 5 to 7 factors because the degrees of freedom are negative (which cannot happen). The eight-factor solution is not even applicable in SPSS because it will give the warning “You cannot request as many factors as variables with any extraction method except PC. The number of factors will be reduced by one.” This means that if you try to extract an eight-factor solution for the SAQ-8, it will default back to the 7-factor solution. Now that we understand the table, let's see if we can find the threshold at which the absolute fit indicates a good fitting model. It looks like the p-value becomes non-significant at a 3-factor solution. Note that this differs from the eigenvalues-greater-than-1 criterion, which chose 2 factors, and from the Percent of Variance Explained criterion, by which you would choose 4 to 5 factors. We talk to the Principal Investigator and at this point, we still prefer the two-factor solution. Note that there is no “right” answer in picking the best factor model, only what makes sense for your theory. We will talk about interpreting the factor loadings when we talk about factor rotation to further guide us in choosing the correct number of factors.

Number of Factors Chi-square Df p-value Iterations needed
1 553.08 20 <0.05 4
2 198.62 13 < 0.05 39
3 13.81 7 0.055 57
4 1.386 2 0.5 168
5 NS -2 NS NS
6 NS -5 NS NS
7 NS -7 NS NS
8 N/A N/A N/A N/A
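
The degrees of freedom column follows the usual formula for an exploratory factor model with \(p\) items and \(m\) factors, \(df = [(p-m)^2 - (p+m)]/2\), which is why the df turns negative at 5 factors. A quick sketch that reproduces the Df column:

```python
def efa_df(p, m):
    """Degrees of freedom for an m-factor model fit to p observed variables."""
    return ((p - m) ** 2 - (p + m)) // 2

print([efa_df(8, m) for m in range(1, 8)])  # [20, 13, 7, 2, -2, -5, -7]
```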
  • The Initial column of the Communalities table for the Principal Axis Factoring and the Maximum Likelihood method are the same given the same analysis.
  • Since they are both factor analysis methods, Principal Axis Factoring and the Maximum Likelihood method will result in the same Factor Matrix.
  • In SPSS, both Principal Axis Factoring and Maximum Likelihood methods give chi-square goodness of fit tests.
  • You can extract as many factors as there are items when using ML or PAF.
  • When looking at the Goodness-of-fit Test table, a p-value less than 0.05 means the model is a good fitting model.
  • In the Goodness-of-fit Test table, the lower the degrees of freedom the more factors you are fitting.

Answers: 1. T, 2. F, the two use the same starting communalities but a different estimation process to obtain extraction loadings, 3. F, only Maximum Likelihood gives you chi-square values, 4. F, you can extract as many components as items in PCA, but with ML or PAF SPSS will only extract up to the total number of items minus 1, 5. F, greater than 0.05, 6. T, extracting more factors uses up degrees of freedom.

Comparing Common Factor Analysis versus Principal Components

As we mentioned before, the main difference between common factor analysis and principal components is that factor analysis assumes total variance can be partitioned into common and unique variance, whereas principal components assumes common variance takes up all of the total variance (i.e., there is no unique variance). For both methods, when you assume the total variance of each item is 1, the common variance becomes the communality. The communality is unique to each item, so if you have 8 items, you will obtain 8 communalities; each one represents the common variance explained by the factors or components. However, in the case of principal components, the communality is the total variance of each item, and summing all 8 communalities gives you the total variance across all items. In contrast, common factor analysis assumes that the communality is only a portion of the total variance, so that summing up the communalities represents the total common variance and not the total variance. In summary, for PCA, total common variance is equal to total variance explained, which in turn is equal to the total variance; in common factor analysis, total common variance is equal to total variance explained, but does not equal total variance.


The following applies to the SAQ-8 when theoretically extracting 8 components or factors for 8 items:

  • For each item, when the total variance is 1, the common variance becomes the communality.
  • In principal components, each communality represents the total variance across all 8 items.
  • In common factor analysis, the communality represents the common variance for each item.
  • The communality is unique to each factor or component.
  • For both PCA and common factor analysis, the sum of the communalities represent the total variance explained.
  • For PCA, the total variance explained equals the total variance, but for common factor analysis it does not.

Answers: 1. T, 2. F, the total variance for each item, 3. T, 4. F, communality is unique to each item (shared across components or factors), 5. T, 6. T.

Rotation Methods

After deciding on the number of factors to extract and which analysis model to use, the next step is to interpret the factor loadings. Factor rotations help us interpret factor loadings. There are two general types of rotations, orthogonal and oblique.

  • orthogonal rotation assumes the factors are independent or uncorrelated with each other
  • oblique rotation assumes the factors are correlated (not independent)

The goal of factor rotation is to improve the interpretability of the factor solution by reaching simple structure. 

Simple structure

Without rotation, the first factor is the most general factor onto which most items load and explains the largest amount of variance. This may not be desired in all cases. Suppose you wanted to know how well a set of items load on each  factor; simple structure helps us to achieve this.

The definition of simple structure is that in a factor loading matrix:

  • Each row should contain at least one zero.
  • For m factors, each column should have at least m zeroes (e.g., three factors, at least 3 zeroes per factor).

For every pair of factors (columns),

  • there should be several items whose entries approach zero in one column but have large loadings in the other.
  • a large proportion of items should have entries approaching zero.
  • only a small number of items should have two non-zero entries.

The following table is an example of simple structure with three factors:

Item Factor 1 Factor 2 Factor 3
1 0.8 0 0
2 0.8 0 0
3 0.8 0 0
4 0 0.8 0
5 0 0.8 0
6 0 0.8 0
7 0 0 0.8
8 0 0 0.8

Let's go down the checklist of criteria to see why it satisfies simple structure:

  • each row contains at least one zero (exactly two in each row)
  • each column contains at least three zeros (since there are three factors)
  • for every pair of factors, most items have zero on one factor and non-zeros on the other factor (e.g., looking at Factors 1 and 2, Items 1 through 6 satisfy this requirement)
  • for every pair of factors, all items have zero entries
  • for every pair of factors, none of the items have two non-zero entries

An easier set of criteria from Pedhazur and Schmelkin (1991) states that

  • each item has high loadings on one factor only
  • each factor has high loadings for only some of the items.

For the following factor matrix, explain why it does not conform to simple structure using both the conventional and Pedhazur test.

Item Factor 1 Factor 2 Factor 3
1 0.8 0 0.8
2 0.8 0 0.8
3 0.8 0 0
4 0.8 0 0
5 0 0.8 0.8
6 0 0.8 0.8
7 0 0.8 0.8
8 0 0.8 0

Solution: Using the conventional test, although Criteria 1 and 2 are satisfied (each row has at least one zero, each column has at least three zeroes), Criterion 3 fails because for Factors 2 and 3, only 3/8 rows have a zero on one factor and a non-zero loading on the other. Additionally, for Factors 2 and 3, Items 5 through 7 have non-zero loadings on both factors, so 3/8 rows have two non-zero coefficients (failing Criteria 4 and 5 simultaneously). Using the Pedhazur criteria, Items 1, 2, 5, 6, and 7 have high loadings on two factors (failing the first criterion) and Factor 3 has high loadings on a majority, 5/8, of the items (failing the second criterion).
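
The Pedhazur-style check is simple enough to automate. A rough sketch, treating the 0.8 loadings in these toy matrices as "high":

```python
import numpy as np

def pedhazur_check(L, high=0.4):
    """Count high loadings per item (rows) and per factor (columns).

    Simple structure by the Pedhazur criteria: every item should have exactly
    one high loading, and no factor should have high loadings on most items.
    """
    mask = np.abs(L) >= high
    return mask.sum(axis=1), mask.sum(axis=0)

good = np.array([[0.8, 0, 0], [0.8, 0, 0], [0.8, 0, 0],
                 [0, 0.8, 0], [0, 0.8, 0], [0, 0.8, 0],
                 [0, 0, 0.8], [0, 0, 0.8]])
bad  = np.array([[0.8, 0, 0.8], [0.8, 0, 0.8], [0.8, 0, 0], [0.8, 0, 0],
                 [0, 0.8, 0.8], [0, 0.8, 0.8], [0, 0.8, 0.8], [0, 0.8, 0]])

print(pedhazur_check(good))  # every item has exactly one high loading
print(pedhazur_check(bad))   # Items 1, 2, 5, 6, 7 have two high loadings; Factor 3 has five
```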

Orthogonal Rotation (2 factor PAF)

We know that the goal of factor rotation is to rotate the factor matrix so that it can approach simple structure in order to improve interpretability. Orthogonal rotation assumes that the factors are not correlated. The benefit of doing an orthogonal rotation is that loadings are simple correlations of items with factors, and standardized solutions can estimate unique contribution of each factor. The most common type of orthogonal rotation is Varimax rotation. We will walk through how to do this in SPSS.

Running a two-factor solution (PAF) with Varimax rotation in SPSS

The steps to running a two-factor Principal Axis Factoring are the same as before (Analyze – Dimension Reduction – Factor – Extraction), except that under Rotation – Method we check Varimax. Make sure under Display to check Rotated Solution and Loading plot(s), and under Maximum Iterations for Convergence enter 100.



Let’s first talk about what tables are the same or different from running a PAF with no rotation. First, we know that the unrotated factor matrix (Factor Matrix table) should be the same. Additionally, since the  common variance explained by both factors should be the same, the Communalities table should be the same. The main difference is that we ran a rotation, so we should get the rotated solution (Rotated Factor Matrix) as well as the transformation used to obtain the rotation (Factor Transformation Matrix). Finally, although the total variance explained by all factors stays the same, the total variance explained by  each  factor will be different.

Rotated Factor Matrix (2-factor PAF Varimax)

Rotated Factor Matrix
Factor
1 2
1 0.646 0.139
2 -0.188 -0.129
3 -0.490 -0.281
4 0.624 0.268
5 0.544 0.221
6 0.229 0.507
7 0.275 0.881
8 0.442 0.202
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization.
a. Rotation converged in 3 iterations.

The Rotated Factor Matrix table tells us what the factor loadings look like after rotation (in this case Varimax). Kaiser normalization is a method to obtain stability of solutions across samples: before rotation, each row of the loading matrix is rescaled (divided by the square root of its communality) so that every item is given equal weight during the rotation, and after rotation the loadings are rescaled back to their original size. The only drawback is that if the communality is low for a particular item, Kaiser normalization will weight that item equally with high-communality items. As such, Kaiser normalization is preferred when communalities are high across all items. You can turn off Kaiser normalization by editing the pasted FACTOR syntax.

Here is what the Varimax rotated loadings look like without Kaiser normalization. Compared to the rotated factor matrix with Kaiser normalization, the patterns look similar if you flip Factors 1 and 2; this may be an artifact of the rescaling. Another possible reason for the differences is the low communalities for Item 2 (0.052) and Item 8 (0.236): Kaiser normalization weights these items equally with the other, higher-communality items.

Rotated Factor Matrix
Factor
1 2
1 0.207 0.628
2 -0.148 -0.173
3 -0.331 -0.458
4 0.332 0.592
5 0.277 0.517
6 0.528 0.174
7 0.905 0.180
8 0.248 0.418
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax without Kaiser Normalization.
a. Rotation converged in 3 iterations.

Interpreting the factor loadings (2-factor PAF Varimax)

In the table above, the absolute loadings that are higher than 0.4 are highlighted in blue for Factor 1 and in red for Factor 2. We can see that Items 6 and 7 load highly onto Factor 1 and Items 1, 3, 4, 5, and 8 load highly onto Factor 2. Item 2 does not seem to load highly on any factor. Looking more closely at Item 6 “My friends are better at statistics than me” and Item 7 “Computers are useful only for playing games”, we don’t see a clear construct that defines the two. Item 2, “I don’t understand statistics” may be too general an item and isn’t captured by SPSS Anxiety. It’s debatable at this point whether to retain a two-factor or one-factor solution, at the very minimum we should see if Item 2 is a candidate for deletion.

Factor Transformation Matrix and Factor Loading Plot (2-factor PAF Varimax)

The Factor Transformation Matrix tells us how the Factor Matrix was rotated. In SPSS, you will see a matrix with two rows and two columns because we have two factors.

Factor Transformation Matrix
Factor 1 2
1 0.773 0.635
2 -0.635 0.773
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization.

How do we interpret this matrix? Well, we can see it as the way to move from the Factor Matrix to the Rotated Factor Matrix. From the Factor Matrix we know that the loading of Item 1 on Factor 1 is \(0.588\) and the loading of Item 1 on Factor 2 is \(-0.303\), which gives us the pair \((0.588,-0.303)\); but in the Rotated Factor Matrix the new pair is \((0.646,0.139)\). How do we obtain this new transformed pair of values? We can do what’s called matrix multiplication. The steps are essentially to start with one column of the Factor Transformation matrix, view it as another ordered pair and multiply matching ordered pairs. To get the first element, we can multiply the ordered pair in the Factor Matrix \((0.588,-0.303)\) with the matching ordered pair \((0.773,-0.635)\) in the first column of the Factor Transformation Matrix.

$$(0.588)(0.773)+(-0.303)(-0.635)=0.455+0.192=0.647.$$

To get the second element, we can multiply the ordered pair in the Factor Matrix \((0.588,-0.303)\) with the matching ordered pair \((0.635,0.773)\) from the second column of the Factor Transformation Matrix:

$$(0.588)(0.635)+(-0.303)(0.773)=0.373-0.234=0.139.$$

Voila! We have obtained the new transformed pair with some rounding error. The figure below summarizes the steps we used to perform the transformation

[Figure: the matrix multiplication used to transform the Item 1 loadings from the Factor Matrix to the Rotated Factor Matrix]

The Factor Transformation Matrix can also tell us the angle of rotation if we take the inverse cosine of the diagonal element. In this case, the angle of rotation is \(cos^{-1}(0.773) = 39.4^{\circ}\). In the factor loading plot, you can see what that angle of rotation looks like, starting from \(0^{\circ}\) and rotating counterclockwise by \(39.4^{\circ}\). Notice here that the newly rotated x- and y-axes are still at \(90^{\circ}\) from one another, hence the name orthogonal (a non-orthogonal or oblique rotation means that the new axes are no longer \(90^{\circ}\) apart). The item points themselves do not move; only the axes rotate.

[Figure: factor loading plot with the axes rotated counterclockwise by 39.4 degrees]
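
Doing the multiplication for all eight items at once is just the matrix product of the unrotated loadings with the transformation matrix, and the angle of rotation falls out of the same matrix. A small NumPy sketch using the tables above:

```python
import numpy as np

# Unrotated two-factor PAF loadings (Factor Matrix)
F = np.array([[ 0.588, -0.303],
              [-0.227,  0.020],
              [-0.557,  0.094],
              [ 0.652, -0.189],
              [ 0.560, -0.174],
              [ 0.498,  0.247],
              [ 0.771,  0.506],
              [ 0.470, -0.124]])

# Factor Transformation Matrix from the Varimax output
T = np.array([[ 0.773, 0.635],
              [-0.635, 0.773]])

rotated = F @ T
print(rotated.round(3))                           # approximately the Rotated Factor Matrix above
print(round(np.degrees(np.arccos(T[0, 0])), 1))   # ~39.4 degrees of rotation
```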

Total Variance Explained (2-factor PAF Varimax)

The Total Variance Explained table contains the same columns as the PAF solution with no rotation, but adds another set of columns called “Rotation Sums of Squared Loadings”. This makes sense because if our rotated Factor Matrix is different, the square of the loadings should be different, and hence the Sum of Squared loadings will be different for each factor. However, if you sum the Sums of Squared Loadings across all factors for the Rotation solution,

$$ 1.701 + 1.309 = 3.01$$

and for the unrotated solution,

$$ 2.511 + 0.499 = 3.01,$$

you will see that the two sums are the same. This is because rotation does not change the total common variance. Looking at the Rotation Sums of Squared Loadings for Factor 1, it still has the largest total variance, but now that shared variance is split more evenly.

Total Variance Explained
Factor Rotation Sums of Squared Loadings
Total % of Variance Cumulative %
1 1.701 21.258 21.258
2 1.309 16.363 37.621
Extraction Method: Principal Axis Factoring.

Other Orthogonal Rotations

Varimax rotation is the most popular, but it is only one of several orthogonal rotations. The benefit of Varimax rotation is that it maximizes the variance of the loadings within each factor while maximizing the differences between high and low loadings on a particular factor: higher loadings are made higher and lower loadings are made lower. This makes Varimax rotation good for achieving simple structure but not as good for detecting an overall factor, because it splits up the variance of major factors among lesser ones. Quartimax may be a better choice for detecting an overall factor. It maximizes the squared loadings so that each item loads most strongly onto a single factor.

Here is the output of the Total Variance Explained table juxtaposed side-by-side for Varimax versus Quartimax rotation.

Total Variance Explained
Factor Quartimax Varimax
Total Total
1 2.381 1.701
2 0.629 1.309
Extraction Method: Principal Axis Factoring.

You will see that whereas Varimax distributes the variances evenly across both factors, Quartimax tries to consolidate more variance into the first factor.

Equamax is a hybrid of Varimax and Quartimax, but because of this may behave erratically and according to Pett et al. (2003), is not generally recommended.

Oblique Rotation

In oblique rotation, the factors are no longer orthogonal to each other (x and y axes are not \(90^{\circ}\) angles to each other). Like orthogonal rotation, the goal is rotation of the reference axes about the origin to achieve a simpler and more meaningful factor solution compared to the unrotated solution. In oblique rotation, you will see three unique tables in the SPSS output:

  • factor pattern matrix contains partial standardized regression coefficients of each item with a particular factor
  • factor structure matrix contains simple zero order correlations of each item with a particular factor
  • factor correlation matrix is a matrix of intercorrelations among factors

Suppose the Principal Investigator hypothesizes that the two factors are correlated, and wishes to test this assumption. Let’s proceed with one of the most common types of oblique rotations in SPSS, Direct Oblimin.

Running a two-factor solution (PAF) with Direct Quartimin rotation in SPSS

The steps to running a Direct Oblimin rotation are the same as before (Analyze – Dimension Reduction – Factor – Extraction), except that under Rotation – Method we check Direct Oblimin. The other parameter we have to put in is delta, which defaults to zero. Technically, when delta = 0 this is known as Direct Quartimin. Larger positive values for delta increase the correlation among factors. However, in general you don't want the correlations to be too high, or else there is no reason to split your factors up. In fact, SPSS caps the delta value at 0.8 (the cap for negative values is -9999). Negative delta values push the solution toward orthogonal factors. For the purposes of this analysis, we will leave delta = 0 and do a Direct Quartimin analysis.


All the questions below pertain to Direct Oblimin in SPSS.

  • When selecting Direct Oblimin, delta = 0 is actually Direct Quartimin.
  • Smaller delta values will increase the correlations among factors.
  • You typically want your delta values to be as high as possible.

Answers: 1. T, 2. F, larger delta values, 3. F, larger delta leads to higher factor correlations; in general you don't want factors to be too highly correlated

Factor Pattern Matrix (2-factor PAF Direct Quartimin)

The factor pattern matrix represents partial standardized regression coefficients of each item with a particular factor. For example, \(0.740\) is the effect of Factor 1 on Item 1 controlling for Factor 2, and \(-0.137\) is the effect of Factor 2 on Item 1 controlling for Factor 1. Just as in orthogonal rotation, the square of the loading represents the contribution of the factor to the variance of the item, but excluding the overlap between correlated factors. Factor 1 uniquely contributes \((0.740)^2=0.548=54.8\%\) of the variance in Item 1 (controlling for Factor 2), and Factor 2 uniquely contributes \((-0.137)^2=0.019=1.9\%\) of the variance in Item 1 (controlling for Factor 1).

Pattern Matrix
Factor
1 2
1 0.740 -0.137
2 -0.180 -0.067
3 -0.490 -0.108
4 0.660 0.029
5 0.580 0.011
6 0.077 0.504
7 -0.017 0.933
8 0.462 0.036
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization.
a. Rotation converged in 5 iterations.

Factor Structure Matrix (2-factor PAF Direct Quartimin)

The factor structure matrix represents the simple zero-order correlations of the items with each factor (it's as if you ran a simple regression of the item on a single factor). For example, \(0.653\) is the simple correlation of Factor 1 with Item 1 and \(0.333\) is the simple correlation of Factor 2 with Item 1. The more correlated the factors, the bigger the difference between the pattern and structure matrices and the more difficult it is to interpret the factor loadings. From this we can see that Items 1, 3, 4, 5, and 8 load highly onto Factor 1 and Items 6 and 7 load highly onto Factor 2. Item 2 doesn't seem to load well on either factor.

Additionally, we can look at the variance explained by each factor not controlling for the other factor. For example, Factor 1 contributes \((0.653)^2=0.426=42.6\%\) of the variance in Item 1, and Factor 2 contributes \((0.333)^2=0.111=11.1\%\) of the variance in Item 1. Notice that the contribution of Factor 2 is higher (\(11.1\%\) vs. \(1.9\%\)) because in the Pattern Matrix we controlled for the effect of Factor 1, whereas in the Structure Matrix we did not.

Structure Matrix
Factor
1 2
1 0.653 0.333
2 -0.222 -0.181
3 -0.559 -0.420
4 0.678 0.449
5 0.587 0.380
6 0.398 0.553
7 0.577 0.923
8 0.485 0.330
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization.

Factor Correlation Matrix (2-factor PAF Direct Quartimin)

Recall that the more correlated the factors, the more difference between pattern and structure matrix and the more difficult to interpret the factor loadings. In our case, Factor 1 and Factor 2 are pretty highly correlated, which is why there is such a big difference between the factor pattern and factor structure matrices.

Factor Correlation Matrix
Factor 1 2
1 1.000 0.636
2 0.636 1.000
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization.

Factor plot

The difference between an orthogonal versus oblique rotation is that the factors in an oblique rotation are correlated. This means not only must we account for the angle of axis rotation \(\theta\), we have to account for the angle of correlation \(\phi\). The angle of axis rotation is defined as the angle between the rotated and unrotated axes (blue and black axes). From the Factor Correlation Matrix, we know that the correlation is \(0.636\), so the angle of correlation is \(cos^{-1}(0.636) = 50.5^{\circ}\), which is the angle between the two rotated axes (blue x and blue y-axis). The sum of rotations \(\theta\) and \(\phi\) is the total angle rotation. We are not given the angle of axis rotation, so we only know that the total angle rotation is \(\theta + \phi = \theta + 50.5^{\circ}\).

[Figure: factor plot for the oblique solution, showing the angle of axis rotation and the angle of correlation between the rotated axes]

Relationship between the Pattern and Structure Matrix

The structure matrix is in fact a derivative of the pattern matrix: if you multiply the pattern matrix by the factor correlation matrix, you will get back the factor structure matrix. Let's take the example of the ordered pair \((0.740,-0.137)\) from the Pattern Matrix, which represents the partial standardized regression coefficients of Item 1 on Factors 1 and 2 respectively. Performing matrix multiplication with the first column of the Factor Correlation Matrix we get

$$ (0.740)(1) + (-0.137)(0.636) = 0.740 – 0.087 =0.652.$$

Similarly, we multiply the ordered pair with the second column of the Factor Correlation Matrix to get:

$$ (0.740)(0.636) + (-0.137)(1) = 0.471 -0.137 =0.333 $$

Looking at the first row of the Structure Matrix we get \((0.653,0.333)\) which matches our calculation! This neat fact can be depicted with the following figure:

[Figure: the Pattern Matrix multiplied by the Factor Correlation Matrix gives the Structure Matrix]

As a quick aside, suppose that the factors are orthogonal, which means that the factor correlation matrix has 1's on the diagonal and zeros on the off-diagonal. A quick calculation with the ordered pair \((0.740,-0.137)\) gives

$$ (0.740)(1) + (-0.137)(0) = 0.740$$

and similarly,

$$ (0.740)(0) + (-0.137)(1) = -0.137$$

and you get back the same ordered pair. This is called multiplying by the identity matrix (think of it as multiplying \(2*1 = 2\)).
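
In matrix form this is simply Structure = Pattern × Factor Correlation Matrix, which a few lines of NumPy can confirm for all eight items at once:

```python
import numpy as np

# Pattern Matrix (partial standardized regression coefficients)
P = np.array([[ 0.740, -0.137],
              [-0.180, -0.067],
              [-0.490, -0.108],
              [ 0.660,  0.029],
              [ 0.580,  0.011],
              [ 0.077,  0.504],
              [-0.017,  0.933],
              [ 0.462,  0.036]])

# Factor Correlation Matrix
Phi = np.array([[1.000, 0.636],
                [0.636, 1.000]])

S = P @ Phi
print(S.round(3))  # approximately the Structure Matrix (first row ~[0.653, 0.333])
```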

  • Without changing your data or model, how would you make the factor pattern matrices and factor structure matrices more aligned with each other?
  • True or False, When you decrease delta, the pattern and structure matrix will become closer to each other.

Answers: 1. Decrease the delta values so that the correlation between factors approaches zero. 2. T, the correlations will become more orthogonal and hence the pattern and structure matrix will be closer.

Total Variance Explained (2-factor PAF Direct Quartimin)

The column Extraction Sums of Squared Loadings is the same as the unrotated solution, but we have an additional column known as Rotation Sums of Squared Loadings. SPSS says itself that “when factors are correlated, sums of squared loadings cannot be added to obtain total variance”. You will note that compared to the Extraction Sums of Squared Loadings, the Rotation Sums of Squared Loadings is only slightly lower for Factor 1 but much higher for Factor 2. This is because unlike orthogonal rotation, this is no longer the unique contribution of Factor 1 and Factor 2. How do we obtain the Rotation Sums of Squared Loadings? SPSS squares the Structure Matrix and sums down the items.

Total Variance Explained
Factor Extraction Sums of Squared Loadings Rotation Sums of Squared Loadings
Total % of Variance Cumulative % Total
1 2.511 31.382 31.382 2.318
2 0.499 6.238 37.621 1.931
Extraction Method: Principal Axis Factoring.
a. When factors are correlated, sums of squared loadings cannot be added to obtain a total variance.

As a demonstration, let’s obtain the loadings from the Structure Matrix for Factor 1

$$ (0.653)^2 + (-0.222)^2 + (-0.559)^2 + (0.678)^2 + (0.587)^2 + (0.398)^2 + (0.577)^2 + (0.485)^2 = 2.318.$$

Note that \(2.318\) matches the Rotation Sums of Squared Loadings for the first factor. This means that the Rotation Sums of Squared Loadings represent the non-unique contribution of each factor to total common variance, and summing these sums of squared loadings across all factors can lead to estimates that are greater than the total common variance.
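
The same calculation for both factors, using the Structure Matrix reported above:

```python
import numpy as np

# Structure Matrix loadings (8 items by 2 factors)
S = np.array([[ 0.653,  0.333],
              [-0.222, -0.181],
              [-0.559, -0.420],
              [ 0.678,  0.449],
              [ 0.587,  0.380],
              [ 0.398,  0.553],
              [ 0.577,  0.923],
              [ 0.485,  0.330]])

rotation_ssl = (S ** 2).sum(axis=0)  # square and sum down the items
print(rotation_ssl.round(2))         # ~[2.32, 1.93]
print(round(rotation_ssl.sum(), 2))  # ~4.25, larger than the 3.01 total common variance
```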

Interpreting the factor loadings (2-factor PAF Direct Quartimin)

Finally, let's conclude by interpreting the factor loadings more carefully. Let's compare the Pattern Matrix and Structure Matrix tables side-by-side. First we highlight absolute loadings that are higher than 0.4 in blue for Factor 1 and in red for Factor 2. We see that the absolute loadings in the Pattern Matrix are in general higher for Factor 1 compared to the Structure Matrix and lower for Factor 2. This makes sense because the Pattern Matrix partials out the effect of the other factor. Looking at the Pattern Matrix, Items 1, 3, 4, 5, and 8 load highly on Factor 1, and Items 6 and 7 load highly on Factor 2. Looking at the Structure Matrix, Items 1, 3, 4, 5, 7, and 8 load highly onto Factor 1 and Items 3, 4, 6, and 7 load highly onto Factor 2. Item 2 doesn't seem to load on either factor. The results of the two matrices are somewhat inconsistent, but this can be explained by the fact that in the Structure Matrix Items 3, 4, and 7 seem to load onto both factors fairly evenly, but not in the Pattern Matrix. For this particular analysis, it seems to make more sense to interpret the Pattern Matrix because it's clear that Factor 1 contributes uniquely to most items in the SAQ-8 and Factor 2 contributes common variance only to two items (Items 6 and 7). There is an argument here that perhaps Item 2 can be eliminated from our survey and the factors consolidated into one SPSS Anxiety factor. We talk to the Principal Investigator and we think it's feasible to accept SPSS Anxiety as the single factor explaining the common variance in all the items, but we choose to remove Item 2, so that the SAQ-8 is now the SAQ-7.

Pattern Matrix Structure Matrix
Factor Factor
1 2 1 2
1 0.740 -0.137 0.653 0.333
2 -0.180 -0.067 -0.222 -0.181
3 -0.490 -0.108 -0.559 -0.420
4 0.660 0.029 0.678 0.449
5 0.580 0.011 0.587 0.380
6 0.077 0.504 0.398 0.553
7 -0.017 0.933 0.577 0.923
8 0.462 0.036 0.485 0.330
  • In oblique rotation, an element of the factor pattern matrix is the unique contribution of the factor to the item, whereas an element of the factor structure matrix is the non-unique contribution of the factor to an item.
  • In the Total Variance Explained table, the Rotation Sum of Squared Loadings represent the unique contribution of each factor to total common variance.
  • The Pattern Matrix can be obtained by multiplying the Structure Matrix with the Factor Correlation Matrix
  • If the factors are orthogonal, then the Pattern Matrix equals the Structure Matrix
  • In oblique rotations, the sum of squared loadings for each item across all factors is equal to the communality (in the SPSS Communalities table) for that item.

Answers: 1. T, 2. F, they represent the non-unique contribution (which means the total sum of squares can be greater than the total communality), 3. F, the Structure Matrix is obtained by multiplying the Pattern Matrix with the Factor Correlation Matrix, 4. T, it's like multiplying a number by 1, you get the same number back, 5. F, this is true only for orthogonal rotations; the SPSS Communalities table in rotated factor solutions is based on the unrotated solution, not the rotated solution.

As a special note, did we really achieve simple structure? Although rotation helps us achieve simple structure, if the interrelationships among the items do not lend themselves to simple structure, we can only modify our model. In this case we chose to remove Item 2 from our model.

Promax Rotation

Promax rotation begins with a Varimax (orthogonal) rotation and then raises the loadings to a power (Kappa), which shrinks the small loadings toward zero. Promax also runs faster than Direct Oblimin; in our example Promax took 3 iterations while Direct Quartimin (Direct Oblimin with delta = 0) took 5 iterations.

  • Varimax, Quartimax and Equamax are three types of orthogonal rotation and Direct Oblimin, Direct Quartimin and Promax are three types of oblique rotations.

Answers: 1. T.

Generating Factor Scores

Suppose the Principal Investigator is happy with the final factor analysis, which was the two-factor Direct Quartimin solution. She has a hypothesis that SPSS Anxiety and Attribution Bias predict student scores in an introductory statistics course, so she would like to use the factor scores as predictors in a new regression analysis. Since a factor is by nature unobserved, we need to first predict or generate plausible factor scores. In SPSS, there are three methods of factor score generation: Regression, Bartlett, and Anderson-Rubin.

Generating factor scores using the Regression Method in SPSS

In order to generate factor scores, run the same factor analysis model but click on Factor Scores (Analyze – Dimension Reduction – Factor – Factor Scores). Then check Save as variables, pick the Method and optionally check Display factor score coefficient matrix.


Here we picked the Regression approach after fitting our two-factor Direct Quartimin solution. After generating the factor scores, SPSS will add two extra variables to the end of your variable list, which you can view via Data View. The figure below shows what this looks like for the first 5 participants; SPSS calls the new variables FAC1_1 and FAC2_1 for the first and second factors. These are now ready to be entered into another analysis as predictors.

[Figure: Data View showing the saved factor score variables FAC1_1 and FAC2_1 for the first five participants]

For those who want to understand how the scores are generated, we can refer to the Factor Score Coefficient Matrix. These are essentially the regression weights that SPSS uses to generate the scores. We know that the ordered pair of scores for the first participant is \(-0.880, -0.113\). We also know that the 8 raw scores for the first participant are \(2, 1, 4, 2, 2, 2, 3, 1\). However, what SPSS uses are actually the standardized scores, which can be obtained in SPSS via Analyze – Descriptive Statistics – Descriptives – Save standardized values as variables. The standardized scores obtained are \(-0.452, -0.733, 1.32, -0.829, -0.749, -0.2025, 0.069, -1.42\). Using the Factor Score Coefficient Matrix, we multiply the participant's standardized scores by the coefficients in each column and sum. For the first factor:

$$ \begin{eqnarray} &(0.284)(-0.452) + (-0.048)(-0.733) + (-0.171)(1.32) + (0.274)(-0.829) \\ &+ (0.197)(-0.749) + (0.048)(-0.2025) + (0.174)(0.069) + (0.133)(-1.42) \\ &= -0.880, \end{eqnarray} $$

which matches FAC1_1  for the first participant. You can continue this same procedure for the second factor to obtain FAC2_1.

Factor Score Coefficient Matrix
Item Factor
1 2
1 0.284 0.005
2 -0.048 -0.019
3 -0.171 -0.045
4 0.274 0.045
5 0.197 0.036
6 0.048 0.095
7 0.174 0.814
8 0.133 0.028
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. Factor Scores Method: Regression.
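
The calculation above is just a dot product of the participant's standardized item scores with each column of the coefficient matrix. A small sketch using the numbers reported above:

```python
import numpy as np

# Factor Score Coefficient Matrix (8 items by 2 factors)
W = np.array([[ 0.284,  0.005],
              [-0.048, -0.019],
              [-0.171, -0.045],
              [ 0.274,  0.045],
              [ 0.197,  0.036],
              [ 0.048,  0.095],
              [ 0.174,  0.814],
              [ 0.133,  0.028]])

# Standardized item scores for the first participant
z = np.array([-0.452, -0.733, 1.32, -0.829, -0.749, -0.2025, 0.069, -1.42])

scores = z @ W
print(scores.round(3))  # first element ~-0.880 (FAC1_1); second ~-0.11 (FAC2_1, up to rounding)
```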

The second table is the Factor Score Covariance Matrix,

Factor Score Covariance Matrix
Factor 1 2
1 1.897 1.895
2 1.895 1.990
Extraction Method: Principal Axis Factoring. Rotation Method: Oblimin with Kaiser Normalization. Factor Scores Method: Regression.

This table can be interpreted as the covariance matrix of the factor scores; however, it would only be equal to the raw covariance of the saved factor scores if the factors were orthogonal. For example, if we obtained the raw covariance matrix of the factor scores we would get

Correlations
FAC1_1 FAC1_2
FAC1_1 Covariance 0.777 0.604
FAC1_2 Covariance 0.604 0.870

You will notice that these values are much lower. Let’s compare the same two tables but for Varimax rotation:

Factor Score Covariance Matrix
Factor 1 2
1 0.670 0.131
2 0.131 0.805
Extraction Method: Principal Axis Factoring. Rotation Method: Varimax with Kaiser Normalization. Factor Scores Method: Regression.

If you compare these elements to the Covariance table below, you will notice they are the same.

Correlations
FAC1_1 FAC1_2
FAC1_1 Covariance 0.670 0.131
FAC1_2 Covariance 0.131 0.805

Note with the Bartlett and Anderson-Rubin methods you will not obtain the Factor Score Covariance matrix.

Regression, Bartlett and Anderson-Rubin compared

Among the three methods, each has its pluses and minuses. The regression method maximizes the correlation (and hence validity) between the factor scores and the underlying factor, but the scores can be somewhat biased. This means even if you have an orthogonal solution, you can still have correlated factor scores. For Bartlett's method, the factor scores highly correlate with their own factor and not with others, and they are an unbiased estimate of the true factor score. Unbiased scores means that with repeated sampling of the factor scores, the average of the scores is equal to the average of the true factor score. The Anderson-Rubin method scales the factor scores so that they are uncorrelated with other factors and uncorrelated with the other factor scores. Since Anderson-Rubin scores impose a correlation of zero between factor scores, it is not the best option to choose for oblique rotations. Additionally, Anderson-Rubin scores are biased.

In summary, if you do an orthogonal rotation, you can pick any of the three methods. For orthogonal rotations, use Bartlett if you want unbiased scores, use the Regression method if you want to maximize validity, and use Anderson-Rubin if you want the factor scores themselves to be uncorrelated with other factor scores. If you do oblique rotations, it's preferable to stick with the Regression method. Do not use Anderson-Rubin for oblique rotations.

  • If you want the highest correlation of the factor score with the corresponding factor (i.e., highest validity), choose the regression method.
  • Bartlett scores are unbiased whereas Regression and Anderson-Rubin scores are biased.
  • Anderson-Rubin is appropriate for orthogonal but not for oblique rotation because factor scores will be uncorrelated with other factor scores.

Answers: 1. T, 2. T, 3. T


Factor Analysis – Steps, Methods and Examples


Definition:

Factor analysis is a statistical technique that is used to identify the underlying structure of a relatively large set of variables and to explain these variables in terms of a smaller number of common underlying factors. It helps to investigate the latent relationships between observed variables.

Factor Analysis Steps

Here are the general steps involved in conducting a factor analysis:

1. Define the Research Objective:

Clearly specify the purpose of the factor analysis. Determine what you aim to achieve or understand through the analysis.

2. Data Collection:

Gather the data on the variables of interest. These variables should be measurable and related to the research objective. Ensure that you have a sufficient sample size for reliable results.

3. Assess Data Suitability:

Examine the suitability of the data for factor analysis. Check for the following aspects:

  • Sample size: Ensure that you have an adequate sample size to perform factor analysis reliably.
  • Missing values: Handle missing data appropriately, either by imputation or exclusion.
  • Variable characteristics: Verify that the variables are continuous or at least ordinal in nature. Categorical variables may require different analysis techniques.
  • Linearity: Assess whether the relationships among variables are linear.

4. Determine the Factor Analysis Technique:

There are different types of factor analysis techniques available, such as exploratory factor analysis (EFA) and confirmatory factor analysis (CFA). Choose the appropriate technique based on your research objective and the nature of the data.

5. Perform Factor Analysis:

   a. Exploratory Factor Analysis (EFA):

  • Extract factors: Use factor extraction methods (e.g., principal component analysis or common factor analysis) to identify the initial set of factors.
  • Determine the number of factors: Decide on the number of factors to retain based on statistical criteria (e.g., eigenvalues, scree plot) and theoretical considerations.
  • Rotate factors: Apply factor rotation techniques (e.g., varimax, oblique) to simplify the factor structure and make it more interpretable.
  • Interpret factors: Analyze the factor loadings (correlations between variables and factors) to interpret the meaning of each factor.
  • Determine factor reliability: Assess the internal consistency or reliability of the factors using measures like Cronbach’s alpha.
  • Report results: Document the factor loadings, rotated component matrix, communalities, and any other relevant information (a minimal code sketch of this EFA workflow follows this list).
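To make the EFA steps above concrete, here is a minimal Python sketch using the factor_analyzer package. The file name survey.csv, the choice of five factors, and the small Cronbach’s alpha helper are illustrative assumptions rather than part of the original text; in practice, choose the number of factors from your own eigenvalues, scree plot, and theory.

```python
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

df = pd.read_csv("survey.csv")  # hypothetical item-level survey data (rows = respondents)

# Suitability checks: Bartlett's test of sphericity and the KMO measure of sampling adequacy
chi_square, p_value = calculate_bartlett_sphericity(df)
kmo_per_item, kmo_overall = calculate_kmo(df)

# Extract and rotate factors (minimum-residual extraction with varimax rotation here)
fa = FactorAnalyzer(n_factors=5, method="minres", rotation="varimax")
fa.fit(df)

eigenvalues, _ = fa.get_eigenvalues()    # examine for the Kaiser criterion / scree plot
loadings = fa.loadings_                  # item-by-factor loading matrix to interpret
communalities = fa.get_communalities()   # variance in each item explained by the factors

def cronbach_alpha(items: pd.DataFrame) -> float:
    """Internal consistency of the items assigned to one factor."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)
```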

   b. Confirmatory Factor Analysis (CFA):

  • Formulate a theoretical model: Specify the hypothesized relationships among variables and factors based on prior knowledge or theoretical considerations.
  • Define measurement model: Establish how each variable is related to the underlying factors by assigning factor loadings in the model.
  • Test the model: Use statistical techniques like maximum likelihood estimation or structural equation modeling to assess the goodness-of-fit between the observed data and the hypothesized model.
  • Modify the model: If the initial model does not fit the data adequately, revise the model by adding or removing paths, allowing for correlated errors, or other modifications to improve model fit.
  • Report results: Present the final measurement model, parameter estimates, fit indices (e.g., chi-square, RMSEA, CFI), and any modifications made (a hedged code sketch follows this list).
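For the CFA steps, a hedged sketch using the semopy package (one of several structural equation modeling tools in Python) is shown below. The lavaan-style model syntax, the indicator names q1–q6, the latent factor names, and the file name are hypothetical and would come from your own theory and data.

```python
import pandas as pd
from semopy import Model, calc_stats

df = pd.read_csv("survey.csv")  # hypothetical item-level dataset

# Hypothesized measurement model: each latent factor is measured by the
# indicators listed after '=~'
model_desc = """
PhysicalWellbeing =~ q1 + q2 + q3
MentalWellbeing   =~ q4 + q5 + q6
"""

model = Model(model_desc)
model.fit(df)               # estimates the model (maximum likelihood by default)
print(model.inspect())      # parameter estimates, including factor loadings
print(calc_stats(model))    # fit indices such as chi-square, RMSEA, and CFI
```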

6. Interpret and Validate the Factors:

Once you have identified the factors, interpret them based on the factor loadings, theoretical understanding, and research objectives. Validate the factors by examining their relationships with external criteria or by conducting further analyses if necessary.

Types of Factor Analysis

Types of Factor Analysis are as follows:

Exploratory Factor Analysis (EFA)

EFA is used to explore the underlying structure of a set of observed variables without any preconceived assumptions about the number or nature of the factors. It aims to discover the number of factors and how the observed variables are related to those factors. EFA does not impose any restrictions on the factor structure and allows for cross-loadings of variables on multiple factors.

Confirmatory Factor Analysis (CFA)

CFA is used to test a pre-specified factor structure based on theoretical or conceptual assumptions. It aims to confirm whether the observed variables measure the latent factors as intended. CFA tests the fit of a hypothesized model and assesses how well the observed variables are associated with the expected factors. It is often used for validating measurement instruments or evaluating theoretical models.

Principal Component Analysis (PCA)

PCA is a dimensionality reduction technique that can be considered a form of factor analysis, although it has some differences. PCA aims to explain the maximum amount of variance in the observed variables using a smaller number of uncorrelated components. Unlike traditional factor analysis, PCA does not assume that the observed variables are caused by underlying factors but focuses solely on accounting for variance.

Common Factor Analysis

It assumes that the observed variables are influenced by common factors and unique factors (specific to each variable). It attempts to estimate the common factor structure by extracting the shared variance among the variables while also considering the unique variance of each variable.

Hierarchical Factor Analysis

Hierarchical factor analysis involves multiple levels of factors. It explores both higher-order and lower-order factors, aiming to capture the complex relationships among variables. Higher-order factors are based on the relationships among lower-order factors, which are in turn based on the relationships among observed variables.

Factor Analysis Formulas

Factor Analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors.

Here are some of the essential formulas and calculations used in factor analysis:

Correlation Matrix :

The first step in factor analysis is to create a correlation matrix, which calculates the correlation coefficients between pairs of variables.

Correlation coefficient (Pearson’s r) between variables X and Y is calculated as:

r(X,Y) = Σ[(xi – x̄)(yi – ȳ)] / [(n – 1) σx σy]

where: xi, yi are the data points, x̄, ȳ are the means of X and Y respectively, σx, σy are the standard deviations of X and Y respectively, n is the number of data points.

Extraction of Factors :

The extraction of factors from the correlation matrix is typically done by methods such as Principal Component Analysis (PCA) or other similar methods.

The formula used in PCA to calculate the principal components (factors) involves finding the eigenvalues and eigenvectors of the correlation matrix.

Let’s denote the correlation matrix as R. If λ is an eigenvalue of R, and v is the corresponding eigenvector, they satisfy the equation: Rv = λv

Factor Loadings :

Factor loadings are the correlations between the original variables and the factors. In the PCA-based approach, they can be calculated by scaling each retained eigenvector by the square root of its corresponding eigenvalue.

Communality and Specific Variance :

Communality of a variable is the proportion of variance in that variable explained by the factors. It can be calculated as the sum of squared factor loadings for that variable across all factors.

The specific variance of a variable is the proportion of variance in that variable not explained by the factors, and it’s calculated as 1 – Communality.
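A small NumPy sketch of the calculations described above (correlation matrix, eigendecomposition, loadings, communalities, and specific variances). The random placeholder data and the choice to retain two factors are assumptions made only for illustration.

```python
import numpy as np

X = np.random.default_rng(0).normal(size=(200, 6))   # placeholder data: 200 cases, 6 variables

R = np.corrcoef(X, rowvar=False)                      # correlation matrix of the variables
eigenvalues, eigenvectors = np.linalg.eigh(R)         # solves R v = lambda v

# Sort from largest to smallest eigenvalue and retain, say, the first two factors
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
k = 2
loadings = eigenvectors[:, :k] * np.sqrt(eigenvalues[:k])   # eigenvector scaled by sqrt(eigenvalue)

communalities = np.sum(loadings**2, axis=1)           # variance explained in each variable
specific_variances = 1 - communalities                # variance left unexplained
```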

Factor Rotation : Factor rotation, such as Varimax or Promax, is used to make the output more interpretable. It doesn’t change the underlying relationships but affects the loadings of the variables on the factors.

For example, in the varimax rotation, the objective is to maximize the variance of the squared loadings within each factor (column) across the variables (rows), which pushes each loading toward either a high value or near zero and makes the factors easier to interpret.
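For illustration, the sketch below is a standard iterative (SVD-based) implementation of the varimax criterion in NumPy. Most statistical packages expose rotation as an option, so treat this as a description of the computation rather than something you would normally write yourself.

```python
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a (p x k) loading matrix to (approximately) maximize the varimax criterion."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - (gamma / p) * rotated @ np.diag(np.sum(rotated ** 2, axis=0))
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):     # stop when the criterion no longer improves
            break
        criterion = new_criterion
    return loadings @ rotation                        # rotated loadings, same column space
```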

Examples of Factor Analysis

Here are some real-world examples of factor analysis:

  • Psychological Research: In a study examining personality traits, researchers may use factor analysis to identify the underlying dimensions of personality by analyzing responses to various questionnaires or surveys. Factors such as extroversion, neuroticism, and conscientiousness can be derived from the analysis.
  • Market Research: In marketing, factor analysis can be used to understand consumers’ preferences and behaviors. For instance, by analyzing survey data related to product features, pricing, and brand perception, researchers can identify factors such as price sensitivity, brand loyalty, and product quality that influence consumer decision-making.
  • Finance and Economics: Factor analysis is widely used in portfolio management and asset pricing models. By analyzing historical market data, factors such as market returns, interest rates, inflation rates, and other economic indicators can be identified. These factors help in understanding and predicting investment returns and risk.
  • Social Sciences: Factor analysis is employed in social sciences to explore underlying constructs in complex datasets. For example, in education research, factor analysis can be used to identify dimensions such as academic achievement, socio-economic status, and parental involvement that contribute to student success.
  • Health Sciences: In medical research, factor analysis can be utilized to identify underlying factors related to health conditions, symptom clusters, or treatment outcomes. For instance, in a study on mental health, factor analysis can be used to identify underlying factors contributing to depression, anxiety, and stress.
  • Customer Satisfaction Surveys: Factor analysis can help businesses understand the key drivers of customer satisfaction. By analyzing survey responses related to various aspects of product or service experience, factors such as product quality, customer service, and pricing can be identified, enabling businesses to focus on areas that impact customer satisfaction the most.

Factor analysis in Research Example

Here’s an example of how factor analysis might be used in research:

Let’s say a psychologist is interested in the factors that contribute to overall wellbeing. They conduct a survey with 1000 participants, asking them to respond to 50 different questions relating to various aspects of their lives, including social relationships, physical health, mental health, job satisfaction, financial security, personal growth, and leisure activities.

Given the broad scope of these questions, the psychologist decides to use factor analysis to identify underlying factors that could explain the correlations among responses.

After conducting the factor analysis, the psychologist finds that the responses can be grouped into five factors:

  • Physical Wellbeing : Includes variables related to physical health, exercise, and diet.
  • Mental Wellbeing : Includes variables related to mental health, stress levels, and emotional balance.
  • Social Wellbeing : Includes variables related to social relationships, community involvement, and support from friends and family.
  • Professional Wellbeing : Includes variables related to job satisfaction, work-life balance, and career development.
  • Financial Wellbeing : Includes variables related to financial security, savings, and income.

By reducing the 50 individual questions to five underlying factors, the psychologist can more effectively analyze the data and draw conclusions about the major aspects of life that contribute to overall wellbeing.

In this way, factor analysis helps researchers understand complex relationships among many variables by grouping them into a smaller number of factors, simplifying the data analysis process, and facilitating the identification of patterns or structures within the data.

When to Use Factor Analysis

Here are some circumstances in which you might want to use factor analysis:

  • Data Reduction : If you have a large set of variables, you can use factor analysis to reduce them to a smaller set of factors. This helps in simplifying the data and making it easier to analyze.
  • Identification of Underlying Structures : Factor analysis can be used to identify underlying structures in a dataset that are not immediately apparent. This can help you understand complex relationships between variables.
  • Validation of Constructs : Factor analysis can be used to confirm whether a scale or measure truly reflects the construct it’s meant to measure. If all the items in a scale load highly on a single factor, that supports the construct validity of the scale.
  • Generating Hypotheses : By revealing the underlying structure of your variables, factor analysis can help to generate hypotheses for future research.
  • Survey Analysis : If you have a survey with many questions, factor analysis can help determine if there are underlying factors that explain response patterns.

Applications of Factor Analysis

Factor Analysis has a wide range of applications across various fields. Here are some of them:

  • Psychology : It’s often used in psychology to identify the underlying factors that explain different patterns of correlations among mental abilities. For instance, factor analysis has been used to identify personality traits (like the Big Five personality traits), intelligence structures (like Spearman’s g), or to validate the constructs of different psychological tests.
  • Market Research : In this field, factor analysis is used to identify the factors that influence purchasing behavior. By understanding these factors, businesses can tailor their products and marketing strategies to meet the needs of different customer groups.
  • Healthcare : In healthcare, factor analysis is used in a similar way to psychology, identifying underlying factors that might influence health outcomes. For instance, it could be used to identify lifestyle or behavioral factors that influence the risk of developing certain diseases.
  • Sociology : Sociologists use factor analysis to understand the structure of attitudes, beliefs, and behaviors in populations. For example, factor analysis might be used to understand the factors that contribute to social inequality.
  • Finance and Economics : In finance, factor analysis is used to identify the factors that drive financial markets or economic behavior. For instance, factor analysis can help understand the factors that influence stock prices or economic growth.
  • Education : In education, factor analysis is used to identify the factors that influence academic performance or attitudes towards learning. This could help in developing more effective teaching strategies.
  • Survey Analysis : Factor analysis is often used in survey research to reduce the number of items or to identify the underlying structure of the data.
  • Environment : In environmental studies, factor analysis can be used to identify the major sources of environmental pollution by analyzing the data on pollutants.

Advantages of Factor Analysis

Advantages of Factor Analysis are as follows:

  • Data Reduction : Factor analysis can simplify a large dataset by reducing the number of variables. This helps make the data easier to manage and analyze.
  • Structure Identification : It can identify underlying structures or patterns in a dataset that are not immediately apparent. This can provide insights into complex relationships between variables.
  • Construct Validation : Factor analysis can be used to validate whether a scale or measure accurately reflects the construct it’s intended to measure. This is important for ensuring the reliability and validity of measurement tools.
  • Hypothesis Generation : By revealing the underlying structure of your variables, factor analysis can help generate hypotheses for future research.
  • Versatility : Factor analysis can be used in various fields, including psychology, market research, healthcare, sociology, finance, education, and environmental studies.

Disadvantages of Factor Analysis

Disadvantages of Factor Analysis are as follows:

  • Subjectivity : The interpretation of the factors can sometimes be subjective, depending on how the data is perceived. Different researchers might interpret the factors differently, which can lead to different conclusions.
  • Assumptions : Factor analysis assumes that there’s some underlying structure in the dataset and that all variables are related. If these assumptions do not hold, factor analysis might not be the best tool for your analysis.
  • Large Sample Size Required : Factor analysis generally requires a large sample size to produce reliable results. This can be a limitation in studies where data collection is challenging or expensive.
  • Correlation, not Causation : Factor analysis identifies correlational relationships, not causal ones. It cannot prove that changes in one variable cause changes in another.
  • Complexity : The statistical concepts behind factor analysis can be difficult to understand and require expertise to implement correctly. Misuse or misunderstanding of the method can lead to incorrect conclusions.


PM Source Apportionment Using the Factor Analysis–Multiple Regression (FA-MR) Model: A Case Study in Bangkok

The outline below summarizes a presentation that applies the FA-MR receptor model to apportion particulate matter (PM) sources in Bangkok, using trace elements such as antimony, manganese, chromium, and cadmium as source markers (as cited in Sharma, 1994).

  • Introduction to source impact assessment methods for air pollution
  • Fundamentals of the receptor model
  • The Factor Analysis–Multiple Regression (FA-MR) model
  • Application of FA-MR for PM source apportionment in the Bangkok urban area in 1996
  • Dispersion models can deal explicitly with emissions from single, identifiable sources within the same source category. However, they depend on several potentially uncertain inputs, such as emission data, meteorological data, and the transport-diffusion-transformation-deposition mechanisms.
  • Because receptor methods such as chemical mass balance (CMB) rely on the statistical properties of the chemical composition data, we cannot learn more than the statistical inference supports. If too few marker elements are analyzed, or if sampling and analytical errors are large, the estimated contribution of each source may be questionable. If two or more sources emit particulate matter with similar chemical compositions, their contributions cannot be separated.
  • The fundamental relation between the concentration at a receptor site and the source information can be expressed as Cik = Σj aij Sjk, where Cik is the concentration of element i in sample k, aij is the fractional abundance of element i in the j-th source profile, and Sjk is the mass concentration contributed by the j-th source to sample k.
  • The basic factor analysis model takes the form Zik = Σj lij Fjk + eik, where Zik is the standardized concentration of element i in sample k, lij is the loading of element i on factor j, Fjk is the factor score, and eik is the error term.
  • FA-MR workflow: calculate the correlation matrix; calculate the eigenvalues; calculate the factor loadings; calculate the absolute factor scores (AFS); run a regression that estimates the PM mass concentration using the AFS as predictor variables; and estimate the new factor loadings related to the source composition (a rough code sketch of this workflow follows).
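The following is a rough Python sketch of the FA-MR idea under simplifying assumptions: it uses ordinary factor scores from the factor_analyzer package rather than the absolute factor scores of the full FA-MR procedure, and the file name pm_elements.csv, the PM_mass column, and the choice of four factors are hypothetical.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from factor_analyzer import FactorAnalyzer

# Hypothetical dataset: rows = sampling days, columns = element concentrations plus total PM mass
data = pd.read_csv("pm_elements.csv")
elements = data.drop(columns=["PM_mass"])
pm_mass = data["PM_mass"]

# Correlation matrix, eigenvalues, and rotated loadings via factor extraction
fa = FactorAnalyzer(n_factors=4, rotation="varimax")    # number of factors chosen from the scree plot
fa.fit(elements)
scores = fa.transform(elements)                          # factor scores for each sample

# Regress PM mass on the factor scores; each coefficient scales a factor's score into mass,
# so coefficient x mean score approximates the average contribution of that source category
ols = sm.OLS(pm_mass, sm.add_constant(scores)).fit()
contributions = np.asarray(ols.params)[1:] * scores.mean(axis=0)
print(ols.summary())
print(contributions)
```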


Correlation between the incidence of inguinal hernia and risk factors after radical prostatic cancer surgery: a case control study

  • An-Ping Xiang 1 , 2 ,
  • Yue-Fan Shen 1 ,
  • Xu-Feng Shen 1 &
  • Si-Hai Shao 1  

BMC Urology volume  24 , Article number:  131 ( 2024 ) Cite this article

52 Accesses

Metrics details

The incidence of recurrent hernia after radical resection of prostate cancer is high, so this article discusses the incidence and risk factors of inguinal hernia after radical resection of prostate cancer.

This case control study was conducted in The First People’s Hospital of Huzhou clinical data of 251 cases underwent radical resection of prostate cancer in this hospital from March 2019 to May 2021 were retrospectively analyzed. According to the occurrence of inguinal hernia, the subjects were divided into study group and control group, and the clinical data of each group were statistically analyzed, Multivariate Logistic analysis was performed to find independent influencing factors for predicting the occurrence of inguinal hernia. The Kaplan-Meier survival curve was drawn according to the occurrence and time of inguinal hernia.

The overall incidence of inguinal hernia after prostate cancer surgery was 14.7% (37/251), and the mean time was 8.58 ± 4.12 months. The average time of inguinal hernia in patients who received lymph node dissection was 7.61 ± 4.05 (month), and that in patients who did not receive lymph node dissection was 9.16 ± 4.15 (month), and there was no significant difference between them ( P  > 0.05). There were no statistically significant differences in the incidence of inguinal hernia with age, BMI, hypertension, diabetes, PSA, previous abdominal operations and operative approach ( P  > 0.05), but there were statistically significant differences with surgical method and pelvic lymph node dissection ( P  < 0.05). The incidence of pelvic lymph node dissection in the inguinal hernia group was 24.3% (14/57), which was significantly higher than that in the control group 11.8% (23/194). Logistic regression analysis showed that pelvic lymph node dissection was a risk factor for inguinal hernia after prostate cancer surgery (OR = 0.413, 95%Cl: 0.196–0.869, P  = 0.02). Kaplan-Meier survival curve showed that the rate of inguinal hernia in the group receiving pelvic lymph node dissection was significantly higher than that in the control group ( P  < 0.05).

Pelvic lymph node dissection is a risk factor for inguinal hernia after radical resection of prostate cancer.

Peer Review reports

Prostate cancer is a common malignant tumor in urology, which occurs in the prostate epithelial tissue, There are an average of 190,000 new cases of prostate cancer each year and about 80,000 deaths worldwide each year [ 1 , 2 ]. In recent years, the incidence of prostate cancer has increased year by year, seriously affecting the health and quality of life of patients [ 3 ]. Worldwide, the incidence of prostate cancer is second only to lung cancer, and its death rate ranks 7th among male cancer causes [ 4 ]. Radical resection of prostate cancer (RP) is the main means for the treatment of prostate cancer, and the surgical methods are generally divided into open radical resection of prostate cancer (RRP) and minimally invasive radical resection of prostate cancer, the latter including laparoscopic radical resection of prostate cancer (LRP) and robot-assisted laparoscopic radical resection of prostate cancer (RALP) [ 5 , 6 , 7 ].

Inguinal hernia (IH) is a relatively common disease in clinic, which is caused by increased abdominal pressure, thinning of abdominal wall, and bulging of abdominal organs. Inguinal hernias include direct hernias, oblique hernias and femoral hernias [ 8 ]. At the onset, lumps protruding outward from the inguinal region can be seen. If the intestines cannot return to the abdominal cavity in time, it is easy to cause intestinal necrosis, intestinal obstruction, intestinal perforation and other complications, which may endanger the life safety of patients in severe cases [ 9 , 10 ].

With the extensive development of radical resection of prostate cancer in various hospitals, the problem of postoperative inguinal hernia has gradually attracted the attention of urologists. The previously reported incidence of IH after radical prostate cancer surgery was approximately 13.7% [ 11 ]. A study by Nagatani S et al. showed that the incidence of inguinal hernia after radical prostate cancer surgery was 7-21%, most of which occurred within 2 years after surgery [ 12 ]. A study by Stranne J et al. showed that the cumulative risk of IH occurrence within 48 months in open radical resection for prostate cancer group and non-surgical group was 12.2% and 5.8%, respectively [ 13 ]. Most cases of IH require surgery due to pain, discomfort, and incarceration and are considered an advanced complication of radical resection of prostate cancer. The adhesion after radical resection of prostate cancer also increases the difficulty of hernia repair. Therefore, urologists need to be concerned not only about the risk of urinary incontinence and erectile dysfunction after radical resection of prostate cancer, but also about the occurrence of IH.

In recent 10 years, many scholars around the world have studied the risk factors of inguinal hernia after radical prostate cancer surgery. Currently, most of the studies believe that anastomotic stenosis, previous history of inguinal hernia, and patent processus vaginalis are risk factors, However there is no consensus on the risk of lymph node dissection. For example, Niitsu H et al. believed that pelvic lymph node dissection during radical prostate cancer operation might damage the pectineal foramina, thereby increasing the risk of inguinal hernia [ 14 ]. Contrary to the results of Johan Stranne’s study, the author suggested that previous incidence of inguinal hernia and advanced age increased the risk of inguinal hernia after radical prostate cancer surgery, and pelvic lymph node dissection was not a significant risk factor [ 15 ]. There is also no consistent conclusion on the influence of BMI, age and surgical method.

Therefore, in order to further investigate the risk factors of inguinal hernia after radical prostate cancer surgery, especially the correlation between pelvic lymph node dissection and inguinal hernia, this study was conducted. This study retrospectively analyzed the clinical data of 251 patients who underwent radical resection of prostate cancer in our hospital from March 2019 to May 2021, and investigated the risk factors of postoperative inguinal hernia. It is reported as follows:

Research objectives

The objective of this study was to explore the incidence and risk factors of inguinal hernia after radical resection of prostate cancer, which provides reference for further research and guide the clinician to choose the appropriate surgical method according to the patient’s condition.

Research methods

The patient was also examined by B-ultrasound every 3 months at the outpatient PSA review to verify the occurrence of inguinal hernia. The subjects were divided into the inguinal hernia group (study group) and the non-inguinal hernia group (control group), If the diagnosis of inguinal hernia occurred, the follow-up was completed, and the type and time of inguinal hernia were recorded; otherwise, the follow-up was 2 years, and the relevant clinical parameters of each group were statistically analyzed (age, BMI, hypertension, diabetes mellitus, PSA value, previous abdominal operations, operation methods, operative approach, pelvic lymph node dissection)and the correlation between these parameters and the occurrence of inguinal hernia was analyzed, and the risk factors of inguinal hernia were found by Logistic regression analysis. According to the occurrence and time of inguinal hernia, Kaplan-Meier survival curve was drawn to compare the differences between the two groups.

The content of this study has been approved by the Ethics Committee of our hospital(approval number, 2,018,137). All patients signed informed consent forms. This is the protocol was registered on the Chinese Clinical Trial Registry. The study is planned to begin in mid-March 2019 and is planned to end by May 2021.

Inclusion criteria

Patients who received radical surgery for prostate cancer in Huzhou First People’s Hospital from March 2019 to May 2021; PSA was reviewed every 3 months after surgery, and check the inguinal area for protruding masses. Complete the 2-year follow-up plan.

Exclusion criteria

Patients with inguinal hernia before operation; patients with prior inguinal hernia surgery.

Statistical methods

SPSS 21.0 statistical software was used for statistical processing, the research data followed normal distribution, and the measured data were represented by X ± S. P  < 0.05 was considered statistically significant.

From March 2019 to May 2021, 318 cases of radical prostatectomy were performed in our hospital, during the follow-up period, a total of 28 cases died of other diseases, a total of 39 cases were lost to follow-up or clinical data were incomplete, and a total of 251 cases were finally followed up. There were no significant differences in age, BMI, hypertension, diabetes, PSA, previous abdominal operations and operative approach between the two groups ( P  > 0.05), while there were significant differences in surgical method and pelvic lymph node dissection ( P  < 0.05). The incidence of pelvic lymph node dissection in the inguinal hernia group 24.3% (14/57) was significantly higher than that in the control group 11.8% (23/194). See Table  1 for details.

Multivariate logistic regression analysis showed that pelvic lymph node dissection was a risk factor for inguinal hernia after prostate cancer surgery (OR = 0.413, 95% CI: 0.196–0.869, P = 0.02). Age, BMI, hypertension, diabetes, PSA value, previous abdominal operations, operation method, and operative approach were not significant risk factors for inguinal hernia (P > 0.05). See Table 2 for details.
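The reported interval is consistent with a Wald confidence interval on the log-odds scale, where OR = exp(β) and the 95% CI is exp(β ± 1.96·SE). The short check below back-calculates the implied coefficient and standard error from the published numbers. Note that the reported OR of 0.413 is below 1, so the direction of the association depends on how the dissection variable was coded in the model, which the article does not state.

```python
import numpy as np

or_hat, ci_low, ci_high = 0.413, 0.196, 0.869            # values reported in Table 2

beta = np.log(or_hat)                                     # implied log-odds coefficient
se = (np.log(ci_high) - np.log(ci_low)) / (2 * 1.96)      # implied standard error

print(f"beta ≈ {beta:.3f}, SE ≈ {se:.3f}")
# Rebuilding the interval from beta and SE reproduces the published bounds (≈ 0.196–0.869):
print(np.exp(beta - 1.96 * se), np.exp(beta + 1.96 * se))
```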

Patients with inguinal hernia were grouped according to whether they had undergone pelvic lymph node dissection; the incidence and timing of inguinal hernia in each group were recorded, and Kaplan-Meier curves were drawn. The overall incidence of inguinal hernia after radical prostatectomy was 14.7% (37/251): 26 cases of indirect hernia (70.2%, 26/37), 8 cases of direct hernia (21.6%, 8/37), and 3 cases of combined indirect and direct hernia (8.1%, 3/37), with a mean time to occurrence of 8.58 ± 4.12 months. The mean time to inguinal hernia was 7.61 ± 4.05 months in patients who underwent lymph node dissection and 9.16 ± 4.15 months in those who did not, with no significant difference between them (P > 0.05). The incidence of inguinal hernia in the pelvic lymph node dissection group was significantly higher than in the control group (P < 0.05). See Fig. 1 for details.

Fig. 1 Kaplan-Meier curves of inguinal hernia occurrence by pelvic lymph node dissection (months)
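A minimal sketch of the Kaplan-Meier comparison shown in Fig. 1, assuming the Python lifelines library instead of SPSS; the toy records and column names below are invented for illustration. The event is the diagnosis of inguinal hernia, and patients without a hernia are censored at 24 months of follow-up.

```python
import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

# Invented toy records; the study's real per-patient data are not publicly available.
df = pd.DataFrame({
    "months":     [3, 8, 12, 24, 24, 6, 15, 24, 24, 24],  # time to hernia or censoring
    "hernia":     [1, 1, 1, 0, 0, 1, 1, 0, 0, 0],         # 1 = inguinal hernia, 0 = censored
    "dissection": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],         # 1 = pelvic lymph node dissection
})

# One hernia-free survival curve per group, as in Fig. 1.
kmf = KaplanMeierFitter()
for label, grp in df.groupby("dissection"):
    kmf.fit(grp["months"], event_observed=grp["hernia"], label=f"dissection={label}")
    kmf.plot_survival_function()

# Log-rank test comparing the two curves.
d1 = df[df["dissection"] == 1]
d0 = df[df["dissection"] == 0]
result = logrank_test(d1["months"], d0["months"],
                      event_observed_A=d1["hernia"], event_observed_B=d0["hernia"])
print(result.p_value)
```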

Discussion

In recent years, the incidence of prostate cancer has increased year by year, seriously affecting patients' health and quality of life. The main complications after radical prostatectomy are urinary incontinence and sexual dysfunction, but inguinal hernia (IH) is also a common complication [16]. Liu L et al. found that the open surgical technique and advanced patient age, especially over 80 years, are associated with a higher incidence of IH, and that appropriate prophylaxis during surgery should be considered in high-risk patients [17]. In some regional studies, low BMI (roughly BMI < 25 kg/m²) has been identified as a risk factor for IH, although the exact risk threshold has not been determined [18]. However, several studies have found that low BMI does not increase the risk of postoperative IH [19, 20]. There is currently no uniform conclusion on the relative risk of IH after open versus laparoscopic radical prostatectomy: Alder R et al. concluded that the incidence of IH after laparoscopic radical prostatectomy is relatively low [21], whereas Otaki T et al. reported incidences of 7.3% after laparoscopic and 8.4% after open radical prostatectomy, with no statistically significant difference between them [20]. There is also no consensus on whether pelvic lymph node dissection is a risk factor for inguinal hernia [14, 15]. In short, the specific mechanism of inguinal hernia after radical prostatectomy remains unclear.

This study retrospectively analyzed the clinical data of 251 patients treated in our hospital and found an overall incidence of inguinal hernia of 14.7% (37/251), consistent with most current reports. We also found that the mean time to occurrence of inguinal hernia after surgery was 8.58 ± 4.12 months, which provides some guidance for the timing of postoperative follow-up.

In this study, multivariate logistic regression identified pelvic lymph node dissection as a risk factor for inguinal hernia after prostate cancer surgery (OR = 0.413, 95% CI: 0.196–0.869, P = 0.02). Age, BMI, hypertension, diabetes, PSA value, previous abdominal operations, operation method, and operative approach were not significantly associated with postoperative inguinal hernia (P > 0.05), whereas surgical method and pelvic lymph node dissection differed significantly between the groups (P < 0.05). Therefore, the advantages and disadvantages of pelvic lymph node dissection should be weighed carefully in patients with low- and intermediate-risk prostate cancer in order to avoid inguinal hernia. The Kaplan-Meier curves showed that the rate of inguinal hernia in the pelvic lymph node dissection group was significantly higher than in the control group. Some studies suggest that pelvic lymph node dissection during radical prostatectomy causes postoperative scar contraction in the inguinal region, directing abdominal pressure outward and downward and thereby increasing the incidence of inguinal hernia. Lodding P et al. compared three groups: radical prostatectomy plus pelvic lymph node dissection, pelvic lymph node dissection alone, and no operation. The incidences of inguinal hernia were 13.6%, 7.6%, and 3.1%, respectively; the difference between the prostatectomy group and the no-operation group was statistically significant, whereas the difference between the prostatectomy group and the lymph node dissection-only group was not. This result implies that pelvic lymph node dissection is an important factor in the development of inguinal hernia [22]. In another study, Sun M et al. compared the incidence of inguinal hernia after radical prostatectomy with that after pelvic lymph node dissection alone and found that the risk of inguinal hernia was 6.8% and 7.8% higher at 5 and 10 years, respectively, in the radical prostatectomy group [23]. Niitsu H et al. proposed that pelvic lymph node dissection during radical prostatectomy might damage the pectineal foramen, from which inguinal hernias originate [14].

Shimbo M et al., comparing preoperative and postoperative sagittal MRI images, found that after prostatectomy and vesicourethral anastomosis the rectovesical excavation (RE) moves downward by about 2 to 3 cm [24]. They speculated that this displacement of the RE pulls on the peritoneum and vas deferens after the anastomosis, which in turn pulls the opening of the internal inguinal ring and shifts it medially, leading to postoperative IH. Based on this theory, many surgeons have sought to prevent postoperative hernia by reducing the tension of the peritoneum and vas deferens at the internal ring and by ligating or dividing the processus vaginalis. Several other articles have reported that preserving the retropubic (Retzius) space helps prevent IH after radical prostatectomy. Chang KD et al. found that robot-assisted laparoscopic radical prostatectomy with preservation of the Retzius space significantly reduced the incidence of postoperative IH compared with the standard robot-assisted procedure [25]. In addition, Matsubara et al. showed that, compared with standard open retropubic radical prostatectomy, perineal radical prostatectomy, which preserves anatomical structures such as the Retzius space, was associated with a lower incidence of IH [26]. Therefore, urological surgeons can take effective intraoperative measures to prevent the occurrence of inguinal hernia.

In this study, we identified pelvic lymphadenectomy as a risk factor for inguinal hernia after radical prostatectomy. Other candidate factors, including age, BMI, hypertension, diabetes mellitus, PSA value, history of abdominal surgery, operative method, and operative approach, were not significant in the multivariate analysis, which is inconsistent with the results of Iwamoto H et al. [27]. They found that dilatation of the right internal inguinal ring and the manner of handling the medial peritoneal incision at the femoral ring were independent risk factors for IH after laparoscopic radical prostatectomy; why postoperative IH occurs more often on the right side is not known. Alder R et al. found that the incidence of IH after open radical prostatectomy was significantly higher than after laparoscopic radical prostatectomy [21], but our study did not show a difference between the two approaches, possibly because of the small number of open procedures included.

In summary, the incidence of inguinal hernia after radical prostatectomy is relatively high, and the specific cause remains unclear. Our study shows that pelvic lymph node dissection is a risk factor for inguinal hernia.

Limitations

The sample size of this study is small, and it is a single-center study, so the conclusions may not be broadly representative. The follow-up period of 2 years may not have been long enough to capture the true incidence of inguinal hernia. In addition, because this is a retrospective study, the clinical parameters observed are not comprehensive, and other factors that influence IH may have been overlooked. Because our data come from routine clinical records, some variables could not be obtained. These issues warrant further study.

Data availability

We cannot share our datasets in publicly available repositories because the participants' informed consent treats them as confidential patient data. Data may be obtained from the corresponding author upon reasonable request.

Sekhoacha M, Riet K, Motloung P, et al. Prostate cancer review: genetics, diagnosis, treatment options, and alternative approaches. Molecules. 2022;27.

Rawla P. Epidemiology of prostate Cancer. World J Oncol. 2019;10(2):63–89.

Vietri MT, D’Elia G, Caliendo G, et al. Hereditary prostate cancer: genes related, target therapy and prevention. Int J Mol Sci. 2021;22.

Williams IS, McVey A, Perera S, et al. Modern paradigms for prostate cancer detection and management. Med J Aust. 2022;217:424–33.

Achard V, Panje CM, Engeler D, et al. Localized and locally advanced prostate cancer: treatment options. Oncology. 2021;99:413–21.

Davis M, Egan J, Marhamati S, et al. Retzius-Sparing Robot-assisted robotic prostatectomy: past, Present, and Future. Urol Clin North Am. 2021;48:11–23.

Heidenreich A, Pfister D. Radical cytoreductive prostatectomy in men with prostate cancer and oligometastatic disease. Curr Opin Urol. 2020;30:90–7.

Miller HJ. Inguinal hernia: mastering the anatomy. Surg Clin North Am. 2018;98:607–21.

Gamborg S, Marcussen ML, Öberg S, Rosenberg J. Inguinal hernia repair but no Hernia Present: a Nationwide Cohort Study. Surg Technol Int. 2022;40:171–4.

Chien S, Cunningham D, Khan KS. Inguinal hernia repair: a systematic analysis of online patient information using the modified ensuring Quality Information for patients tool. Ann R Coll Surg Engl. 2022;104:242–8.

Perez AJ, Campbell S. Inguinal hernia repair in older persons. J Am Med Dir Assoc. 2022;23(4):563–7.

Nagatani S, Tsumura H, Kanehiro T, et al. Inguinal hernia associated with radical prostatectomy. Surg Today. 2021;51:792–7.

Stranne J, Johansson E, Nilsson A, et al. Inguinal hernia after radical prostatectomy for prostate cancer: results from a randomized setting and a nonrandomized setting. Eur Urol. 2010;58:719–26.

Niitsu H, Taomoto J, Mita K, et al. Inguinal hernia repair with the mesh plug method is safe after radical retropubic prostatectomy. Surg Today. 2014;44:897–901.

Stranne J, Hugosson J, Lodding P. Post-radical retropubic prostatectomy inguinal hernia: an analysis of risk factors with special reference to preoperative inguinal hernia morbidity and pelvic lymph node dissection. J Urol. 2006;176:2072–6.

Tolle J, Knipper S, Pose R, et al. Evaluation of risk factors for adverse functional outcomes after Radical Prostatectomy in patients with previous transurethral surgery of the prostate. Urol Int. 2021;105:408–13.

Liu L, Xu H, Qi F, et al. Incidence and risk factors of inguinal hernia occurred after radical prostatectomy-comparisons of different approaches. BMC Surg. 2020;20(1):218.

Nilsson H, Stranne J, Hugosson J, et al. Risk of hernia formation after radical prostatectomy: a comparison between open and robot-assisted laparoscopic radical prostatectomy within the prospectively controlled LAPPRO trial. Hernia. 2022;26:157–64.

Sim KC, Sung DJ, Han NY, et al. Preoperative CT findings of subclinical hernia can predict for postoperative inguinal hernia following robot-assisted laparoscopic radical prostatectomy. Abdom Radiol (NY). 2018;43:1231–6.

Otaki T, Hasegawa M, Yuzuriha S, et al. Clinical impact of psoas muscle volume on the development of inguinal hernia after robot-assisted radical prostatectomy. Surg Endosc. 2021;35:3320–8.

Alder R, Zetner D, Rosenberg J. Incidence of inguinal hernia after radical prostatectomy: a systematic review and meta-analysis. J Urol. 2020;203(2):265–74.

Lodding P, Bergdahl C, Nyberg M, et al. Inguinal hernia after radical retropubic prostatectomy for prostate cancer: a study of incidence and risk factors in comparison to no operation and lymphadenectomy. J Urol. 2001;166:964–7.

Sun M, Lughezzani G, Alasker A, et al. Comparative study of inguinal hernia repair after radical prostatectomy, prostate biopsy, transurethral resection of the prostate or pelvic lymph node dissection. J Urol. 2010;183:970–5.

Shimbo M, Endo F, Matsushita K, et al. Risk factors and a novel prevention technique for inguinal hernia after robot-assisted radical prostatectomy. Urol Int. 2017;98:54–60.

Chang KD, Abdel Raheem A, Santok GDR, et al. Anatomical Retzius-space preservation is associated with lower incidence of postoperative inguinal hernia development after robot-assisted radical prostatectomy. Hernia. 2017;21:555–61.

Matsubara A, Yoneda T, Nakamoto T, et al. Inguinal hernia after radical perineal prostatectomy: comparison with the retropubic approach. Urology. 2007;70:1152–6.

Iwamoto H, Morizane S, Hikita K, et al. Postoperative inguinal hernia after robotic-assisted radical prostatectomy for prostate cancer: evaluation of risk factors and recommendation of a convenient prophylactic procedure. Cent Eur J Urol. 2019;72(4):418–24.

Acknowledgements

Not applicable.

This work was supported by grant 2019GY23 from the Huzhou Science and Technology Bureau public welfare application research project, China.

Author information

Authors and affiliations.

Department of Urology, The First People’s Hospital of Huzhou, #158, Square Road, Huzhou, 313000, China

An-Ping Xiang, Yue-Fan Shen, Xu-Feng Shen & Si-Hai Shao

Department of Urology, Huzhou Key Laboratory of Precise Diagnosis and Treatment of Urinary Tumors, Huzhou, 313000, China

An-Ping Xiang

Contributions

An-Ping Xiang designed the study and drafted and revised the manuscript; Yue-Fan Shen recorded the patient cases; Xu-Feng Shen participated in the follow-up; An-Ping Xiang and Si-Hai Shao analyzed the data and drew the graphs.

Corresponding author

Correspondence to Si-Hai Shao.

Ethics declarations

Ethics approval and consent to participate.

The study protocol was approved by the ethics committee of the First People’s Hospital of Huzhou (approval number, 2018137). We have obtained written informed consent from all study participants. All of the procedures were performed in accordance with the Declaration of Helsinki and relevant policies in China.

Consent for publication

Competing interests.

The authors declare no competing interests.

Additional information

Publisher’s note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article.

Xiang, AP., Shen, YF., Shen, XF. et al. Correlation between the incidence of inguinal hernia and risk factors after radical prostatic cancer surgery: a case control study. BMC Urol 24, 131 (2024). https://doi.org/10.1186/s12894-024-01493-w

Received: 24 September 2023

Accepted: 30 April 2024

Published: 22 June 2024

DOI: https://doi.org/10.1186/s12894-024-01493-w

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Keywords: Prostate cancer; Inguinal hernia
