Personality psychologists have conducted hundreds of studies that relate various personality measures to each other. The good news about this research is that it is relatively easy to do and doesn’t cost very much. As a result, sample sizes are big enough to produce stable estimates of the correlations between these measures. Moreover, personality psychologists often study many correlations at the same time. Thus, statistical significance is not a problem because some correlations are bound to be significant.
The key problem with personality psychology is that many studies are mono-method studies. This often leads to spurious correlations that are caused by method factors (Campbell & Fiske, 1959). For example, self-report measures often correlate with each other because they are influenced by socially desirable responding. It is therefore interesting to find articles that used multiple methods, which makes it possible to separate method factors from personality factors.
One common finding from multi-method studies is that the Big Five personality traits often appear correlated when they are measured with self-reports, but not when they are measured with multiple methods (i.e., multiple raters) (Anusic et al., 2009; Biesanz & West, 2004; DeYoung, 2006). Furthermore, the correlations among self-ratings of the Big Five are explained by an evaluative or desirability factor.
Despite this evidence, some personality psychologists argue that the Big Five are related to each other by substantive traits. One model assumes that there are two higher-order factors. One factor produces a positive correlation between Extraversion and Openness and another factor produces positive correlations between Emotional Stability (low Neuroticism), Agreeableness, and Conscientiousness. These two factors are supposed to be independent (DeYoung, 2006). Another model proposes a single higher-order factor that is called the General Factor of Personality (GFP). This factor was originally proposed by Musek (2007) and then championed by the late psychologist Rushton. Planck suggested that bad theories die after their champion dies, but in this case Dimitri van der Linden has taken it upon himself to keep the GFP alive. I met Dimitri at a conference many years ago and discussed the GFP with him, but evidently my arguments fell on deaf ears. My main point was that you need to study factors with factor analysis. A simple sum score of Big Five scales is not a proper way to examine the GFP because this sum score also contains variance of the specific Big Five factors. Apparently, he is too stupid or lazy to learn structural equation modeling to use CFA in studies of the GFP.
Instead, he computes weighted sum scores as indicators of factors and uses these sum scores to examine relationships of higher-order factors with intelligence.
The authors then find that the Plasticity scale is related to self-rated and objective measures of intelligence and interpret this as evidence that the Plasticity factor is related to intelligence. However, the Plasticity scale is just an average of Extraversion and Openness and it is possible that this correlation is driven by the unique variance in Openness rather than the shared variance between Openness and Extraversion that corresponds to the Plasticity factor. In other words, the authors fail to examine how higher-order factors are related to intelligence because they do not examine this relationship at the level of factors, which requires structural equation modeling. Fortunately, they provided the correlations among the measures in their two studies and I was able to conduct a proper test of the hypothesis that Plasticity is related to intelligence. I fitted a multiple-group model to the correlations among the Big Five scales (different measures were used in the two studies), the self-report of intelligence, and the scores on Cattell’s IQ test. Overall model fit was acceptable, CFI = .943, RMSEA = .050. Figure 1 shows the model. First of all, there is no evidence of Stability and Plasticity as higher-order factors, which would produce correlations between Extraversion (EE) and Openness (OO) and correlations between Neuroticism (NN), Agreeableness (AA), and Conscientiousness (CC). Instead, there was a small positive correlation between Neuroticism and Openness and between Agreeableness and Conscientiousness. There was evidence of a general factor that influenced self-ratings of the Big Five (N, E, O, A, C) and self-ratings of intelligence (sri), although the effect size for self-reported intelligence was surprisingly small. This might be due to the assessment of intelligence that may have led to more honest reporting. Most important, the general factor (h) was unrelated to performance on Cattell’s test.
This shows that the factor is unique to the method of self-ratings and supports the interpretation of this factor as a method factor (Anusic et al., 2009). Finally, self-ratings and objective test scores reflect a common factor, which shows that there is some valid variance in self-ratings. This has been reported before (Borkenau & Liebler, 1992). The intelligence factor was related to Openness, but not to Extraversion, which is also consistent with other studies that examined the relationship between personality and IQ scores. Evidently, intelligence is not related to Plasticity because Plasticity is the shared variance between Extraversion and Openness, and there is no evidence that this shared variance exists and no evidence that Extraversion is related to intelligence.
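The argument about sum scores can be illustrated with a small simulation. This is only a sketch under assumptions that I made up for illustration: Extraversion and Openness are generated without any shared Plasticity factor, and only Openness is related to intelligence (with an arbitrary correlation of .3). None of the numbers are taken from the authors' data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000  # large hypothetical sample so that sampling error is negligible

# Assumed data-generating model (illustrative values, not estimates):
# no common factor behind Extraversion and Openness,
# and only Openness is related to intelligence (r = .3).
intelligence = rng.standard_normal(n)
extraversion = rng.standard_normal(n)
openness = 0.3 * intelligence + np.sqrt(1 - 0.3**2) * rng.standard_normal(n)

# "Plasticity" computed the way a scale score is computed: a simple average.
plasticity_sum = (extraversion + openness) / 2


def corr(x, y):
    """Pearson correlation between two 1-d arrays."""
    return float(np.corrcoef(x, y)[0, 1])


r_e_o = corr(extraversion, openness)        # shared variance of E and O
r_e_iq = corr(extraversion, intelligence)   # Extraversion with intelligence
r_o_iq = corr(openness, intelligence)       # Openness with intelligence
r_sum_iq = corr(plasticity_sum, intelligence)

print(f"E with O:             {r_e_o:.2f}")
print(f"E with IQ:            {r_e_iq:.2f}")
print(f"O with IQ:            {r_o_iq:.2f}")
print(f"Plasticity sum score: {r_sum_iq:.2f}")
```

Under these assumptions the expected correlation of the sum score with intelligence is .15 / √.5 ≈ .21, even though there is no Plasticity factor in the simulated data at all. A correlation between a sum score and a criterion therefore says nothing about the factor that the sum score is supposed to represent.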
These results show that van der Linden and colleagues came to the wrong conclusion because they did not analyze their data properly. To make claims about higher-order factors, it is essential to use structural equation modeling. Structural equation modeling shows that the Plasticity and Stability higher-order factors are not present in these data (i.e., the pattern of correlations is not consistent with this model) and it shows that only Openness is related to intelligence, which can also be seen by just inspecting the correlation tables. Finally, the authors misinterpret the relationship between the general factor and self-rated intelligence. “First, their [high GFP individuals] intellectual self-confidence might be partly rooted in their actual cognitive ability as SAI and g shared some variance in explaining Plasticity and the GFP” (p. 4). This is pure nonsense. As is clearly visible in Figure 1, the general factor is not related to scores on Cattell’s test and as a result it cannot be related to the shared variance between test scores and self-rated intelligence that is reflected in the i factor in Figure 1. There is no path linking the i-factor with the general factor (h). Thus, individuals’ standing on the h-factor is independent of their actual intelligence. A much simpler interpretation of the results is that self-rated intelligence is influenced by two independent factors. One is rooted in accurate self-knowledge and correlates with objective test scores and the other is rooted in overly positive ratings on desirable traits and is related to the tendency to do so across all traits. Although this plausible interpretation of the results is based on a published theory of personality self-ratings (Anusic et al., 2009), the authors simply ignore it. This is bad science, especially in correlational research that requires testing of alternative models.
In conclusion, I was able to use the authors’ data to support an alternative theory that they deliberately ignored because it challenges the authors’ prior beliefs. There is no evidence for a General Factor of Personality that gives some people a desirable personality and others an undesirable one. Instead, some individuals exaggerate their positive attributes in self-reports. Even if this positive bias (self-enhancement) were beneficial, it is conceptually different from actually possessing these attributes. Being intelligent is not the same as thinking that one is intelligent, and thinking that one understands personality factors is different from actually understanding personality factors. I am not the first critic of personality psychologists’ lack of clear thinking about factors (Borsboom, 2006).
“In the case of PCA, the causal relation is moreover rather uninteresting; principal component scores are “caused” by their indicators in much the same way that sumscores are “caused” by item scores. Clearly, there is no conceivable way in which the Big Five could cause subtest scores on personality tests (or anything else, for that matter), unless they were in fact not principal components, but belonged to a more interesting species of theoretical entities; for instance, latent variables. Testing the hypothesis that the personality traits in question are causal determinants of personality test scores thus, at a minimum, requires the specification of a reflective latent variable model (Edwards & Bagozzi, 2000). A good example would be a Confirmatory Factor Analysis (CFA) model.”
In short, if you want to talk about personality factors, you need to use CFA and examine the properties of latent variables. It is really hard to understand why personality psychologists do not use this statistical tool when most of their theories are about factors as causes of behavior. Borsboom (2006) proposed that personality psychologists dislike CFA because it can disprove theories and psychologists seem to have an unhealthy addiction to confirmation bias. Doing research to find evidence for one’s beliefs may feel good and may even lead to success, but it is not science. Here I show that Plasticity and Stability do not exist in a dataset and that the authors did not notice this because they treated sum scores as if they were factors. Of course, we can average Extraversion and Openness and call this average Plasticity, but this average is not a factor. To study factors, it is necessary to specify a reflective measurement model, and there is a risk that a model may not fit the data. Rather than avoiding this outcome, it should be celebrated because falsification is the root of scientific progress. Maybe the lack of theoretical progress in personality psychology can be attributed to a reluctance to disconfirm existing theories.
In this blog post (pre-print), I examine the construct validity of the Elemental Psychopathy Assessment Super-Short Format scale (EPA-SSF) with Rose et al.’s (2022) open data. I examine construct validity by means of structural equation modeling. I find that the proposed 3-factor structure does not fit the data and find support for a four-factor structure. I also find evidence for a fifth factor that reflects a tendency to endorse desirable traits more and undesirable traits less. I find that most of the reliable variance in the scale scores is predicted by this factor, whereas substantive traits play a small role. I also show that the general factor contributes to the prediction of self-reported criminal behaviors. I find no evidence to support the inclusion of Emotional Stability in the definition of psychoticism. Finally, I raise theoretical objections about the use of sum scores to measure multi-trait constructs. Based on these concerns, I argue that the EPA-SSF is not a valid measure of psychoticism and that results based on this measure do not add to the creation of a nomological net surrounding the construct of psychoticism.
Measurement combines invention and discovery. The invention of microscopes made it possible to see germs and to discover the causes of many diseases. Turning a telescope to the skies allowed Galileo to make new astronomical discoveries. In the 20th century, psychology emerged as a scientific discipline and the history of psychology is marked by the development of psychological measures. Nowadays, psychological measurement is called psychometrics. Unfortunately, psychometrics is not a basic, fundamental part of mainstream psychological science. Instead, psychometrics is mostly taught in education departments and used for applied purposes of educational testing. As a result, many psychologists who use measures in their research have very little understanding of psychological measurement.
For any measure to be able to discover new things, it has to be valid. That is, the numbers that are produced by a measure should reflect mostly variation in the actual objects that are being examined. Science progresses when new measures are invented that can produce more accurate, detailed, and valid information about the objects that are being studied. For example, developments in technology have created powerful microscopes and telescopes that can measure small objects in nanometers and galaxies billions of lightyears away. In contrast, psychological measures are more like kaleidoscopes. They show pretty images, but these images are not a reflection of actual objects in the real world. While this criticism may be harsh, it is easily supported by the simple fact that psychologists do not quantify validity of their measures and that there are often multiple measures that claim to measure the same construct even though they are only moderately correlated. For example, at least eight different measures claim to be measures of narcissism without a clear definition of narcissism and without validity information that makes it possible to pick the best measure of narcissism (Schimmack, 2022).
A fundamental problem in psychological science is the way scientific findings are produced. Typically, a researcher has an idea, conducts a study, and then publishes results if the results support their initial ideas. This bias is easily demonstrated by the fact that 95% of articles in psychology journals are supportive of researchers’ ideas, which is an unrealistically high success rate (Sterling, 1959; Sterling et al., 1995). Journals are also reluctant to publish work that is critical of previous articles, especially if these articles are highly cited, and authors are often asked to be expert reviewers of work that is critical of their work. It would take superhuman strength to be impartial in these reviews, and these self-serving reviews are often the death of critical work. Thus, psychological science lacks the basic mechanism that drives scientific progress: falsification of bad theories or learning from errors. Evidence for the lack of self-correction that is a necessary element of science was produced during the past decade that was called the replication crisis, when researchers dared to publish replication failures of well-known findings. However, while the replication crisis has focused on empirical tests of hypotheses, criticism of psychological measures has remained relatively muted (Flake & Fried, 2020). It is time to use the same critical attitude that fueled the replication crisis and apply it to psychological measurement. I predict that many of the existing measures lack sufficient construct validity or are redundant with other measures. As a result, progress in psychological measurement would be marked by a consolidation of measures that is based on a comparison of measures’ construct validity. As one of my favorite psychologists once observed in a different context, in science “less is more” (Cohen, 1990), and this is also true for measurement. While cuckoo clocks are fun, they are not used for scientific measurement of time.
A very recent article reviewed the literature on psychopathy (Patrick, 2022). The article describes psychopathy as a combination of three personality traits.
A conceptual framework that is helpful for assimilating different theoretical perspectives and integrating findings across studies using different measures of psychopathy is the triarchic model (Patrick et al. 2009, Patrick & Drislane 2015b, Sellbom 2018). This model characterizes psychopathy in terms of three trait constructs that correspond to distinct symptom features of psychopathy but relate more clearly to biobehavioral systems and processes. These are (a) boldness, which encompasses social dominance, venturesomeness, and emotional resilience and connects with the biobehavioral process of threat sensitivity; (b) meanness, which entails low empathy, callousness, and aggressive manipulation of others and relates to biobehavioral systems for affiliation (social connectedness and caring); and (c) disinhibition, which involves boredom proneness, lack of restraint, irritability, and irresponsibility and relates to the biobehavioral process of inhibitory control. (p. 389).
This definition of psychopathy raises several questions about the relationship between boldness, meanness, and disinhibition and psychopathy that are important for valid measurement of psychopathy. First, it is clear that psychopathy is a formative construct. That is, psychopathy is not a common cause of boldness, meanness, and disinhibition and the definition imposes no restrictions on the correlation among the three traits. Boldness could be positively or negatively correlated with meanness or they could be independent. In fact, models of normal personality would predict that these three dimensions are relatively independent because boldness is related to extraversion, meanness is related to low agreeableness, and disinhibition is related to low conscientiousness, and these three broader traits are independent. As a result, the definition of psychopathy as a combination of three relatively independent traits implies that psychopaths are characterized by high levels on all three traits. This definition raises questions about the combination of information about the three traits to produce a valid score that reflects psychopathy. However, in practice scores on these dimensions are often averaged without a clear rationale for this scoring method.
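To see why averaging is not an innocent choice, consider a minimal simulation. It assumes three independent, standardized traits and an arbitrary cut-off of one standard deviation for a "high" score; both are my assumptions for illustration and not part of any published scoring rule.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000  # hypothetical sample size

# Assumption: boldness, meanness, and disinhibition are independent
# standard-normal traits (rows of the array).
traits = rng.standard_normal((3, n))

# Scoring rule 1: average the three trait scores and take the top 5%.
mean_score = traits.mean(axis=0)
top_by_mean = mean_score >= np.quantile(mean_score, 0.95)

# Scoring rule 2: require a high score (> 1 SD, arbitrary) on every trait,
# which is what "high on all three traits" would imply.
high_on_all = (traits > 1.0).all(axis=0)

# How many of the people with the highest averages are high on all traits?
share_high_on_all = float(high_on_all[top_by_mean].mean())
print(f"High on all three traits overall: {high_on_all.mean():.3f}")
print(f"High on all three among top 5% by average: {share_high_on_all:.3f}")
```

With independent traits, only a small minority of the individuals with the highest average scores are elevated on all three traits. Most of them reach a high average with an extreme score on one or two traits, so the average and the conjunctive definition identify different people.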
Patrick’s (2022) review also points out that multiple measures aim to measure psychopathy with self-reports. “multiple scale sets exist for operationalizing biobehavioral traits corresponding to boldness, disinhibition, and meanness in the modality of self-report (denoted in Figure 3 by squares labeled with subscript-numbered S’s)” (p. 405). It is symptomatic of the lack of measurement theories that Patrick uses the term operationalize instead of measure because psychometricians rejected the notion of operational measurement over 50 years ago (Cronbach & Meehl, 1955). The problem with operationalism is that every measure is by definition a valid measure of a construct because the construct is essentially defined by the measurement instrument. Accordingly, a psychopathy measure is a valid measure of psychopathy and if different measures produce different scores, they simply measure different forms of psychopathy. However, few researchers would be willing to accept that their measure is just an arbitrary collection of items without a claim to measure something that exists independent of the measurement instrument. Yet, they also fail to provide evidence that their measure is a valid measure of psychopathy.
Here, I examine the construct validity of one self-report measure of psychopathy using the open data shared by the authors who used this measure, namely the 18-item short form of the Elemental Psychopathy Assessment (EPA; Lynam, Gaughan, Miller, Miller, Mullins-Sweatt, & Widiger, 2011; Collison, Miller, Gaughan, Widiger, & Lynam, 2016). The data were provided by Rose, Crowe, Sharpe, Til, Lynam, and Miller (2022).
Rose et al.’s description of the EPA is brief.
The EPA-SSF (Collison et al., 2016) yields a total psychopathy score (alpha = .70/.77) as well as scores for each of three subscales: Antagonism (alpha = .61/.72), Emotional Stability (alpha = .66/.65), and Disinhibition (alpha = .68/.71).
The description suggests that the measure aims to measure psychopathy as a combination of three traits, although boldness (high Extraversion) is replaced with Emotional Stability (Low Neuroticism).
Based on their empirical findings, Rose et al. (2022) conclude that two of the three traits predict the negative outcomes that are typically associated with psychopathy. “It is the ATM Antagonism and Impulsivity [Disinhibition] domains that are most responsible for psychopathy, narcissism, and Machiavellianism’s more problematic correlates – antisocial behavior, substance use, aggression, and risk taking” (p. 10). In contrast, emotional stability/boldness are actually beneficial. “Conversely, the Emotional Stability and Agency factors are more responsible for the more adaptive aspects including self-reported political and interpersonal skill” (p. 11).
This observation might be used to modify the construct of psychopathy in an iterative process known as construct validation (Cronbach & Meehl, 1955). Accordingly, disconfirming evidence can be attributed to problems with a measure or problems with a construct. In the present case, the initial assumption appears to be that psychopaths have to be low in Neuroticism or bold to commit horrible crimes. Yet, the evidence suggests that there also can be neurotic psychopaths who are violent, and maybe the cause of violence is a combination of high neuroticism (especially anger) and low conscientiousness (lack of impulse control). We might therefore limit the construct of psychopathy to low agreeableness and low conscientiousness, which would be consistent with some older models of psychopathy (van Kampen, 2009). Even this definition of psychopathy can be critically examined given the independence of these two traits. If the actual personality factors underlying anti-social behaviors are independent, we might want to focus on these independent causes. The term psychopath would be akin to the word girl in that it simply describes the combination of two independent traits: disagreeable and impulsive, or young and female. The term psychopath does not add anything to the theoretical understanding of anti-social behaviors because it is defined as nothing more than being mean and impulsive.
Does the EPA-SSF measure Antagonism, Emotional Stability, and Disinhibition?
The EPA was based on the assumption that Psychoticism is related to 18 specific personality traits and that these 18 traits are related to four of the Big Five dimensions. Empirical evidence supported this assumption. Five traits were related to low Neuroticism, namely Unconcerned, Self-Contentment, Self-Assurance, Impulsivity, and Invulnerability, and one was related to high Neuroticism (Anger). Evidently, a measure that combines items that reflect the high and low pole of a factor is not a good measure of the factor. Another problem is that several of these scales had notable secondary loadings on other Big Five factors. Anger loaded more strongly and negatively on Agreeableness than on Neuroticism, and Self-Assurance loaded more highly on Extraversion. Thus, it is a problem to refer to the Emotional Stability scale as a measure of Emotional Stability. If the theoretical model assumes that Emotional Stability is a component of Psychoticism, it would be sufficient to use a validated measure of Emotional Stability to measure this component. Presumably, the choice of different items was motivated by the hypothesis that the specific item content of the EPA scales adds to the measurement of psychoticism. In this case, however, it is misleading to ignore this content in the description of the measure and to focus on the shared variance among items.
Another six items loaded negatively on Agreeableness, namely Distrust, Manipulation, Self-Centeredness, Opposition, Arrogance, and Callousness. The results showed that these six items were good indicators of Agreeableness. A minor problem is to call this scale antagonism, which is a common term among personality disorder researchers. It is also a general understanding that Antagonism and Agreeableness are strongly negatively correlated without any evidence of discriminant validity. Thus, it may be confusing to label this factor by a different name, when this name merely refers to the low end of Agreeableness (Disagreeableness). Aside from this terminological confusion, it is a question whether the specific item content of the Antagonism scale adds to the definition of psychoticism. For example, the item “I could make a living as a con artist” may not just be a measure of agreeableness, but also measure specific aspects of psychoticism.
Another three constructs were clearly related to low conscientiousness, namely Disobliged, Impersistence, and Rashness. A problem occurs when these constructs are measured with a single item because exploratory factor analysis may fail to identify factors that have only three indicators, especially when factors are not independent. Once again, calling this factor Disinhibition can create confusion if it is not stated clearly that Disinhibition is merely a label for low Conscientiousness.
Most surprising is the finding that the last three constructs were unrelated to the three factors that are supposed to be captured with the EPA. Coldness was related to low Extraversion and low Agreeableness. Dominance was related to high Extraversion and low Agreeableness. Finally, Thrill-Seeking had low loadings on all Big Five factors. It is not clear why these items would be retained in a measure of psychoticism unless it is assumed that the specific content of these scales adds to the measurement and therefore the operational definition of psychoticism.
In conclusion, the EPA is based on a theory that psychoticism is a multi-dimensional construct that reflects the influence of 18 narrow personality traits. Although these narrow traits are not independent and are related to four of the Big Five factors, the EPA psychoticism scale is not identical to a measure that combines Emotional Stability, low Agreeableness, and low Conscientiousness.
Lynam et al. (2011) also examined how the 18 scales of the EPA are related to other measures of anti-social behaviors. Most notably, all of the low Neuroticism scales showed no relationship with anti-social behavior. The only Neuroticism-related scale that was a predictor was Anger, but Anger reflects not only high Neuroticism, but also low Agreeableness. These results raise questions about the inclusion of Emotional Stability in the definition of Psychoticism. Yet, the authors conclude “overall, the EPA appears to be a promising new instrument for assessing the smaller, basic units of personality that have proven to be important to the construct of psychopathy across a variety of epistemological approaches” (p. 122). It is unclear what evidence could have changed the authors’ minds and convinced them that their newly created measure is not a valid measure of psychoticism or that their initial speculation about the components of psychoticism was wrong. The use of an 18-item scale in 2022 shows that the authors have never found evidence to revise their theory of psychoticism or improved the measure of psychoticism. It is therefore important to critically examine the construct validity of the EPA from an independent perspective. I focus on the 18-item EPA-SSF because this scale was used by Rose et al. (2022) and I was able to use their open data.
Collison et al. (2016) conducted exploratory factor analyses with Promax rotation to examine the factor structure of the 18-item EPA-SSF. Although Lynam et al. (2011) demonstrated that items were related to four of the Big Five dimensions, they favored a three-factor solution. The problem with exploratory analyses is that they provide no evidence of the fit of a model to the data. Another problem is that factor solutions are atheoretical and influenced by item selection and arbitrary rotations. This might explain why the factor solution did not identify the expected factors. I conducted a replication of Collison et al.’s EFAs with Rose et al.’s (2022) data from Study 1 and Study 2. I conducted these analyses in MPLUS, which provides fit indices that can be used to evaluate the fit of a model to the data. I used the Geomin rotation because this default method produces more fit indices and the corresponding fit index (RMSEA) is the same. Evidently, this choice of a rotation method has no influence on the validity of the results because neither of these rotation methods is based on substantive theory about Psychoticism.
The results are consistent across the two datasets. RMSEA and CFI favor 5-factors, while the criterion that favors parsimony the most, BIC, favors 4 factors. A three-factor model does not have bad fit, but it does fail to capture some of the structure in the data.
To examine the actual factor structure, I first replicated Collison et al.’s EFA using a three-factor structure and Promax rotation. Factor loadings greater than .4 (16% explained variance) are highlighted. The results show that the disinhibition factor is clearly identified and all five items have notable (> .4) loadings on this factor. In contrast, only three items (Coldness, Callous, & Self-Centered) have consistent loadings on the Antagonism factor. The Emotional Stability factor is not identified in the first replication sample because factor 3 shows high loadings for the Extraversion items. The variability of factor loading patterns across datasets may be caused by the arbitrary rotation of factors.
It is unclear why the authors did not use Confirmatory Factor Analysis to test their a priori theory that the 18 items represent different facets of Big Five factors. Rather than relying on arbitrary statistical criteria, CFA makes it possible to examine whether the pattern of correlations is consistent with a substantive theory. Using Collison et al.’s correlations with the Big Five, I fitted a CFA model with four factors to the data. The loading pattern was specified based on Lynam et al.’s (2011) pattern of correlations with a Big Five measure. Correlations greater than .3 were used to allow for a free parameter.
Fit of this model did not meet standard criteria of acceptable model fit (CFI > .95, RMSEA < .06), but it was not terrible, CFI = .728, RMSEA = .088. 29 of the 33 free parameters were statistically significant at p < .05 and many were significant at p < .001. 20 of the coefficients were greater than .3. It is expected that effect sizes are a bit smaller because the indicators were single items and replication studies are expected to show some regression to the mean due to selection effects. Overall, these results show similarity between Lynam et al.’s (2011) results and the pattern of correlations in the replication study.
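For readers who are not familiar with these fit indices, here is a minimal sketch of the textbook formulas for RMSEA and CFI, together with a check against the cut-offs used in this post (CFI > .95, RMSEA < .06). The function names are my own, the chi-square values in the example are invented, and software packages differ in details (for example, some use N rather than N - 1 in the RMSEA denominator), so the code is not meant to reproduce MPLUS output exactly.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """Root mean square error of approximation (N - 1 variant)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int,
        chi2_baseline: float, df_baseline: int) -> float:
    """Comparative fit index relative to the independence (baseline) model."""
    misfit_model = max(chi2_model - df_model, 0.0)
    misfit_baseline = max(chi2_baseline - df_baseline, misfit_model, 0.0)
    if misfit_baseline == 0.0:
        return 1.0  # neither model shows misfit beyond its degrees of freedom
    return 1.0 - misfit_model / misfit_baseline


def meets_criteria(cfi_value: float, rmsea_value: float) -> dict:
    """Compare fit indices with the conventional cut-offs cited in the text."""
    return {"cfi_ok": cfi_value > 0.95, "rmsea_ok": rmsea_value < 0.06}


# Invented chi-square values, only to show how the functions are called.
example_rmsea = rmsea(chi2=200.0, df=100, n=501)
example_cfi = cfi(200.0, 100, 1100.0, 120)
print(f"RMSEA = {example_rmsea:.3f}, CFI = {example_cfi:.3f}")

# Fit indices reported in this post for the a priori four-factor model.
a_priori_fit = meets_criteria(cfi_value=0.728, rmsea_value=0.088)
print(a_priori_fit)
```

The two indices answer different questions. RMSEA expresses misfit per degree of freedom and rewards parsimony, whereas CFI compares the model with a baseline model in which all items are uncorrelated. This is why a model can look acceptable on one index and not on the other.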
The next step was to build a revised model to improve fit. The first step was to add a general evaluative factor to the model. Numerous studies of self-ratings of the Big Five and personality disorder instruments have demonstrated the presence of this factor. Adding a general evaluative factor to the model improved model fit, but it remained below standard criteria of acceptable model fit, CFI = .790, RMSEA = .078.
I then added additional parameters that were suggested by large modification indices. First, I added a loading for Impersistence on Extraversion. This loading was just below the arbitrary cut-off value of .30 in Lynam et al.’s study (r = .29). Another suggested parameter was a loading of Invulnerability on Extraversion (Lynam r = .18). A third parameter was a negative loading of Self-Assurance on Agreeableness. This loading was r = .00 in Lynam et al.’s (2011) study, but this could be due to the failure to control for evaluative bias that inflates ratings on Self-Assurance and Agreeableness items (Anusic et al., 2009). Another suggested parameter was a positive loading of Opposition on Neuroticism (Lynam r = .18). These modifications improved model fit, but were not sufficient to achieve an acceptable RMSEA value, CFI = .852, RMSEA = .066. I did not add additional parameters to avoid overfitting the model.
The next step was to fit these two models to Rose et al.’s second dataset. The direct replication of Lynam et al.’s (2011) structure did not fit the data well, CFI = .744, RMSEA = .090, whereas fit of the modified model with the general factor was even better than in Study 1, CFI = .886, RMSEA = .062, and RMSEA was close to the criterion for acceptable fit (.060). These results show that I did not overfit the data. I tried further improvements, but suggested parameters were not consistent across the two datasets.
In the final step, I deleted free parameters that were not significant in both datasets. Surprisingly, the Disobliged and Impersistence items did not load on Conscientiousness. This suggests some problems with these single-item indicators rather than a conceptual problem because these constructs have been related to Conscientiousness in many studies. Self-Contentment did not load on Conscientiousness either. Distrust was not related to Extraversion, and Disobliged was not related to Neuroticism. I then fitted this revised model to the combined dataset. This model had acceptable fit based on RMSEA, CFI = .887, RMSEA = .058.
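The role of the cut-off can be made concrete with the four correlations from Lynam et al. (2011) that are mentioned above. The sketch below assumes that the rule was applied to absolute correlations, which is my reading of the rule and not something stated in the original article; the dictionary and function names are hypothetical.

```python
CUTOFF = 0.30  # cut-off used to free a loading in the a priori model

# Correlations from Lynam et al. (2011) as cited in this post:
# (EPA scale, Big Five factor) -> r
lynam_correlations = {
    ("Impersistence", "Extraversion"): 0.29,
    ("Invulnerability", "Extraversion"): 0.18,
    ("Self-Assurance", "Agreeableness"): 0.00,
    ("Opposition", "Neuroticism"): 0.18,
}


def freed_a_priori(r: float, cutoff: float = CUTOFF) -> bool:
    """Return True if the loading would be a free parameter under the rule.

    Assumption: the rule is applied to the absolute value of r so that
    negative loadings are treated in the same way as positive ones.
    """
    return abs(r) > cutoff


freed = {pair: freed_a_priori(r) for pair, r in lynam_correlations.items()}
for (scale, factor), is_free in freed.items():
    r = lynam_correlations[(scale, factor)]
    status = "free" if is_free else "fixed to zero"
    print(f"{scale} on {factor}: r = {r:.2f} -> {status}")
```

None of the four loadings passes the cut-off, which is why they were fixed to zero in the a priori model and only showed up as modification indices. The Impersistence loading misses the cut-off by .01, which illustrates how arbitrary such a threshold is.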
This final model captures the main structure of the correlations among the 18 EPA-SSF items and is consistent with Lynam et al.’s (2011) investigation of the structure by correlating EPA scales with a Big Five measure. It is also consistent with measurement models that show a general evaluative factor in self-ratings. Thus, I am proposing this model as the first validated measurement model of the EPA-SSF. This does not mean that it is the best model, but critics have to present a plausible model that fits the data as well or better. It is not possible to criticize the use of CFA because CFA is the only method to evaluate measurement models. Exploratory factor analysis cannot confirm or disconfirm theoretical models because EFA relies on arbitrary statistical rules that are not rooted in substantive theories. As I showed, EFA led to the proposal of a three-factor model that has poor fit to the data. In contrast, CFA confirmed that the 18 EPA-SSF items are related to four of the Big Five scales. Thus, four – not three – factors are needed to describe the pattern of correlations among the 18 items. I also showed the presence of a general evaluative factor that is common to self-reports of personality. This factor is often ignored in EFA models that rotate factors.
After establishing a plausible measurement model for the EPA-SSF, it is possible to link the factors to the scale scores that are assumed to measure psychopathy, using the model indirect function. The results showed that the general factor explained most of the variance in the scale scores, r = .82, r^2 = 67%. Agreeableness/Antagonism explained only 3% of the variance, r = -.17. This is a surprisingly low percentage given the general assumption that antagonism is a core personality predictor of anti-social behaviors. Conscientiousness/Disinhibition was a stronger predictor, but also explained less than 10% of the variance, r = -.30, r^2 = 9%. The contribution of Neuroticism and Extraversion was negligible. Thus, the remaining variance reflects random measurement error and unique item content. In short, these results raise concerns about the ability of the EPA-SSF to measure psychopathy rather than a general factor that is related to many personality disorders or may just reflect method variance in self-ratings.
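The arithmetic behind these percentages can be checked with a short sketch. It uses only the three standardized effects reported above and assumes that the factors are uncorrelated, so that squared effects can be added; that independence assumption is mine and is stated here only for the purpose of the illustration.

```python
# Standardized effects on the EPA-SSF scale score, as reported in the text.
effects = {
    "general factor": 0.82,
    "agreeableness/antagonism": -0.17,
    "conscientiousness/disinhibition": -0.30,
}

def variance_shares(standardized_effects):
    """Square each standardized effect to get its share of variance.

    Assumption: the predictors are uncorrelated, so the shares can be added.
    """
    return {name: r ** 2 for name, r in standardized_effects.items()}

shares = variance_shares(effects)
remaining = 1.0 - sum(shares.values())

for name, share in shares.items():
    print(f"{name}: {share:.1%}")
print(f"remaining (other factors, unique item content, error): {remaining:.1%}")
```

Under this assumption, the three factors together account for roughly 79% of the variance in the scale score, leaving about 21% for the negligible Neuroticism and Extraversion effects, unique item content, and random error.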
I next examined predictive validity by adding measures of non-violent and violent criminal behaviors. The first model used the EPA-SSF scale to predict the shared variance of non-violent and violent crime based on the assumption that psychopathy is related to both types of criminal behaviors. The fit of this model was slightly better than the fit of the model without the crime variables, CFI = .789 vs. .854, RMSEA = .059 vs. .064. In this model, the EPA-SSF scale was a strong predictor of the crime factor, r = .56, r^2 = 32%. I then fitted a model that used the factors as predictors of crime. This model had slightly better fit than the model that used the EPA-SSF scale as predictor of crime, CFI = .796 vs. .789, RMSEA = .059 vs. .059. Most importantly, neuroticism and extraversion were not significant predictors of crime, but the general factor was. I deleted the parameters for neuroticism and extraversion from the model. This further increased model fit, CFI = .802, RMSEA = .058. More importantly, the three factors explained more variance in the crime factor than the EPA-SSF scale, R = .70, R^2 = 49%. There were no major modification indices suggesting that unique variance of the items contributed to the prediction of crime. Nevertheless, I examined a model that only used the general factor as predictor and added items if they explained additional variance in the crime factor, akin to stepwise regression. This model selected three specific items and explained 44% of the variance. The items were Manipulativeness (“I could make a living as a con artist”), b = .22, Self-Centeredness (“I have more important things to worry about than other people’s feelings”), b = .25, and Thrill-Seeking (“I like doing things that are risky or dangerous”), b = .39.
Dimensional Models of Psychopathy
The EPA-SSF is a dimensional measure of psychopathy. Accordingly, higher scores on the EPA-SSF scale reflect more severe levels of psychopathy. Dimensional models have the advantage that they do not require validation of some threshold that distinguishes normal personality variation from pathological variation. However, this advantage comes with the disadvantage that there is no clear distinction between low agreeableness (normal & healthy) and psychopathy (abnormal & unhealthy). Another problem is that the multi-dimensional nature of psychopathy makes it difficult to assess psychopathy. To illustrate, I focus on the key components of psychopathy, namely antagonism (disagreeableness) and disinhibition (low conscientiousness). One possible way to define psychopathy in relationship to these two components would be to define psychopathy as being high on both dimensions. Another one would be to define it with an either/or rule, assuming that each dimension alone may be pathological. A third option is to create an average, but this definition has the problem that the average of two independent dimensions no longer captures all of the information about the components. As a result, the average will be a weaker predictor of actual behavior. This is a problem of sum score definitions such as socio-economic status that averages income and education and reduces the amount of variance that can be explained by income and education independently.
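The loss of information from averaging can be made concrete with a small sketch. It assumes two independent, standardized predictors and a standardized outcome; the effect sizes are hypothetical and chosen only for illustration.

```python
def r2_separate(b1: float, b2: float) -> float:
    """Variance explained when two independent standardized predictors
    are entered separately (standardized outcome)."""
    return b1 ** 2 + b2 ** 2

def r2_average(b1: float, b2: float) -> float:
    """Variance explained by the unweighted average of the same predictors.

    cov(y, (x1 + x2) / 2) = (b1 + b2) / 2 and var((x1 + x2) / 2) = 1 / 2,
    so the correlation is (b1 + b2) / sqrt(2).
    """
    return (b1 + b2) ** 2 / 2

# Hypothetical standardized effects of antagonism and disinhibition.
for b1, b2 in [(0.3, 0.3), (0.5, 0.1), (0.4, 0.0)]:
    print(f"b1={b1}, b2={b2}: "
          f"separate={r2_separate(b1, b2):.3f}, average={r2_average(b1, b2):.3f}")
```

The average matches the separate predictors only when both effects are equal. As soon as the two dimensions differ in their effects, the average explains less variance, which is the general weakness of sum score definitions.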
One way to test the definition of psychopathy as being high in antagonism and disinhibition is to examine whether the two factors interact in the prediction of criminal behaviors. According to this definition, crimes are most likely to be committed by individuals who are both antagonistic and disinhibited, whereas each dimension alone is only a weak predictor of crime. I fitted a model with an interaction term as predictor of the crime factor. The interaction effect was not significant, b = .04, se = .14, p = .751. Thus, there is presently no justification to define psychopathy as a combination of antagonism and disinhibition. Instead, psychopathy appears to be better defined as being either antagonistic or disinhibited to such an extent that individuals engage in criminal or other harmful behaviors. Yet, this definition does not really add anything to our understanding of personality and criminal behavior. It is like the term infection that may refer to a viral or bacterial infection.
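The significance test for the interaction term can be approximated from the rounded estimate and standard error with a simple Wald test. This is a sketch under the assumption of a normal sampling distribution; because b and se are rounded to two decimals, the result will not match the reported p-value exactly.

```python
import math

def wald_test(b: float, se: float) -> tuple[float, float]:
    """Return the z statistic and two-sided p-value for H0: b = 0,
    assuming a normal sampling distribution."""
    z = b / se
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Rounded values reported in the text for the interaction effect.
z, p = wald_test(0.04, 0.14)
print(f"z = {z:.2f}, two-sided p = {p:.2f}")
```

The rounded values give a z statistic below 0.3 and a p-value well above .70, in line with the conclusion that the interaction is far from significant.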
The Big Five Facets and Criminal Behavior
Investigation of the construct validity of the EPA-SSF showed that the 18 items reflect four of the Big Five dimensions and that two of the Big Five factors predicted criminal behavior. However, the 18 items are poor indicators of the Big Five factors. Fortunately, Rose et al. (2022) also included a 120-item Big Five measure that also measures 30 Big Five facets (4 items per scale). It is therefore possible to examine the personality predictors of criminal behaviors with a better instrument to measure personality. To do so, I first fitted a measurement model to the 30 facet scales. This model was informed by previous CFA analyses of the 30 facets. Most importantly, the model included a general evaluative factor that was independent of the Big Five factors. I then added the items about nonviolent and violent crime and created a factor for the shared variance. Finally, I added the three EPA-SSF items that appeared to predict variance in the crime factor. I also related these items to the facets that predicted variance in these items. The final model had acceptable fit according to the RMSEA criterion (< .06), RMSEA = .043, but not the CFI criterion (> .95), CFI = .874, and I was not able to find meaningful ways to improve model fit.
The personality predictors accounted for 61% of the variance in the crime factor. This is more variance than the EPA-SSF factors explained. The strongest predictor was the general evaluative or halo factor, b = -.49, r^2 = 24%. Surprisingly, the second strongest predictor was the Intellect facet of Openness and the relationship was positive, b = .41, r^2 = 17%. More expected was a significant contribution of the Compliance facet of Agreeableness, b = -.31, r^2 = 9%. Finally, the unique variance in the three EPA-SSF items (controlling for evaluative bias and variance explained by the 30 facets) added another 10% explained variance, r = .312.
These results further confirm that Emotional Stability is not a predictor of crime, suggesting that it should not be included in the definition of psychopathy. These results also raise questions about the importance of disinhibition. Surprisingly, conscientiousness was not a notable predictor of crime. It is also notable that Agreeableness is only indirectly related to crime. Only the Compliance facet was a significant predictor. This means that disagreeableness is only problematic in combination with other unidentified factors that make disagreeable people non-compliant. As a result, it is problematic to treat the broader agreeableness/antagonism factor as a disorder. Similarly, all murderers are human, but we would not consider being human a pathology.
Concerns about the validity of psychological measures led to the creation of a taskforce to establish scientific criteria of construct validity (Cronbach & Meehl, 1955). The key recommendation was to evaluate construct validity within a nomological net. A nomological net aims to explain a set of empirical findings related to a measure in terms of a theory that predicts these relationships. Psychometricians developed structural equation modeling (SEM), which makes it possible to test nomological nets. Here, I used structural equation modeling to examine the construct validity of the Elemental Psychopathy Assessment – Super Short Form scale.
My examination of the psychometric properties of this scale raises serious questions about its construct validity. The first problem is that the scale was developed without a clear definition of psychopathy. The measure is based on the hypothesis that psychopathy is related to 18 distinct, maladaptive personality traits (Lynam et al., 2011). This initial assumption could have led to a program of validation research that could have suggested revisions to this theory. Maybe some traits were missing or unnecessary. However, the measure and its short form have not been revised. This could mean that Lynam et al. (2011) discovered the nature of psychopathy in a stroke of genius or that Lynam et al. failed to test the construct validity of the EPA. My analyses suggest the latter. Most importantly, I showed that there is no evidence to include Emotional Stability in the definition and measurement of psychopathy.
I am not the first to point out this problem of the EPA. Collins et al. (2016) discuss the inclusion of Emotional Stability in a definition of psychopathy at length.
“it may seem counter-intuitive that Emotional Stability would be included as a factor of the super-short form (and other EPA forms). We do not believe its inclusion is inconsistent with our previous positions as the EPA was developed, in part, from clinical expert ratings of personality traits of the prototypical psychopath. Given that many of these ratings came from proponents of the idea that FD is a central component of psychopathy, it is natural that traits resembling FD, or emotional stability, would be present in the obtained profiles. While we present Emotional Stability as a factor of the EPA measure, however, we do not claim Emotional Stability to be a central feature of psychopathy. Its relatively weak relations to other measures of psychopathy and external criteria traditionally related with psychopathy support this argument (Gartner, Douglas, & Hart, 2016; Vize, Lynam, Lamkin, Miller, & Pardini, 2016).” (p. 2016).
Yet, Rose et al. (2022) treat the EPA-SSF as if it is a valid measure of psychopathy and make numerous theoretical claims that rely on this assumption. It is of course possible to define psychopathy in terms of low neuroticism, but it should be made clear that this definition is stipulative and cannot be empirically tested. The construct that is being measured is an artifact that is created by the researchers. While neuroticism is a construct that describes something in the real world (some people are more anxious than others), psychopathy is merely a list of traits. Some people may want to include Emotional Stability on the list and others may not. The only problem is when the term psychopathy is used for different lists. The EPA scale is best understood as a measure of 18 traits. We may call this psychopathy-18 to distinguish it from other constructs and measures of psychopathy.
The list definition of psychological constructs creates serious problems for the measurement of these constructs because list theories imply that a construct can be defined in terms of its necessary and sufficient components. Accordingly, a psychopath could be somebody who is high in Emotional Stability, low in Agreeableness, and low in Conscientiousness, or somebody who possesses all of the specific traits included in the definition of psychopathy-18. However, traits are continuous constructs and it is not clear how individual profiles should be related to quantitative variation in psychopathy. Lynam et al. sidestepped this problem by simply averaging across the profile scores and treating this sum score as a measure of psychopathy. However, averaging results in a loss of information and the sum score depends on the correlations among the traits. This is not a problem when the intended construct is the factor that produces the correlations, but it is a problem when the construct is the profile of trait scores. As I showed, the sum score of the 18 EPA-SSF items mainly reflects information about the variance that is shared among all 18 items, which reflects a general evaluative factor. This general factor is not mentioned by Lynam et al. and is clearly not the construct that the EPA-SSF was intended to measure. Thus, even if psychopathy were defined in terms of 18 specific traits, the EPA-SSF sum score does not provide information about psychopathy because the actual information that the items were supposed to measure is destroyed by averaging them.
In conclusion, I am not an expert on personality disorders or psychopathy. I don’t know what psychopathy is. However, I am an expert on psychological measurement and I am able to evaluate construct validity based on the evidence that authors of psychological measures provide. My examination of the construct validity of the EPA-SSF using the authors’ own data makes it clear that the EPA-SSF lacks construct validity. Even if we follow the authors’ proposal that psychopathy can be defined in terms of 18 specific traits, the EPA-SSF sum score does not capture the theoretical construct. If you took this test and got a high score, it would not mean that you are a psychopath. More importantly, research findings based on this measure do not help us to explore the nomological network of psychopathy.
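The dominance of the general factor in a sum score follows from simple variance algebra, which the following sketch illustrates. All loadings and the assignment of items to specific factors are hypothetical; they are not the estimates from the EPA-SSF model, and all factors are assumed to be independent.

```python
def sum_score_variance_shares(general, specific, groups):
    """Decompose the variance of an unweighted sum of standardized items.

    general  -- loading of each item on the general factor
    specific -- loading of each item on its single specific factor
    groups   -- index of the specific factor that each item belongs to
    Assumption: all factors and residuals are mutually independent.
    """
    # Loadings on a shared factor add up before squaring.
    var_general = sum(general) ** 2
    var_specific = 0.0
    for factor in set(groups):
        loadings = [s for s, grp in zip(specific, groups) if grp == factor]
        var_specific += sum(loadings) ** 2
    # Residual variances simply add up across items.
    var_unique = sum(1 - gl ** 2 - sl ** 2 for gl, sl in zip(general, specific))
    total = var_general + var_specific + var_unique
    return var_general / total, var_specific / total, var_unique / total

# Hypothetical example: 18 items, four specific factors with 5, 5, 4, and 4 items.
k = 18
general = [0.4] * k
specific = [0.5] * k
groups = [0] * 5 + [1] * 5 + [2] * 4 + [3] * 4

g_share, s_share, u_share = sum_score_variance_shares(general, specific, groups)
print(f"general: {g_share:.1%}, specific: {s_share:.1%}, unique: {u_share:.1%}")
```

Even though each item in this example loads more strongly on its specific factor than on the general factor, the general factor contributes most of the variance in the sum score, because its loadings accumulate across all 18 items whereas each specific factor accumulates across only four or five.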
Last week, PLOS ONE launched a new Curated Collection – Recent Advances in Understanding Plastic Pollution. In this second installment of our Q&A with authors from this collection, we speak with author groups who study consumer knowledge and attitudes toward plastic products and the ease of recycling.
Emma Berry, Lecturer, Queen’s University Belfast
Emma Berry is a Health Psychology Lecturer in the School of Psychology at Queen’s University Belfast. Emma’s research interests include psychological adjustment to long-term conditions, health and environmental behaviour change, and psychosocial and behavioural intervention development. Emma is also interested in creative modes of communicating information and providing education, particularly in the format of comics.
Emma Berry’s paper in this Curated Collection: Roy D, Berry E, Dempster M (2022) “If it is not made easy for me, I will just not bother”. A qualitative exploration of the barriers and facilitators to recycling plastics. PLoS ONE 17(5): e0267284. https://doi.org/10.1371/journal.pone.0267284
PLOS: You carried out a study to investigate motivations and barriers to recycling plastics, and the title of your paper is quite telling – it needs to be easy for people to recycle. Was there anything about the results of this study that surprised you?
EB: A novel element of this study was to qualitatively explore how the dexterity of plastic packaging can influence recycling behaviour. It was interesting to find that, in spite of environmental concern, participants openly recognised that the complexity of recycling, which is influenced by both the packaging and the accessibility of recycling resources (i.e., bins), is an important barrier to recycling behaviour. Even when people are motivated to recycle, this does not always translate into action. Moreover, experiencing environmental concern does not necessarily make recycling a priority. For many people recycling is one of many competing life priorities, so if it requires too much cognitive and/or physical effort, other competing behaviours will take precedence. Of relevance to plastic manufacturers and retailers, our study reaffirms the usefulness of simplicity in the design of plastic packaging, with clear visual cues to aid decisions about what, how, where, and when to recycle.
PLOS: It is mentioned in the paper that some of the original intentions on how the data was to be used changed. Can you elaborate on how some of these changes occurred? Sometimes it can feel like a lot of pressure for research to always work out like we hoped or planned, so it is nice to hear how things can be adapted or altered for various scenarios during an ongoing study.
EB: The value of qualitative designs is that we can adopt an inductive or bottom-up approach, enabling us to be more receptive of new and unexpected findings. This also means that we can be more flexible (within the realms of the research question) about how the data is interpreted and used, depending on the emergent themes. The decision to integrate the survey data was post-hoc, based on the qualitative themes extracted. The survey work was conducted separately and was intended to provide an overview of recycling awareness, knowledge, and behaviours in a cross-section of people living in Northern Ireland. However, following the analysis of the qualitative findings, we felt that the frequencies observed in the survey data corroborated the salience of themes relating to physical opportunity and motivational factors underpinning intentions to recycle.
PLOS: You chose to publish the peer review history of your paper online together with the paper itself. Can you tell us what motivated you to do this? Was there anything in particular about the peer review process or recommendations from the editors or reviewers that felt especially useful for enhancing the paper?
EB: Publishing the peer review history of the paper supports an open science approach and allows readers to acknowledge how the paper has evolved from the original submission. However, we also wanted to acknowledge the specific recommendations provided by peer reviewers. In particular, the helpful recommendations to improve the structure and reporting of the interview and survey findings, in order to strengthen the narrative and make the most of the data available. Moreover, the peer review process prompted us to clarify the theoretical framework applied to the methodology (the COM-B model), which is a novel and valuable element of the study. We felt it was important to acknowledge the value of the peer review process to reaffirm this.
PLOS: Two other studies in this collection also look at consumer attitudes to recycling and waste, and the use of bioplastics. These are “Chukwuone NA, Amaechina EC, Ifelunini IA (2022) Determinants of household’s waste disposal practices and willingness to participate in reducing the flow of plastics into the ocean: Evidence from coastal city of Lagos Nigeria. PLoS ONE 17(4): e0267739. https://doi.org/10.1371/journal.pone.0267739” and “Filho WL, Barbir J, Abubakar IR, Paço A, Stasiskiene Z, Hornbogen M, et al. (2022) Consumer attitudes and concerns with bioplastics use: An international study. PLoS ONE 17(4): e0266918. https://doi.org/10.1371/journal.pone.0266918” Has seeing these other research studies in the collection helped inspire any thoughts about future work you might do, or other advances your research community will make?
EB: Our paper, in conjunction with the two other studies in this collection, supports the need for research that focuses on the design and evaluation of interventions to support appropriate recycling behaviour and minimise inappropriate disposal of plastic waste. The paper by Filho et al. (2022) is interesting as it considers how plastic material can be altered to improve the ecological footprint of the production and degradation of packaging, and this resonates with a previous paper we collaborated on by Meta et al. (2021: https://doi.org/10.1016/j.spc.2020.12.015). All three papers collectively affirm the need to provide more behavioural scaffolding to assist recycling in day-to-day life. This means adjusting the choice architecture by focusing on the design of plastic packaging and the availability of cues and resources required to recycle more effortlessly.
Stay tuned for more interviews with authors from this collection.
“Join DORA for a community call to introduce two new responsible research evaluation tools and provide feedback on future tool development. The toolkit is part of Project TARA, which aims to identify, understand, and make visible the criteria and standards universities use to make hiring, promotion, and tenure decisions. This interactive call will explore these new tools, which were each created to help community members who are seeking:
Strategies on how to debias committees and deliberative processes: It is increasingly recognized that more diverse decision-making panels make better decisions. Learn how to debias your committees and decision-making processes with this one-page brief.
Ideas on how to incorporate a wider range of contributions in their evaluation policies and practices: Capturing scholarly “impact” often relies on familiar suspects like h-index, JIF, and citations, despite evidence that these indicators are narrow, often misleading, and generally insufficient to capture the full richness of scholarly work. Learn how to consider a wider breadth of contributions in assessing the value of academic activities with this one-page brief….”
At the end of 2021, the ZBW – Leibniz Information Centre for Economics launched the Open Science Retreat, a new online format to intensively discuss current and globally relevant Open Science topics in a small circle of international Open Science advocates. The outcome is completely open. The focus is on networking and exchange.
During the third retreat in June 2022, the participants discussed the topic “Impact of Global Crises on the Open Science Movement”. The past has shown that crises – such as the Corona pandemic – can surprisingly turn out to be enablers of openness. On the other hand, Russia’s unprovoked and unjustified military aggression against Ukraine (#ScienceForUkraine) and the suffering and destruction it has caused painfully remind us of the limiting effect crises can have on the openness of science. But how do such events affect the Open Science movement in general, and how does the Open Science community respond? These and other questions were discussed during the retreat. It quickly became apparent to the participants that there has been little such discourse and corresponding reflection so far.
Thus, some participants of the retreat wrote the open letter “Open Science should provide support, not impose sanctions” directed to the Open Science community in general. It focuses on two core theses:
The Open Science movement should address the question of whether and, if so, under which framework conditions “closeness” can be appropriate in global, political crises.
Openness must not be used to place sanctions in global, political crises by closing open offers.
The Open Letter takes a closer look at these aspects.
The aim of the open letter is to further stimulate the discourse and the corresponding reflection on the mentioned aspects. Thus, the authors explicitly invite the Open Science community to support this letter.
“There are lots of reasons why you, a middling academic, might want to edit or contribute to a collection of essays. These include pride, intellectual kudos or, in the UK, a need to boost your likely rating in the Research Excellence Framework (REF). The one thing you don’t do it for is the royalty cheque which is small or, more probably, non-existent.
On the other hand, at least accepting the invitation won’t cost you, except in time. Or will it? Increasingly, you would be wise to look carefully at the contract before you agree to it….
In the old days, contracts didn’t amount to much. You would probably guarantee originality and that, to the best of your knowledge, your work was not defamatory or illegal, but that was it. No longer, however. One publisher (I won’t name it, but it’s part of a major international conglomerate) insists on a contract stating that “the Author will indemnify and hold harmless the Publishers against any loss, damages, injury, costs and expenses (including any legal costs or expenses, and any compensation costs paid by the Publishers) arising from any alleged facts or circumstances which, if true, would constitute a breach of the warranty”.
Even if such verbiage makes your eyes glaze over, think carefully. You’re guaranteeing to pay from your own pocket, without limitation, for all the consequences to the publisher of any breach of copyright, libel or breach of privacy….”
“At sci2sci, we are building an electronic lab notebook and a publishing platform in one interface. This will allow to store all experimental data and metadata in one place, and quickly release it in public access with one click.
In a nutshell, we offer full stack data publishing – from the experiment planning through raw data acquisition and analysis to the final research report – all in a single platform, with a number of benefits that cannot be offered by a current journal pdf manuscript:…”