Masculinity/Femininity Scales

The most popular way of grasping diversity in sexual character is to think of people as being arrayed along a dimension that represents differences in some gender-related trait. This is the basis of most paper-and-pencil tests of masculinity/femininity (hereafter M/F). The problem is how to specify where, on the imaginary line representing the dimension, any given person’s shadow falls. This is called ‘measuring’ the trait, and M/F scales differ in the solutions they offer.

Some tests have used projective methods, such as the ‘IT’ scale, which shows children a sexually ambiguous figure and invites them to make up a story about it. But since the 1940s, when the Minnesota Multiphasic Personality Inventory (MMPI) came into use, a different form has been usual. This most famous of all psychiatric screening tests contained a masculinity-femininity subscale, and set the pattern of a self-report inventory with many short items. The respondents describe themselves on each item separately, and a count is made of how often they give a certain kind of response. This count, sometimes transformed statistically, becomes the respondent’s femininity or masculinity score.

The form of the self-description varies a little. Janet Spence and Robert Helmreich in Masculinity and Femininity use a questionnaire which invites people to rate themselves on where they fall between two specified extremes, for example:

very rough … very gentle

goes to pieces under pressure… stands up well under pressure

Sandra Bem, author of the famous ‘androgyny’ scale, dispenses with the polar opposites. Each item simply names a trait and asks respondents to rate how often it is true of them:

ambitious… forceful… affectionate… child-like…

How is it known that these scales measure femininity and masculinity? The usual test has been that each item discriminates statistically between women and men. As Anne Constantinople points out in her excellent review of M/F scales, this results in a systematic confusion between ‘sex difference’ and ‘masculinity/femininity’. In principle any item that shows a sex difference can figure in M/F scales. In practice almost anything does. Items range from generalized self-descriptions like those quoted above, to job preferences, word associations, neurotic symptoms, information and aesthetic interests. Constantinople remarks that while some items reflect an intuitive notion of what ‘masculinity’ and ‘femininity’ mean, in many cases the content seems ‘irrelevant to any identifiable definition of the concept’.

Once a point of departure has been established, another psychometric criterion takes over: the scale’s internal consistency. Candidate items are kept in the scale, or dropped, according to their correlation with other items. Spence and Helmreich’s scales illustrate this. The correlation of each item with the total (i.e., the sum of scores on all items) is presented as justification of the coherence of each trait and scale; and those items which have high item-total correlations are chosen for the ‘short form’ of the scale.

This criterion is likely to whittle down the heterogeneity of content, as it is familiar in questionnaire research that the highest correlations are between items which ask the same kind of question in slightly different language. More important, it eliminates any possibility of recognizing tension or incoherence within the trait being measured.
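The item-total procedure described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the `keep` parameter and the response matrix are hypothetical, the data are invented, and the code is not Spence and Helmreich's actual procedure. It correlates each item with the uncorrected total (the sum over all items, as in the description above) and keeps the highest-ranking items.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def item_total_correlations(responses):
    """responses: one row per respondent, one column per item."""
    totals = [sum(row) for row in responses]  # total = sum of all items
    n_items = len(responses[0])
    return [pearson([row[i] for row in responses], totals)
            for i in range(n_items)]

def short_form(responses, keep):
    """Indices of the `keep` items with the highest item-total correlation."""
    r = item_total_correlations(responses)
    ranked = sorted(range(len(r)), key=lambda i: r[i], reverse=True)
    return sorted(ranked[:keep])

# Invented ratings: items 0-2 move together, item 3 follows its own pattern.
responses = [
    [1, 2, 1, 4],
    [2, 3, 2, 1],
    [3, 3, 4, 3],
    [4, 5, 4, 2],
    [5, 5, 5, 3],
]
correlations = item_total_correlations(responses)
kept = short_form(responses, keep=3)
```

With these invented ratings the fourth item correlates only weakly with the total and falls out of the short form. Nothing in the procedure asks what that item was about or why respondents answered it differently.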

The reason for using a scale score is that individual items are not very reliable, and can only be presumed to carry a small drop of the trait being measured, mixed up with various impurities. By combining the answers to a number of questions, the impurities tend to cancel each other out. The way this combining is done is the key to the kind of psychology that quantitative personality research produces. In adding together the item scores, the specific meaning of each question is ignored. In old-fashioned hand-scoring of questionnaires it was usual to lay a cardboard sheet which masked the questions over the page, with the ticks or crosses of answers appearing through small windows cut in the cardboard. In modern machine-scoring only the number of the question is needed — its wording does not enter the machine or the processing at all. Semantics, in short, is abandoned.
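The scoring step just described is simple enough to write out. The sketch below is hypothetical (the scoring key, the item numbers and the ratings are all invented for illustration), but it shows the point at issue: the function receives item numbers and ratings, and the wording of the questions appears nowhere in it.

```python
def scale_score(answers, key):
    """Sum the ratings for the item numbers listed in the scoring key.

    answers: dict mapping item number -> numeric rating
    key:     item numbers that belong to the scale (hypothetical)
    """
    return sum(answers[item] for item in key)

# Invented example: one respondent, five numbered items, a three-item key.
answers = {1: 4, 2: 2, 3: 5, 4: 1, 5: 3}
key = [1, 3, 5]
score = scale_score(answers, key)
```

Two questionnaires with entirely different questions under the same item numbers would yield the same score from the same ratings.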

Gutting answers of their particular meaning in order to treat them as partial measures of a dimensional whole is taken for granted in the psychometric literature. What a person thinks she is saying to the researcher is set aside. The tick or cross is treated not as an answer to a question but as a ‘response’ providing a clue to an underlying entity. The researcher knows about this entity but the ‘subject’ does not.

Nearly thirty years ago, in The Person in Psychology, Paul Lafitte mounted a sustained critique of what he called ‘substantive abstraction’ in psychological measurement. Other criticisms of the dubious assumptions and attenuation of reality in paper-and-pencil surveys have come from quarters as diverse as psychoanalysis, anthropology and ethnomethodology. The problem is not that the technique was rough in its early days. Recent research commonly takes less account of such problems than the pioneering studies did. The problem is about the bases of the method. Gender scaling, like other forms of personality and attitude scaling, involves a radical desemanticization of human practice. It is a case of what R. D. Laing in another context called ‘transpersonal invalidation’. The chances of a sound understanding of human beings coming out of such research are infinitesimal.

But the approach does have important ideological effects. Desemanticization allows research to recognize variation without having to deal with contradiction. If a person’s answers to related items conflict, this does not register as a problem, for instance a question of ambivalence. It simply lowers the total score. If some items do not fit with the others, there is no requirement to investigate why. They are simply dropped from the item pool as a normal step in producing the final instrument.

Femininity and masculinity are thus implicitly theorized as homogeneous dimensions of temperament, which can be measured in all people. In a roundabout way this allows scalar research to recognize a point that unitary conceptions of sexual character could not, the coexistence of masculinity and femininity in the same person. Not in the way Freud saw them, as desires and identifications in conflict with each other, but through the notion of multiple dimensions of variation. Femininity and masculinity need not be treated as polar opposites, i.e., as ends of the same dimension. Each might be treated as a separate scale, and the same person might get high scores on both. This idea occurred to a number of American psychologists in the early 1970s, with Bem’s version, ‘androgyny’, gaining most attention. Spence and Helmreich performed a kind of reductio ad absurdum by constructing a femininity scale, a masculinity scale and an M/F scale and showing they were all statistically unconnected when administered to the same people.

The muddle in interpretation that follows from the desemanticization of human communications is obvious enough. Constantinople’s summing-up on masculinity and femininity scales a decade ago is still sound: ‘both theoretically and empirically they seem to be among the muddiest concepts in the psychologist’s vocabulary.’ The scalar approach has produced very little new understanding of the psychosocial processes involved.

The reasons for its popularity perhaps have more to do with the politics of academic psychology. Scalar M/F research offers a way of treating an important social question in terms acceptable to a psychological establishment very much concerned with scientificity, formal measurement and statistical proof. It is quick and straightforward, since the conventions of scaling are well established; the professional journals like it, and no one is threatened by it. At this level scalar research is part of the domestication of sexual politics in the name of science. It is an important political fact that a great deal of it has been done by women.

At a deeper level this kind of psychology participates in the process of reification. Turning a process, an action, or a relationship into an object, or treating it as if it were an object, is one of the fundamental dynamics of modern culture. Gender scaling involves a drastic reification of the process of self-expression or accounting for oneself. Even the stylized self-description called for by the scale items is converted, by the operations that produce ‘scores’ and then statistically manipulate them, into location in the abstract space of a personality dimension.

If this research has been popular, and people feel they recognize themselves in dimensional accounts (or in the unitary accounts of sexual character discussed previously), it is not because people are dots on a computer-generated graph in N dimensions. It is, perhaps, because the process of reification is so far advanced as to make recognition of qualitative diversity threatening. Fear, not of ‘otherness’ so much as of the riotous exuberance of motive and imagination that is a possibility in sexual life, can be a powerful motive in a world partly reified already.

Updated: 02.10.2015 — 00:33