Dead man and successful woman

I follow Humans of New York on Facebook, and I often skim the top comments on interesting posts. Today there was a story of a woman who works as a counsellor for HIV-positive teenagers after overcoming a lot of personal suffering. At the time, one of the top comments described her as ‘beautiful’. Given what I know of HONY commenters, the poster very likely meant she has a ‘beautiful soul’. But still, you wouldn’t call a man beautiful like that… or would you?

I used the portal to all of human knowledge (Google) to test a series of hypotheses. My resting assumption, given what I know of the world, of feminist writing, of the internet etc., is that certain qualities/words tend to be attributed more to men than to women (and vice versa). So I created bigrams (two-word phrases) from ‘adjective’ + ‘man’ or ‘woman’ and tested how many hits I got. I’ll put the raw data at the bottom of this post.

First of all, what is the unigram (single-word) prevalence of ‘man’ and ‘woman’?

[Figure: unigram frequency of ‘man’ and ‘woman’]

Not terribly surprising given ‘man’ also functions as ‘person’ in the English language. Given this, we might expect ‘adjective man’ to appear overall much more frequently than ‘adjective woman’, all else being equal. But all else is not equal. 

Adjectives left to right: angry, beautiful, brave, crying, dead, friendly, handsome, homeless, inspiring, intelligent, lonely, pregnant, sexy, smart, strong, successful

If I wanted to test for an association between ‘man/woman’ and ‘adjective’, I could also look up the unigram frequency of each adjective, but that’s not the objective here. The idea would be to treat these hit counts as estimates of bigram probabilities: if the two components of a bigram were independent, its probability would be the product of the individual unigram probabilities.
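Written out, the independence assumption mentioned above would look like this (the notation is mine, purely for illustration):

```latex
P(\text{adjective},\ \text{man}) = P(\text{adjective}) \, P(\text{man})
```

A bigram that turns up much more often than this product predicts would suggest that the two words attract each other.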

Immediately it appears that the original adjective of interest (‘beautiful’) really stands out for women, which corresponds to my existing hypothesis. Unsurprisingly, ‘pregnant woman’ and ‘handsome man’ are more common than their counterparts.

Since I’m more interested in comparisons, let’s look at the ratio of ‘adjective man’ to ‘adjective woman’, taking care to divide out the background frequencies of ‘man’ and ‘woman’ as we do so. Specifically, I take the counts of ‘adjective man’ and divide by ‘total man’, for each adjective, and do similarly for ‘adjective woman’. Then I can actually compare their counts, knowing the greater abundance of ‘man’ won’t disturb the results.
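For the curious, the normalisation just described can be sketched in a few lines of Python. The counts are copied from the data at the bottom of this post; the function and variable names are invented here for illustration.

```python
# Total hits for the bare words (the '#none' rows in the data).
TOTAL = {"man": 512_000_000, "woman": 204_000_000}

# Hits for 'adjective man' and 'adjective woman', for a few of the adjectives.
COUNTS = {
    "beautiful": {"man": 772_000, "woman": 3_380_000},
    "handsome": {"man": 1_630_000, "woman": 289_000},
    "dead": {"man": 2_660_000, "woman": 459_000},
    "successful": {"man": 360_000, "woman": 2_130_000},
}


def normalised_ratio(adjective):
    """Ratio of 'adjective man' to 'adjective woman', each divided by its total."""
    man_share = COUNTS[adjective]["man"] / TOTAL["man"]
    woman_share = COUNTS[adjective]["woman"] / TOTAL["woman"]
    return man_share / woman_share


for adjective in COUNTS:
    print(f"{adjective:<12}{normalised_ratio(adjective):.2f}")
```

A value above 1 means the adjective leans towards ‘man’ once the background frequencies are divided out; a value below 1 means it leans towards ‘woman’.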

The dotted black line marks a ratio of 1, the point at which an adjective switches from favouring ‘woman’ to favouring ‘man’.

This is more interpretable, although I could definitely have plotted this better (as deviations from 1, rather than growing from 0, as well as using a neutral shade for ratios close enough to one, but I want to eat a burrito soon). Once again ‘handsome’, ‘beautiful’ and ‘pregnant’ behave as expected.

‘Successful’ is the most woman-specific adjective from this set. This obviously does not imply that women are more successful than men (or men are more dead than women…), but it could suggest that we talk about the success of women more. It seems that we prefer most of these adjectives for women, in fact. I tried to go for words which might have some kind of subtle gender/sex bias (… and ‘pregnant’), but apart from the very obvious ‘handsome’ and the potentially-weird ‘dead’, I didn’t do a very even job of it. My initial assumptions about bigram frequencies (I mean word usage) were clearly flawed.

A word of caution: without controlling for the background frequency of ‘man’ and ‘woman’, the previous graph looks like this. (An observant reader may note that the only change is in the y-axis, because the normalisation scales every ratio by the same factor: the frequency of ‘woman’ divided by the frequency of ‘man’.)


This might agree with what you expect to see (if you are, as I am, constantly inundated with strong, friendly, homeless, handsome, crying dead men), but it is based on a flawed analysis.
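To see why the un-normalised graph differs only in scale, here is a small Python check using three adjectives from the data at the bottom of this post. The variable names are made up for this sketch.

```python
TOTAL_MAN, TOTAL_WOMAN = 512_000_000, 204_000_000

# (hits for 'adjective man', hits for 'adjective woman')
RAW_COUNTS = {
    "strong": (1_450_000, 989_000),
    "homeless": (1_180_000, 814_000),
    "crying": (343_000, 339_000),
}

# The constant factor introduced by the normalisation (roughly 0.4).
scale = TOTAL_WOMAN / TOTAL_MAN

ratios = {}
for adjective, (man_hits, woman_hits) in RAW_COUNTS.items():
    raw = man_hits / woman_hits
    normalised = (man_hits / TOTAL_MAN) / (woman_hits / TOTAL_WOMAN)
    # The two ratios differ only by the constant factor `scale`.
    assert abs(normalised - raw * scale) < 1e-12
    ratios[adjective] = (raw, normalised)
    print(f"{adjective:<10} raw={raw:.2f} normalised={normalised:.2f}")
```

For all three of these adjectives the raw ratio sits above 1 while the normalised ratio falls below it, which is why the two graphs tell such different stories.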

I’m not sure what to take from all of this. I would like to find adjectives which are more commonly associated with men and then draw wild conclusions about the nature of society, but that will have to wait for another day.

Data: (#: I excluded ‘lovely’ for being too obscure and ‘old’ for wrecking the axes on the graphs)

adjective count sex
#none 204000000 woman
#none 512000000 man
beautiful 3380000 woman
beautiful 772000 man
handsome 289000 woman
handsome 1630000 man
brave 710000 woman
brave 614000 man
strong 989000 woman
strong 1450000 man
inspiring 313000 woman
inspiring 216000 man
intelligent 1150000 woman
intelligent 398000 man
smart 642000 man
smart 1300000 woman
successful 360000 man
successful 2130000 woman
lonely 495000 man
lonely 1340000 woman
angry 847000 woman
angry 585000 man
sexy 748000 man
sexy 1550000 woman
#old 20400000 man
#old 5970000 woman
dead 459000 woman
dead 2660000 man
homeless 1180000 man
homeless 814000 woman
crying 343000 man
crying 339000 woman
pregnant 2630000 woman
pregnant 468000 man
#lovely 355000 man
#lovely 491000 woman
friendly 809000 man
friendly 313000 woman
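
For anyone who wants to re-run the numbers, here is a minimal sketch of loading this table in Python. It assumes the ‘#none’ rows are the bare ‘man’/‘woman’ totals and that the other ‘#’ rows are the excluded adjectives; that is my reading of the data, and the `parse` helper is hypothetical. Only an excerpt of the table is pasted in to keep it short.

```python
RAW = """\
adjective count sex
#none 204000000 woman
#none 512000000 man
beautiful 3380000 woman
beautiful 772000 man
#old 20400000 man
#old 5970000 woman
dead 459000 woman
dead 2660000 man
"""


def parse(raw):
    """Return (totals, counts); '#none' rows are totals, other '#' rows are skipped."""
    totals, counts = {}, {}
    for line in raw.splitlines()[1:]:  # skip the header row
        adjective, count, sex = line.split()
        if adjective == "#none":
            totals[sex] = int(count)
        elif not adjective.startswith("#"):
            counts.setdefault(adjective, {})[sex] = int(count)
    return totals, counts


totals, counts = parse(RAW)
for adjective, by_sex in sorted(counts.items()):
    ratio = (by_sex["man"] / totals["man"]) / (by_sex["woman"] / totals["woman"])
    print(adjective, round(ratio, 2))
```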