WASHINGTON - A new computer program can determine in most cases the sex of an author by detecting subtle differences in the words men and women prefer to use.
For instance, female writers tend to choose grammatical terms that apply to personal relationships, such as "for" and "with," more frequently than men do.
"Women have a more interactive style," said Shlomo Argamon, a computer scientist at the Illinois Institute of Technology in Chicago who developed the program. "They want to create a relationship between the writer and the reader."
Men use more numbers, adjectives and determiners - words such as "the," "this" and "that" - because they apparently care more than women do about conveying specific information.
Argamon said the intent of male writers often was to say: "Here's something I want to tell you about, and here are some things about it."
Women, he found, write the pronoun "she" more often than men do, although both sexes use "he" about equally.
Argamon said it wasn't clear what psychological or sociological differences between men and women might explain the differences in their writing styles. "It's a subject for further research," he said.
Other experts, such as Deborah Tannen, a linguistics professor at Georgetown University in Washington, have popularized the idea that men and women have different styles of communicating. But Argamon's work is the first to show such distinctions in writing.
"This is surprising since, unlike conversation, writing a book or an article does not involve direct social interaction," he said.
Argamon said his program correctly determined the sex of the author in 80 percent of the works it checked. One it missed was A.S. Byatt's best-selling novel Possession. The computer said it was written by a man; Byatt is a woman. Michael Frayn's science fiction tale A Landing on the Sun was misidentified as the work of a woman.
Argamon's gender program is part of a much broader technique called "stylometry," which analyzes styles not only of writing, but also of music, graphics, art and architecture.
A practical application of stylometry, he said, would be to identify writers of anonymous communications, such as the Unabomber, on the basis of their writings.
To carry out the project, Argamon and colleagues analyzed the texts of 566 British books and articles, fiction and nonfiction, taken from a huge computer database called the British National Corpus.
From that mass of almost 20 million words, the computer program WINNOW extracted 1,081 distinctive "features," such as prepositions, pronouns and adjective phrases. It checked the use of verb forms such as "go" and "going." It also counted punctuation marks such as dashes and exclamation marks.
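The counting step described above can be illustrated with a minimal Python sketch. It is not the researchers' code: the short `FEATURES` list is a hypothetical sample built from words and punctuation marks mentioned in this article, and the function name `extract_features` is invented for illustration. The sketch only shows the general idea of turning a text into rates of common words and punctuation marks.

```python
import re
from collections import Counter

# Hypothetical feature list: a handful of words and punctuation marks named
# in the article. The study's 1,081 features are not reproduced here.
FEATURES = ("the", "that", "for", "with", "she", "going", "-", "!")

def extract_features(text):
    """Return each feature's rate per 1,000 tokens (words and punctuation marks)."""
    # A token is a word (hyphenated words stay whole), a free-standing dash,
    # or an exclamation mark. Everything else is ignored in this sketch.
    tokens = re.findall(r"[a-z']+(?:-[a-z']+)*|[-!]", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid dividing by zero on empty input
    return {f: 1000.0 * counts[f] / total for f in FEATURES}

rates = extract_features("She is going with the dog - oh really!")
print(rates["she"], rates["-"], rates["that"])
```

Expressing each count as a rate per 1,000 tokens, rather than as a raw count, lets a short article and a long novel be compared on the same scale.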
After running repeatedly through these features, the computer winnowed the list down to 128 significant contrasts. The results indicated that the words favored most heavily by men were what grammarians call determinative words such as "the," "a," "as," "that" and "one." Female writers favored "she" and relationship words such as "for," "with," "in," "and" and "not."
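How a program might "winnow" a long feature list down to the most telling contrasts can be sketched with a textbook Winnow-style learner, which multiplies feature weights up or down each time it misclassifies a training text. The version below is a simplified illustration under stated assumptions, not the study's implementation: the function names, the learning rate, the three-word feature set and the two toy training examples are all hypothetical.

```python
def train_balanced_winnow(examples, n_features, rate=0.5, epochs=20):
    """Train a simplified Balanced Winnow-style classifier.

    examples: list of (feature_vector, label) pairs, where label is +1 or -1
    and feature values are non-negative. Returns one net weight per feature.
    """
    pos = [1.0] * n_features  # evidence for the +1 class
    neg = [1.0] * n_features  # evidence for the -1 class
    for _ in range(epochs):
        for x, y in examples:
            score = sum((p - n) * xi for p, n, xi in zip(pos, neg, x))
            predicted = 1 if score > 0 else -1
            if predicted == y:
                continue  # weights change only after a mistake
            for i, xi in enumerate(x):
                if xi > 0:  # only features present in the text are adjusted
                    if y == 1:
                        pos[i] *= 1 + rate
                        neg[i] *= 1 - rate
                    else:
                        pos[i] *= 1 - rate
                        neg[i] *= 1 + rate
    return [p - n for p, n in zip(pos, neg)]

def top_features(weights, names, k):
    """Keep the k features whose net weights are largest in absolute value."""
    ranked = sorted(zip(names, weights), key=lambda nw: abs(nw[1]), reverse=True)
    return ranked[:k]

# Toy data (hypothetical): +1 and -1 stand for the two author classes.
names = ["the", "she", "of"]
examples = [([1, 0, 1], 1), ([0, 1, 1], -1)]
weights = train_balanced_winnow(examples, n_features=3)
print(top_features(weights, names, k=2))
```

In this toy run, "of" appears in both training texts, so its weight ends up near zero and it drops out of the ranking, while "the" and "she" end with weights of opposite sign. The same pruning idea, applied to a far larger feature set, is one plausible reading of how 1,081 features could be cut to 128.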
When Argamon then tested his program on other texts, it succeeded 80 percent of the time in identifying the sex of an anonymous writer.
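A success rate of this kind is usually measured by running the classifier on texts it did not see during training and counting how often its guess matches the known answer. The sketch below shows only that arithmetic. The `toy_predict` rule and the five sample sentences are hypothetical stand-ins invented for illustration; they are not drawn from the study and say nothing about its data.

```python
def accuracy(predict, labeled_texts):
    """Fraction of (text, true_label) pairs for which predict(text) matches the label."""
    if not labeled_texts:
        raise ValueError("need at least one labeled text")
    hits = sum(1 for text, label in labeled_texts if predict(text) == label)
    return hits / len(labeled_texts)

def toy_predict(text):
    """Hypothetical stand-in for a trained classifier: guess 'F' if 'she' appears."""
    return "F" if "she" in text.lower().split() else "M"

# Hypothetical held-out examples with known labels.
held_out = [
    ("she walked home with him", "F"),
    ("the engine has four valves", "M"),
    ("that one is the largest", "M"),
    ("she said the plan was sound", "M"),  # the toy rule gets this one wrong
    ("she wrote to us for years", "F"),
]
print(accuracy(toy_predict, held_out))
```

Keeping the test texts separate from the training texts matters: a program scored on the same works it learned from would look more accurate than it really is.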
Argamon and fellow researchers Moshe Koppel and Anat Shimoni published a report on their work in the April edition of the journal Literary and Linguistic Computing.