Sunday, 18 July 2010

Who do you write like?

I pasted a longish blog post into I Write Like, and it said:

I write like
George Orwell

I Write Like by Mémoires, Mac journal software. Analyze your writing!


While I appreciate the compliment, I wish it were more specific about how it reached that assessment. I can make a few guesses.

It seems obvious that this uses some kind of nearest-neighbour search. Take a corpus of authors, break their works into good-sized chunks, and then find the closest match for whatever the user gives you.
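Here is roughly what that guess looks like in code. This is a minimal sketch, not how I Write Like actually works: the chunk size, the plain word-count vectors and the cosine similarity measure are all my own assumptions, and the helper names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def chunks(tokens, size):
    """Break a token list into consecutive chunks of `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def cosine(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(corpus, chunk_size=200):
    """Turn {author: text} into (author, word-count vector) pairs, one per chunk."""
    index = []
    for author, text in corpus.items():
        for chunk in chunks(tokenize(text), chunk_size):
            index.append((author, Counter(chunk)))
    return index


def nearest_author(index, sample):
    """Return the author of the chunk most similar to the sample (1-nearest neighbour)."""
    query = Counter(tokenize(sample))
    best_author, _ = max(index, key=lambda pair: cosine(query, pair[1]))
    return best_author
```

With a toy corpus such as `{"A": "the cat sat on the mat", "B": "stars burn in the cold dark sky"}` and `chunk_size=3`, `nearest_author(index, "a cat on a mat")` picks author A. Note that with raw word counts, topic words do most of the matching, which is exactly the problem with this approach.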

But what constitutes a match? We could use n-grams (words, and strings of words), as we do in many computational language tasks, but just matching the words in a book doesn't mean you write like the author. Sure, Steinbeck and Faulkner wrote different words in their books just because of the topics they treated, but that's not what we mean by writing style.
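To make the n-gram idea concrete, this sketch collects each text's word n-grams and measures how much they overlap. The Jaccard measure is just one simple choice (my assumption, not anything the site documents). Two sentences about the same event share n-grams whether or not their style is alike.

```python
import re


def word_ngrams(text, n):
    """Return the set of word n-grams (as tuples) in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Share of n-grams two sets have in common (0 = none, 1 = identical)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

For example, the bigrams of "he fell in love" and "she fell in love" overlap by 0.5, purely because both sentences are about falling in love.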

My guess is that writing style is more about patterns of words, especially function words like prepositions and conjunctions. (You may have noticed I start a lot of sentences with conjunctions like 'but' and 'and'.) I'd try running all the words through a part-of-speech tagger and seeing which author's data matches best. Just a guess, though.
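A full part-of-speech tagger is more than a sketch can hold, so this does the simpler version of the same idea: it compares how often two texts use a handful of function words, and counts sentences that open with 'and' or 'but'. The word list is a short, hand-picked assumption, and Manhattan distance is just one way to compare the profiles.

```python
import re
from collections import Counter

# Hand-picked function words (an illustrative assumption; a real system
# would use a longer, curated list or POS tags instead).
FUNCTION_WORDS = ["and", "but", "or", "the", "a", "of", "in", "on",
                  "to", "with", "for", "as", "that", "he", "she", "it"]


def function_word_profile(text):
    """Relative frequency of each function word among all tokens in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def profile_distance(p, q):
    """Manhattan distance between two profiles; smaller means more similar usage."""
    return sum(abs(x - y) for x, y in zip(p, q))


def sentence_initial_conjunctions(text):
    """Count sentences whose first word is 'and' or 'but'."""
    sentences = re.split(r"(?<=[.!?])\s+", text)
    return sum(1 for s in sentences if re.match(r"(and|but)\b", s, re.IGNORECASE))
```

On this measure, a topic word like "vampire" counts for nothing, while habits like my sentence-initial 'But' and 'And' show up directly.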

I wonder if Orwell writes like Orwell. Here are three adjacent passages from Orwell's Down and Out in Paris and London, with the computer's assessment.

Or there was Henri, who worked in the sewers. He was a tall, melancholy man with curly hair, rather romantic-looking in his long, sewer-man’s boots. Henri’s peculiarity was that he did not speak, except for the purposes of work, literally for days together. Only a year before he had been a chauffeur in good employ and saving money. One day he fell in love, and when the girl refused him he lost his temper and kicked her. On being kicked the girl fell desperately in love with Henri, and for a fortnight they lived together and spent a thousand francs of Henri’s money. Then the girl was unfaithful; Henri planted a knife in her upper arm and was sent to prison for six months. As soon as she had been stabbed the girl fell more in love with Henri than ever, and the two made up their quarrel and agreed that when Henri came out of jail he should buy a taxi and they would marry and settle down. But a fortnight later the girl was unfaithful again, and when Henri came out she was with child. Henri did not stab her again. He drew out all his savings and went on a drinking-bout that ended in another month’s imprisonment; after that he went to work in the sewers. Nothing would induce Henri to talk. If you asked him why he worked in the sewers he never answered, but simply crossed his wrists to signify handcuffs, and jerked his head southward, towards the prison. Bad luck seemed to have turned him half-witted in a single day.

I write like
H. P. Lovecraft



Or there was R., an Englishman, who lived six months of the year in Putney with his parents and six months in France. During his time in France he drank four litres of wine a day, and six litres on Saturdays; he had once travelled as far as the Azores, because the wine there is cheaper than anywhere in Europe. He was a gentle, domesticated creature, never rowdy or quarrelsome, and never sober. He would lie in bed till midday, and from then till midnight he was in his corner of the bistro, quietly and methodically soaking. While he soaked he talked, in a refined, womanish voice, about antique furniture. Except myself, R. was the only Englishman in the quarter.

I write like
Charles Dickens



There were plenty of other people who lived lives just as eccentric as these: Monsieur Jules, the Roumanian, who had a glass eye and would not admit it, Furex the Limousin stonemason, Roucolle the miser — he died before my time, though — old Laurent the rag-merchant, who used to copy his signature from a slip of paper he carried in his pocket. It would be fun to write some of their biographies, if one had time. I am trying to describe the people in our quarter, not for the mere curiosity, but because they are all part of the story. Poverty is what I am writing about, and I had my first contact with poverty in this slum. The slum, with its dirt and its queer lives, was first an object-lesson in poverty, and then the background of my own experiences. It is for that reason that I try to give some idea of what life was like there.



No wonder Orwell had writer's block: schizophrenia.

UPDATE: Thanks to Kuri for the link in the comments. It seems the author used vocabulary (word choice), the number of words, commas and semicolons per sentence, and the number of sentences with quotation marks and dashes (direct speech).
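As a rough guess at what those sentence-level features might look like, here is a sketch. The exact definitions (splitting sentences on '.', '!' or '?' followed by whitespace, and treating straight or curly double quotes, en/em dashes and '--' as signs of direct speech) are my assumptions, not the site's documented method.

```python
import re

# Straight/curly double quotes, en dash, em dash, or a double hyphen.
SPEECH_MARKS = re.compile("[\"\u201c\u201d\u2013\u2014]|--")


def sentence_features(text):
    """Average words, commas and semicolons per sentence, plus the share of
    sentences containing quotation marks or dashes (a proxy for direct speech)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    n = len(sentences) or 1
    return {
        "words_per_sentence": sum(len(s.split()) for s in sentences) / n,
        "commas_per_sentence": sum(s.count(",") for s in sentences) / n,
        "semicolons_per_sentence": sum(s.count(";") for s in sentences) / n,
        "speech_sentence_share": sum(1 for s in sentences if SPEECH_MARKS.search(s)) / n,
    }
```

For `'He drank; he talked. "Wine," he said.'` this gives 3.5 words, 0.5 commas and 0.5 semicolons per sentence, and half the sentences marked as speech.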
I'd say this could be smartened up considerably. Just including some simple features would help, like the ratio of singletons (words appearing once) to other words, appearance of conjunctions, or ranking all the words by frequency and comparing lists.
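The two simplest of those extra features might be sketched like this. I read "the ratio of singletons to other words" as distinct words used exactly once divided by distinct words used more than once; that reading, the helper names and the rank-difference score (a variant of Spearman's footrule) are all assumptions for illustration.

```python
import re
from collections import Counter


def words(text):
    """Lowercased word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def singleton_ratio(text):
    """Distinct words used exactly once, divided by distinct words used more than once."""
    counts = Counter(words(text))
    singles = sum(1 for c in counts.values() if c == 1)
    others = len(counts) - singles
    if others == 0:
        return float("inf") if singles else 0.0
    return singles / others


def top_words(text, k):
    """The k most frequent words, most frequent first (ties keep first-seen order)."""
    return [w for w, _ in Counter(words(text)).most_common(k)]


def rank_distance(a, b):
    """Sum of rank differences between two ranked word lists; a word missing
    from a list is given rank len(list), i.e. just past its end."""
    ra = {w: i for i, w in enumerate(a)}
    rb = {w: i for i, w in enumerate(b)}
    return sum(abs(ra.get(w, len(a)) - rb.get(w, len(b))) for w in set(a) | set(b))
```

For "the cat and the dog and the bird", three words appear once and two repeat, so the singleton ratio is 1.5, and the top two words are "the" then "and". Swapping the order of two ranked words gives a rank distance of 2.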

This kind of makes me want to try building a better system. I won't (for lack of time), but I will keep in mind that if you take interesting work in natural language processing and give it a simple web implementation, people will find it interesting. You can also get a lot of English-major hotheads sniping at you because you snubbed Toni Morrison. Wouldn't that be fun!

 

11 comments:

  1. I got James Joyce, which I was a bit chuffed about, since he's Irish and I like to lay claim to my Irish heritage whenever I can ;)

  2. You might find this blog post interesting. (And scroll down for two follow-ups.) The creator of the site appears to be deliberately excluding non-white authors from his corpus.

  3. I apparently write like Dan Brown. Also, I Actually Write Like confirms that diagnosis. I fear it is terminal...

  4. I think this measure is pretty silly, but even though I got a few writers, I most consistently got David Foster Wallace.

  5. Arrgh, Daniel, thanks for making me do this! I put some stuff I'd written into it and it said I write like STEPHENIE MEYER.

    I'm guessing it's because it was content I'd written for my World of Darkness roleplaying campaign, so the words "vampire" and "werewolf" were in there a few times.

  6. I got Dan Brown and then David Foster Wallace. I haven't read any of their stuff so I don't know whether that is a compliment or not, but maybe next time I'm struggling to choose at the library, one of these gents may help.

  7. EoR, I went to I Actually Write Like and got two different responses for the same text. First it said Dan Brown, with a picture of a big brown steaming turd; the second time it said "a human, although not a human especially familiar with writing", with the same pic. Good thing I have (other) skills!

  8. I got Joseph Smith, which I was surprised about.

  9. LOL, lay off the "and it came to pass"es...

  10. I doubt they do any sane comparisons. See this analysis. Basically you're just giving a link to increase some SEOer's pagerank. Say No to Commercial Memes (or at least add a rel="nofollow" to the link).

    I suspect you could be just as accurate by taking the last few bits of an MD5 as an index into a lookup table.

  11. Well, I triggered H. P. Lovecraft.
    If that brings me any closer to Orwell, I'm happy.


Thanks for commenting! If this comment is on a post older than 60 days, your comment will go straight to moderation, and I'll approve it if it's not spammy.