Swapping words with their synonyms for fun

September 21, 2010

I was explaining some of the NLP stuff I’ve been working on to a teacher-friend of mine, and he suggested the idea of taking an essay a student had written, running it through a tokenizer and swapping the verbs and nouns with their synonyms. Would the essay still mean essentially the same thing? Would it still even be recognizable? It sounded reasonably easy to do in an afternoon, so I figured I’d give it a shot.

I already had the tokenizing part, so after looking around I settled on the Big Huge Thesaurus API for synonym lookups.
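The post doesn't show how the thesaurus response gets handled, so here is a minimal sketch of that step. It assumes the API's plain-text response format, where each line reads `part-of-speech|relationship|word` (for example `noun|syn|automobile`). Check the Big Huge Thesaurus documentation before relying on that format. `parse-response` is a hypothetical helper, not part of Syndicate, and the HTTP request itself is left out.

```clojure
(require '[clojure.string :as str])

(defn parse-response
  "Parse a plain-text thesaurus response (assumed format: one
  `pos|relationship|word` entry per line) into a map of
  {pos-keyword [synonym ...]}, keeping only `syn` entries."
  [body]
  (reduce (fn [acc line]
            (let [[pos rel word] (str/split line #"\|")]
              (if (and (= rel "syn") word)
                ;; fnil starts a fresh vector the first time a pos appears
                (update acc (keyword pos) (fnil conj []) word)
                acc)))
          {}
          (str/split-lines body)))
```

For example, `(parse-response "noun|syn|automobile\nnoun|syn|auto\nverb|syn|drive\nnoun|rel|motorcar")` returns `{:noun ["automobile" "auto"], :verb ["drive"]}`. The `rel` line is dropped because only direct synonyms are kept.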

About an hour of fooling around later, Syndicate was born. It’s dead simple to use and provides hours (well, minutes) of amusement.

To play with Syndicate, you’ll need to get an API key from Big Huge Thesaurus (they’re free).

$ git clone https://github.com/dakrone/syndicate.git
$ cd syndicate
$ (vim|emacs|mate) src/syndicate/core.clj
# change "YOUR-API-KEY" on line 8 to your Big Huge Thesaurus API key
$ lein deps
$ lein repl
user=> (use 'syndicate.core)

From here, there are really only a few functions:

(replace-text "Google has lately found itself on the receiving end of criticism from privacy and transparency advocates. But with two new tools, Google is trying to convince them that the company is on their side. On Tuesday, Google will introduce a new tool called the Transparency Report, at google.com/transparencyreport/. It publishes where and when Internet traffic to Google sites is blocked, and the blockages are annotated with details when possible. For instance, the tool shows that YouTube has been blocked in Iran since the disputed presidential election in June 2009. The Transparency Report will also be the home for Google's government requests tool, a map that shows every time a government has asked Google to take down or hand over information, and what percentage of the time Google has complied. Google introduced it in April and updates it every six months. Government requests could be court orders to remove hateful content or a subpoena to pass along information about a Google user. The idea is to provide transparency, and we're hoping that transparency is a deterrent to censorship, said Niki Fenwick, a Google spokeswoman.")

Syndicate returns the text with its nouns and verbs replaced by randomly chosen synonyms. Some of the choices are pretty funny:

Google has lately set up itself on the receiving point in time of disapproval from seclusion and image somebody . But with two new creature , Google is trying to win over them that the social affair is on their opinion . On Tues , Google will set a new way called the transparentness news , at google .com/transparencyreport/ . It publishes where and when cyberspace give-and-take to Google parcel is blocked , and the block are annotated with info when possible . For natural event , the agency render that YouTube has been blocked in Iran since the disputed presidential selection in Gregorian calendar month 2009 . The ikon account will also be the location for Google 's politics subject matter creature , a representation that demonstrate every reading a politics has communicate Google to admit down or transfer over cognition , and what part of the moment search engine has complied . search engine introduced it in Gregorian calendar month and word it every six period of time . polity petition could be tribunal taxonomic category to vanish hateful contentedness or a judicial writ to go along along content about a Google mortal . The melodic theme is to ready transparence , and we 're cross that foil is a balk to security review , said Niki Fenwick , a search engine interpreter .
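The post doesn't show Syndicate's internals, so the following is only a sketch of the general idea. It assumes the text has already been tokenized and part-of-speech tagged into `[word pos]` pairs. `swap-token`, `replace-tagged`, and the `sample-synonyms` map are hypothetical; a real version would look words up through the thesaurus API instead of a fixed map. Rejoining tokens with single spaces, as this sketch does, would also produce the stray spaces before punctuation (`their opinion .`) visible in the output above.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for thesaurus lookups: [word pos] -> synonyms.
(def sample-synonyms
  {["car" :noun]     ["automobile" "auto"]
   ["stopped" :verb] ["halted" "ceased"]})

(defn swap-token
  "Return a synonym for a noun or verb token, chosen with `pick`
  (e.g. rand-nth). Other tokens, and words with no known synonyms,
  come back unchanged."
  [lookup pick [word pos]]
  (let [syns (when (#{:noun :verb} pos)
               (lookup [word pos]))]
    (if (seq syns) (pick syns) word)))

(defn replace-tagged
  "Swap every noun and verb in a sequence of [word pos] tokens,
  then rejoin the words with single spaces."
  [lookup pick tagged]
  (str/join " " (map #(swap-token lookup pick %) tagged)))
```

Passing `rand-nth` as `pick` gives the random behaviour; passing `first` makes the result deterministic, so `(replace-tagged sample-synonyms first [["The" :det] ["car" :noun] ["stopped" :verb] ["." :punct]])` returns `"The automobile halted ."`.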

You can also play around with just replacing a single word:

user=> (replace-noun "car")
"automobile"
user=> (replace-verb "shout")
"outcry"
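Single-word replacement could be built along these lines. This is a sketch, not Syndicate's actual code: `fetch-synonyms` is a hypothetical stub standing in for the thesaurus API call, and its small word list is made up for illustration. The fallback of returning the original word when no synonym exists is also an assumption.

```clojure
;; Hypothetical stub standing in for a thesaurus API call; it returns
;; synonyms grouped by part of speech, or nil for unknown words.
(defn fetch-synonyms [word]
  (get {"car"   {:noun ["automobile" "auto" "machine"]}
        "shout" {:noun ["cry" "outcry"] :verb ["yell" "holler"]}}
       word))

(defn replace-word
  "Return a random synonym of `word` for part of speech `pos`,
  or `word` itself when none is found."
  [pos word]
  (let [syns (get (fetch-synonyms word) pos)]
    (if (seq syns) (rand-nth syns) word)))

;; Partially apply the part of speech to get the two single-word helpers.
(def replace-noun (partial replace-word :noun))
(def replace-verb (partial replace-word :verb))
```

With this stub, `(replace-noun "car")` returns one of `"automobile"`, `"auto"` or `"machine"`, while `(replace-verb "car")` returns `"car"` because the stub has no verb senses for it.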

That’s all there is to it: an amusing library for playing with text in under 100 lines. In the future I’ll add a way to specify an API key without changing the source.
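One possible shape for that, sketched here as an assumption rather than anything Syndicate provides: keep the key in an atom seeded from an environment variable, and allow overriding it at the REPL. The variable name `BHT_API_KEY` and the functions `set-api-key!` and `current-api-key` are hypothetical.

```clojure
;; Hypothetical: hold the key in an atom, seeded from an environment
;; variable (BHT_API_KEY is an assumed name), instead of a hard-coded def.
(defonce api-key (atom (System/getenv "BHT_API_KEY")))

(defn set-api-key!
  "Set the Big Huge Thesaurus API key at runtime, e.g. from the REPL."
  [k]
  (reset! api-key k))

(defn current-api-key
  "Return the configured key, or throw if none has been set."
  []
  (or @api-key
      (throw (IllegalStateException.
              "No API key: set BHT_API_KEY or call set-api-key!"))))
```

In a REPL session with this code loaded, `(set-api-key! "your-key")` would replace the edit-the-source step; any code that builds request URLs would then call `(current-api-key)` instead of reading a constant.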

posted in clojure, nlp by Lee

3 Comments to "Swapping words with their synonyms for fun"

  1. Kephas wrote:

    Very nice, I enjoy fun side projects!

  2. Thesaurus wrote:

    Swapping isn't only for fun, either. It can help with many things, from finding rhymes to improving your language!

  3. TomH wrote:

    Someone at HP Research did something similar about 30 years ago for a research squib: they took a single English sentence of about 12 words and substituted synonyms wherever they could to create sentence “variants”. The variants were all grammatically valid and, if my memory is correct, they ended up with roughly 9k variant sentences!
    A good example of some of the problems of NLP.

 