Paraphrasing tools
This project originated in the recent tussle between Rogers Cadenhead (Drudge Retort) and The AP, in which The AP asserted copyright over excerpts from its news items as short as five words (and threatened Cadenhead with DMCA action for excerpts of 39 words). I reasoned at the time that if excerpts were paraphrased, then logically copyright should not apply. Now note -- The AP also asserts a right to apply copyright to "rewritten" works. This isn't a right granted by actual law, but that probably won't stop The AP from suing anyway. They have lawyers just sitting there eating, whether they sue you or not, so they might as well just sue you.

But from a standpoint of actual copyright, a rewritten text is no longer the original work. So the ideal tool would take an AP (or other news wire) item, and paraphrase it. But paraphrasing is truly a human activity -- it's hard to preserve even syntax, let alone the semantics of a text being paraphrased. So any tool we want to provide for paraphrasing will have to be interactive. Eventually, maybe we'll get to the point where we can largely let the tool have its head -- but that's going to be a long time coming.

So this is my current notion: given a starting text (which we'll store for later reference), we'll break it into sentences and put the original and the working paraphrase up in a side-by-side comparison screen. Each run of five or more words that is identical in both texts will be flagged with a background color, so it's easy to spot. Quotes in the text will not be paraphrased or flagged; they'll largely be treated as blocks.
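Here is a minimal sketch of the flagging step, just to make the idea concrete. It assumes a crude regex tokenizer and compares word 5-grams; the names `tokenize` and `flag_shared_runs` are made up for illustration, and a real version would also need to skip quotes and name blocks.

```python
import re

MIN_RUN = 5  # the five-word threshold discussed above


def tokenize(text):
    """Lowercase word tokens; punctuation is ignored when comparing."""
    return re.findall(r"[a-z0-9']+", text.lower())


def flag_shared_runs(original, paraphrase, min_run=MIN_RUN):
    """Return the paraphrase tokens plus (start, end) word-index spans
    covering every word that sits inside a min_run-word sequence
    also found in the original."""
    orig = tokenize(original)
    para = tokenize(paraphrase)
    # Every min_run-word window of the original, for fast lookup.
    seen = {tuple(orig[i:i + min_run]) for i in range(len(orig) - min_run + 1)}

    flagged = [False] * len(para)
    for i in range(len(para) - min_run + 1):
        if tuple(para[i:i + min_run]) in seen:
            for j in range(i, i + min_run):
                flagged[j] = True

    # Merge adjacent flagged words into half-open spans.
    spans, start = [], None
    for i, is_flagged in enumerate(flagged + [False]):
        if is_flagged and start is None:
            start = i
        elif not is_flagged and start is not None:
            spans.append((start, i))
            start = None
    return para, spans


original = "The senator said on Tuesday that the bill would pass the house easily"
paraphrase = "On Tuesday the senator claimed that the bill would pass the house without trouble"
tokens, spans = flag_shared_runs(original, paraphrase)
for start, end in spans:
    print(" ".join(tokens[start:end]))
```

In this example the only flagged span is "that the bill would pass the house", which is what the comparison screen would color. Shorter overlaps like "on tuesday" stay unflagged because they fall under the threshold.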

Names of people and places will also be identified as blocks. This is to prevent mixups like the word "Gore" being treated as the word "gore", etc. If we pass these texts through anything automatic, we don't want to lose the nameness of those names. However, they're special blocks, because there are certain transformations we can perform on names (as simple as replacing "Al Gore" or "Gore" with "Mr. Gore", or as involved as using "the former Vice President" or "Nobel prize-winner Albert Gore").
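One way to keep names intact, sketched here under some loud assumptions: the list of names is supplied by hand (nothing here detects names automatically), the placeholder format `__NAME0__` is invented for this example, and `toy_tool` merely stands in for "anything automatic".

```python
import re


def protect_names(text, names):
    """Swap each known name for a placeholder so automatic tools leave it alone.
    Matching is case-sensitive on purpose: "Gore" is a name, "gore" is not."""
    mapping = {}
    # Longest names first, so "Al Gore" is claimed before plain "Gore".
    for i, name in enumerate(sorted(names, key=len, reverse=True)):
        token = f"__NAME{i}__"
        new_text, count = re.subn(r"\b" + re.escape(name) + r"\b", token, text)
        if count:
            mapping[token] = name
            text = new_text
    return text, mapping


def restore_names(text, mapping, replacements=None):
    """Put names back, optionally swapping in a variant such as "Mr. Gore"."""
    replacements = replacements or {}
    for token, name in mapping.items():
        text = text.replace(token, replacements.get(name, name))
    return text


def toy_tool(text):
    """Stand-in for an automatic rewriter that would otherwise mangle the name."""
    return re.sub(r"(?i)\bgore\b", "bloodshed", text)


text = "Al Gore said the film had too much gore. Gore later left."
protected, mapping = protect_names(text, ["Al Gore", "Gore"])
result = restore_names(toy_tool(protected), mapping, {"Gore": "Mr. Gore"})
print(result)
```

Run unprotected, the toy tool would turn "Al Gore" into "Al bloodshed"; with the placeholders in place, only the lowercase "gore" is rewritten and the bare "Gore" comes back as "Mr. Gore".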

We'll have a collection of "syntactic frames" which will grow over time. These frames will be used to pluck the low-hanging paraphrasal fruit, e.g. we should be able to rewrite "Gore said" as "said Mr. Gore" with no effort at all. Synonyms will be handled the same way, so we might say "X said" can be rewritten as "X indicated." (Although that might require a little more care: "X said Y" would have to become "X indicated that Y", while "Y, said X" is safe to rewrite as "Y, X indicated.")
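To give a feel for what a frame might look like, here is a rough sketch that encodes the two "said"/"indicated" rewrites above as regular expressions. The regex representation, the `FRAMES` list and `apply_frames` are all assumptions made for this example; real frames would probably need actual parsing rather than pattern matching on capitalized words.

```python
import re

# Each frame is (pattern, template). X is the speaker, Y is what was said.
FRAMES = [
    # "Y, said X."  ->  "Y, X indicated."
    (re.compile(r"^(?P<y>.+), said (?P<x>[^,.]+)\.$"),
     r"\g<y>, \g<x> indicated."),
    # "X said Y."  ->  "X indicated that Y."
    # X is approximated as one or more capitalized words ("Gore", "Mr. Gore").
    (re.compile(r"^(?P<x>[A-Z][\w.]*(?: [A-Z][\w.]*)*) said (?P<y>.+)\.$"),
     r"\g<x> indicated that \g<y>."),
]


def apply_frames(sentence):
    """Apply the first frame that matches; return the sentence unchanged otherwise."""
    for pattern, template in FRAMES:
        if pattern.match(sentence):
            return pattern.sub(template, sentence)
    return sentence


print(apply_frames("Gore said the bill would pass."))
print(apply_frames("The bill will pass, said Gore."))
print(apply_frames("It rained all day."))
```

The first call gains the extra "that", the second does not, and the third falls through untouched. Keeping frames as data rather than code means new ones can be added as users contribute paraphrase pairs.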

(Aside: if anyone knows of existing databases of nice, actionable frames like the above, tell me.)

We'll also have "joke paraphrasing tools" -- Pig Latin, or passage through Babelfish to Japanese and back. These paraphrases are not intended to be useful or serious; they're intended to make a point and satirize the situation.
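The Pig Latin tool is easy enough to sketch. This version uses one common set of rules (words starting with a vowel get "way", otherwise the leading consonants move to the end and get "ay"); other dialects exist, and the function names are just for illustration.

```python
import re

VOWELS = "aeiou"


def pig_latin_word(word):
    """Convert one alphabetic word, keeping an initial capital if it had one."""
    lower = word.lower()
    if lower[0] in VOWELS:
        result = lower + "way"
    else:
        # Move the leading consonant cluster to the end.
        for i, ch in enumerate(lower):
            if ch in VOWELS:
                result = lower[i:] + lower[:i] + "ay"
                break
        else:
            result = lower + "ay"  # no vowels at all, e.g. "hmm"
    return result.capitalize() if word[0].isupper() else result


def pig_latin(text):
    """Convert every run of letters; punctuation and spacing are left alone."""
    return re.sub(r"[A-Za-z]+", lambda m: pig_latin_word(m.group(0)), text)


print(pig_latin("Gore said the bill would pass."))
```

The Babelfish round trip isn't sketched here, since it depends on an external translation service.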

Now comes the interactive part. For each sentence in the paraphrase as it stands, we'll have a button which, when pressed, pops up that sentence, along with its original text, in a small form allowing the sentence to be paraphrased individually. This means we can call the various automatic tools on it, or we can simply type in a new text. When submitted, this form does two things. First, it stores the before-and-after pair as a paraphrase for later analysis. (We'll encourage users to make small, incremental changes when rewriting, to facilitate identification of reusable frames.) Second, it updates the comparison text and shows the new flags for textual identity. That is, if there was an extent of 15 words of identical text, and now there are only 5, the new display will be less red or something.
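A rough sketch of what the form's submit step might do on the back end, assuming a simple in-memory record per sentence. `SentenceSlot` and `longest_shared_run` are hypothetical names, and the "how red is it" measure is reduced here to a single number: the longest word run still shared with the original, found with the standard library's `difflib`.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


def longest_shared_run(original, current):
    """Length, in words, of the longest word sequence common to both texts."""
    a, b = original.lower().split(), current.lower().split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return matcher.find_longest_match(0, len(a), 0, len(b)).size


@dataclass
class SentenceSlot:
    original: str
    current: str = ""
    history: list = field(default_factory=list)  # (before, after) pairs

    def __post_init__(self):
        if not self.current:
            self.current = self.original

    def submit(self, new_text):
        """Store the paraphrase pair, update the sentence, return the new score."""
        self.history.append((self.current, new_text))
        self.current = new_text
        return longest_shared_run(self.original, self.current)


slot = SentenceSlot("The senator said on Tuesday that the bill would pass the house easily")
print(slot.submit("On Tuesday the senator claimed that the bill would pass the house without trouble"))
print(slot.submit("On Tuesday the senator claimed the house would easily approve the bill"))
```

Each small rewrite adds a pair to `history` for later frame-hunting, and the returned number drops as the text drifts from the original (from 7 to 2 across the two submissions here).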

This allows three things. First, it simply provides a fun and easy tool to munge text in a humorous way. Second, it provides a good way to measure your paraphrases and check them for Googlable phrases The AP might use to find "actionable" quotes. But third, it will allow us over time to come up with automatic paraphrasal tools to make it easier and easier to paraphrase text for safe quoting. Eventually we may be able to paraphrase quickly and easily. (I don't actually know how well this will work until it's tried for a while.)

Development status: planning only

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.