April 15, 2002

TextArc: making text beautiful

The New York Times has an article about a site called TextArc.org. From the site:

A TextArc is a visual representation of a text—the entire text (twice!) on a single page. Some funny combination of an index, concordance, and summary, it uses the viewer's eye to help uncover meaning. A more detailed overview is available.

From the Times article:

The texts, which range from Lewis Carroll's "Alice's Adventures in Wonderland" to Balzac's "Z. Marcas," are too tiny to read around the perimeter. Behind the computer glass, though, Mr. Paley's online software is counting each word and noting its location every time it is used. The oval's black center soon fills with legibly larger versions of every word from the source text. Different stories look different. As a result, Mr. Paley's software effectively turns any prose into concrete poetry in which a word's size and location are as important to its meaning as how it is used.

Once TextArc slices and dices a story, the most frequently used words are the brightest. So in the Carroll work, "Alice" glows at the center. And each word's location in this linguistic constellation is determined by its exact locations in the story text. "Cheshire," for instance, is near the bottom, close to the two middle chapters in which the cat materializes. Roll the cursor over a word, and lines pop up that connect it to all the points in the outer circle where the word is used.

I thought the site would be overloaded, and perhaps it will be, but it was up when I tried it. The effect is fascinating: I tried Edwin Abbott's Flatland, the late 19th-century fable about people who live in a land of two dimensions. Words like flatland, circle, and women sit towards the center; the word sphere is off to one side, indicating that it's used mostly in one section of the book.

The web site gives no details of how this is implemented, but the actual application runs in Java on your own machine.
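
Since the site doesn't say how it works, here is only a rough guess at the layout the Times article describes, sketched in Python rather than Java. It assumes each word of the text is placed in order around an ellipse, and that a word's position inside the ellipse is the average of all the perimeter points where it occurs. The function name `textarc_layout`, the `rx`/`ry` parameters, and the averaging rule itself are all assumptions for illustration, not anything taken from TextArc.

```python
import math
import re
from collections import defaultdict


def textarc_layout(text, rx=1.0, ry=0.6):
    """Hypothetical helper: return {word: (count, (x, y))} for a TextArc-like layout.

    Assumption: a word's interior position is the mean of the perimeter
    points at which it occurs. TextArc's real rule is not documented.
    """
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    sums = defaultdict(lambda: [0, 0.0, 0.0])  # count, sum of x, sum of y

    for i, word in enumerate(words):
        # Put the i-th word on the ellipse, starting at the top, going clockwise.
        angle = 2 * math.pi * i / total
        x = rx * math.sin(angle)
        y = ry * math.cos(angle)
        entry = sums[word]
        entry[0] += 1
        entry[1] += x
        entry[2] += y

    # Average the accumulated coordinates to get each word's position.
    return {w: (n, (sx / n, sy / n)) for w, (n, sx, sy) in sums.items()}


if __name__ == "__main__":
    sample = "alice fell alice saw the cat alice ran the cat grinned"
    layout = textarc_layout(sample)
    # Most frequent words first; the count could drive brightness or size.
    for word, (count, (x, y)) in sorted(layout.items(), key=lambda kv: -kv[1][0]):
        print(f"{word:10s} count={count} x={x:+.2f} y={y:+.2f}")
```

Under this assumption, a word used evenly throughout the text averages out near the center, while a word concentrated in one section is pulled toward that part of the perimeter, which is at least consistent with what I saw for Flatland.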

When you launch TextArc on a text, it starts drawing it as a concentric spiral around the screen, and I'm guessing that during that time the application is actually downloading the full text of whatever you're looking at. (The texts are taken from Project Gutenberg.) Looking at the memory usage of my Internet Explorer process, it grew to 76 MB when I loaded Jane Austen's Pride and Prejudice.

TextArc claims that the application runs faster in Netscape 6.2, and that may be true, but it crashed Netscape 6.2 for me and didn't crash Internet Explorer. I'll try Mozilla 0.99 next.