Archive for the 'A to Z' Category




AL-gorithm is a completely analog text munging algorithm in three parts.

AL-gorithm Closeup

The first is an interactive version of a passage from All the King’s Men, created by cutting out all but a single character from laser prints of the quote. I intended to letterpress the quotes to explore the text’s physicality in full (though conceptual completeness probably dictates that I should have cut my own font) but decided to temper my art with a little reason. I did, however, preserve my intention with my choice of font; 24-point Bodoni was the typeface I would have used had I laid out the type and printed it on a press. The physical algorithmic process is documented here.

AL-gorithm installed

The second is a bag filled with the discarded bits of text. In digital space, memory registers that held the initial text can be overwritten. Clearing memory in the analog world is a little more complicated.

AL-gorithm closeup

O Template

The third is a visualization intended to emphasize both the physical origin of text and its arbitrariness. The cutout for each letter produces a unique pattern. I’ve used this pattern to generate visualizations that highlight the letter’s frequency and distribution while also serving as a symbol for the letter itself. Taken together, the pattern visualizations form a new abstracted alphabet, in which I’ve re-written the quote. Certain letters—q, x, j, z—don’t appear in the quote and are thus not part of the new alphabet. My intention here is to make the levels of abstraction that underlie programming languages easily comprehensible. In this example, if the alphabet is machine language, then these patterns are written in a language that’s one level up. Remember, we are the machines in this analogy, so we can’t understand the higher levels of abstraction.

Here’s the text, recursively generated by reinserting the patterns of letter frequencies into the original text (recursively makes it sound like it was done programmatically—I actually manually created the pattern files in Illustrator and then used them to create a font).
Recursive Penn Warren


There is definitely something liberating about reducing Shakespeare and Austen and Whitman and Frost to algorithmic plasticine. It’s healthy occasionally to check our reverence of texts lest the objects usurp the meaning they contain, but at the same time, it’s important to recognize their status as objects, as discrete entities with physical texture and context. Digital text is infinitely malleable, yes, but it is also ephemeral. Close a book, and the words remain; turn off the screen, and they’re gone.

I’ve spent an entire semester slicing up texts using a variety of digital methodologies and philosophies, moving from grep’s graceful julienne on the command line to much more vigorous and grammatically aware Java frappés. My goal throughout: forcing text to perform all manner of cruel contortions for my amusement and edification, compressing a novel into a few lines or stretching out sentences into languid visualizations strung with looping semantic threads. I can’t help feeling that in the process I lost something, and it’s taken me all semester to figure out that it’s a sense of text’s increasingly atavistic physical nature.

This project was born of the late-night liaison of two ideas.

  1. What happens to all the text that an algorithm discards? Writing text-munging algorithms is relatively painless, so much so that it’s easy to forget the text entirely. I wanted to rediscover text munging from the algorithm’s perspective. I documented the entire process of becoming an algorithm.
  2. Moving text off of the screen/page and into three-dimensional space. I have been thinking recently about designing interactive narratives that someone can experience architecturally, literally walking through a story. My initial sketches were based on “physical” interactions with stories in a digital game environment, but that raised the question, “What would an architectural text look like?”

BodyWorlds Cross-section

Brian Dettmer’s Book Autopsies provide one pretty good answer. Their pages are static, however, transformed by his nimble cutting into something other than book pages, something beautiful but unreadable, if we believe his nomenclature, something dead. That reminded me of some of the cross-sections of human cadavers exhibited in Vesalian glory at the Body Worlds expositions—you can still recognize the body even though it’s been unnaturally dismantled. I wanted to keep the text alive but still allow you to move through it. Here are a bunch of other three-dimensional text sculptures.

25 Bond Street Facade

On mornings when the sun is out, I like to walk to school along cobblestoned Bond Street and look into the gallery windows, scoff at Herzog & de Meuron’s ridiculously splashy facade, and marvel at the understated multi-faceted building at No. 25, which provided the final piece of inspiration. I had already started cutting out letters when I realized that the stacked patterns looked a lot like the building’s facade. Architectural text indeed!

I Have Become Java, Destroyer of Words

Everything happens for a reason.

It’s not my place to ask why.

I take the sheets as they come to me, one at a time, and make my marks in pencil, two along each edge. I join the marks to create a frame. I measure 5mm from the bottom of each line and then 2mm above that and mark them off as well. I place a ruler along the first line and cut along its edge with an Xacto knife, mindful of the Target.

Sometimes, the Target is easy to spot. If it’s a letter with an ascender or descender, p for instance, it catches my eye like a little hook. I lift the knife before I reach it, skip over it, and then continue cutting. But some Targets like to hide. They play tricks on the eye. H is a gregarious Target; it mingles easily with commoners such as t and e and hangs on the words of sophisticates like g and p. Such a Target is smooth to the eye, like a polished pebble. Even when I read the line to myself I sometimes miss it. Such a Target is trouble.

It’s ok to miss a Target once. Each Target is held by strips running along its top and bottom. Once I’ve cut both, then I turn the page on its side and make short cuts along the edges and around the Target, lifting out the freed bits carefully with the point of the Xacto and storing them in a bag. One never knows when one might need a letter. If I miss a Target twice, top and bottom, and it is cut free, that is a Bug and there is nothing I can do but start over.

At first, the process was novel and filled with discovery. I learned to score the paper before cutting it and, inspired by Adam Smith and Frederick Taylor, I sought to divide my tasks by type—scanning, measuring, marking, cutting, vertical, horizontal—and thereby increase my efficiency. This, however, proved so boring that I lost interest and quickly found my work infested with Bugs. I returned to the original loop: measure, mark, cut, turn, cut, free, and repeat.

I worked for 23 hours, so you might think that I could recite the words I was cutting around by heart. The truth is that not once did I read what I was working on. I couldn’t tell you what the text was about or even if it was a text at all. All I can tell you is that when I found a Target, I cut around it.

Intention lives outside of the ALgorithm.

Hearing in Tongues

After creating a Spanish agglomerative insult generator last week and having recently read Neal Stephenson’s Snow Crash, I had language and Babel on the brain.

The other day I was looking at some Chinese money I found in a pocket of an old pair of pants, marveling at the weirdness of the Manchu script that appears in several places beneath all the other, supposedly indigenous, Chinese languages. It’s relatively easy to identify a language with a peculiar script, but how do you distinguish between superficially similar languages, say Italian and Spanish or Dutch and German?

Well, for starters you could use my Bayesian filter Java application. It accepts any number of example language files (in the case below, selected texts from Project Gutenberg) and the text to be identified, either on the command line or in a file. In a matter of seconds, it determines to the best of its ability the linguistic provenance of said text. Though it overconfidently and erroneously identifies languages it hasn’t been “trained” in, it has yet to misidentify a language it has been given as an argument:

Terminal Window Showing Java App

To create the script, I modified Adam Parrish’s example code so that it identifies a single language rather than listing each with a score, and then displays the word that proved most relevant in the program’s assessment. If you feed the script a language it doesn’t know and it confidently classifies the text as, say, Danish, the word it found most relevant is usually one that is rare in Danish and much more common in the text’s actual language, as a quick Google search will reveal.
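For the curious, the guts of such a filter are just word counts and log probabilities. Here’s a minimal naive Bayes sketch of the idea (the class and method names are mine, not Adam’s actual API):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal naive Bayes language identifier: train on sample texts, then
// pick the language whose word statistics best explain the input.
public class LanguageGuesser {
    private final Map<String, Map<String, Integer>> counts = new HashMap<>();
    private final Map<String, Integer> totals = new HashMap<>();

    public void train(String language, String sampleText) {
        Map<String, Integer> c = counts.computeIfAbsent(language, k -> new HashMap<>());
        for (String w : sampleText.toLowerCase().split("\\W+")) {
            if (w.isEmpty()) continue;
            c.merge(w, 1, Integer::sum);
            totals.merge(language, 1, Integer::sum);
        }
    }

    // Returns the trained language that assigns the input the highest
    // (log) probability.
    public String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (Map.Entry<String, Map<String, Integer>> e : counts.entrySet()) {
            double score = 0.0;
            int total = totals.get(e.getKey());
            for (String w : text.toLowerCase().split("\\W+")) {
                if (w.isEmpty()) continue;
                // Add-one smoothing: unseen words dent the score but
                // never zero it out.
                score += Math.log((e.getValue().getOrDefault(w, 0) + 1.0) / (total + 1.0));
            }
            if (score > bestScore) {
                bestScore = score;
                best = e.getKey();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        LanguageGuesser g = new LanguageGuesser();
        g.train("spanish", "el perro come la tortilla en la casa de mi madre");
        g.train("english", "the dog eats the omelette in my mother's house");
        System.out.println(g.classify("la casa de mi perro")); // spanish
    }
}
```

The add-one smoothing is also why such a filter stays overconfident: every language gets some score, and the best of a bad lot wins.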

The code is here and here.

Many splendored receptacles of poop

Anglo-Saxons must have been a fiercely efficient race. I suspect that the Protestant work ethic and eating in cars during a commute are vestiges of that efficiency, the same efficiency that has left its mark on the English language—the language of getting things done.

Mediterranean cultures, on the other hand, like a three-hour lunch. Much to the horror of Strunk and White, they’ll not only use two words when one will do, they’ll probably use ten. It’s hardly surprising that Romance languages lend themselves to meandering and entirely uneconomic profanity. A hasty “fuck you!” would never do, oh no.

Much was made of the agglomerative nature of French profanity by the Merovingian in the second Matrix movie:

I have sampled every language, French is my favorite. Fantastic language, especially to curse with. Nom de dieu de putain de bordel de merde de saloperie de connard d’enculé de ta mère. Like wiping your ass with silk.

Well. That is a long string of nested prepositional phrases that doesn’t make all that much grammatical sense. That may be enough for Gallic cyber-demons, but in Spain, we take pride in the grammatical correctness of our impossibly long invective.

Case in point: the Me cago en… (literally, I shit on…) construction. Common toilet substitutes include the sea, everything that flies, your mother/father, and all manner of blasphemous locales too sacrilegious to transcribe in English.

But the mark of a wordsmith, of a cultured man or woman of letters, is the ability to generate, on the fly, original, long, and grammatically correct receptacles in which to void. Though non-native speakers can never hope to attain a true mastery of this particular cultural form, using my handy Java-powered insult generator can at least open their eyes to its endless possibilities.

Here are some particularly juicy examples, in the original and translated:

Me cago en un pasaporte. (I shit on a passport.)

Me cago en todo lo que mira a tu supervisor que aparece a tu padre que come el oceano a mi hermano que navega de tu Dios a esas uvas. (I shit on everything that watches your supervisor who appears before your father who eats the ocean of my brother who sails from your God to those grapes.)

Me cago en las jorobadas hormigas de mi abuela. (I shit on my grandmother’s hunchbacked ants.)

Me cago en los olvidados franceses. (I shit on the forgotten French.)

Never better said.

The insult strings are generated recursively using the code described here and the following grammar:

# clauses
S -> Me cago en NP
NP -> todo lo que VP
PP -> P NP
PP -> P Pos Per
PP -> P Pos Per QP
QP -> que VP PP
VP -> V
VP -> V NP
VP -> V PP
NAF -> AdjF NF
NAM -> AdjM NM
NNP -> Pos Per

# terminals
DetF -> la | una | esta | esa
DetM -> el | un | este | ese
DetFP -> las | estas | esas
DetMP -> los | estos | esos
NM -> mar | sol | coche | oceano | pais | gobierno | presidente | sueño | pastel de cumpleaños | perro | gato | sombrero | plato | futuro | chorizo | pasaporte | coño
NF -> mar | sombra | cama | hostia | leche | lluvia | verdad | tortilla | mierda | cumbre | polla
NMP -> americanos | franceses | pantalones | zapatitos | cojones | primos | antepasados | albondigas
NFP -> embarcaciones | hipotecas | gafas | uvas | sillas | hormigas | ruedas
P -> a | de
Pos -> tu | mi
V -> come | ve | mira | vuela | salta | conduce | escribe | navega | fomenta | aparece | apesta | huele
AdjM -> puto | asqueroso | maloliente | podrido
AdjF -> pegajosa | sucia | mísera | pobre | desesperada
AdjMP -> olvidados | sucios | malditos | putos | poderosos
AdjFP -> putas | malditas | jorobadas | odiosas
Per -> padre | abuela | prima-hermana | bisabuelo | madre | Dios | supervisor | jefe | hermano

Grammatical gender presents certain problems, as do different prepositions. A truly robust generator would separate nouns into different types (locations, people, activities, organizations, etc.) and pair them with appropriate verbs and prepositional phrases. But I can do that without a computer. It’s more interesting to see how a computer squeezes together phrases that, while grammatically correct, I would never say in tandem.
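The recursive expansion driving the generator is compact. Here’s a sketch with a hypothetical InsultGrammar class and a toy subset of the rules above (not the actual generator code, which is linked above):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Expands a context-free grammar from a start symbol, choosing a random
// production at each step. Symbols without rules are terminals.
public class InsultGrammar {
    private final Map<String, List<String[]>> rules = new HashMap<>();
    private final Random random = new Random();

    public void addRule(String symbol, String... expansion) {
        rules.computeIfAbsent(symbol, k -> new ArrayList<>()).add(expansion);
    }

    public String expand(String symbol) {
        if (!rules.containsKey(symbol)) return symbol; // terminal: emit as-is
        List<String[]> options = rules.get(symbol);
        String[] chosen = options.get(random.nextInt(options.size()));
        StringBuilder out = new StringBuilder();
        for (String s : chosen) {
            if (out.length() > 0) out.append(' ');
            out.append(expand(s));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        InsultGrammar g = new InsultGrammar();
        // A toy subset of the grammar above.
        g.addRule("S", "Me cago en", "NP");
        g.addRule("NP", "DetM", "NM");
        g.addRule("NP", "todo lo que", "V", "NP"); // recursive option
        g.addRule("DetM", "el");
        g.addRule("DetM", "un");
        g.addRule("NM", "pasaporte");
        g.addRule("NM", "oceano");
        g.addRule("V", "mira");
        g.addRule("V", "come");
        System.out.println(g.expand("S"));
    }
}
```

Because NP can expand back into itself, the insults can nest indefinitely; termination relies on the recursive production not always being the one chosen.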

I shit on your birthday cake.

I know why the compiler sings: A Homework Generator

In GEB, Douglas Hofstadter argued that recursion and self-referentiality are the precursors of consciousness. I’m nodding my head in vigorous assent, but I’m still not sure I really understand what that means. If I’ve learned anything at ITP, it’s that the best way to understand something is to build it yourself, so for my A to Z midterm I constructed a homework generator—a Java program that outputs working Java programs for munging text along with a short description of what they do—and in the process came a little closer to understanding what Hofstadter was getting at.

Writing code that randomly generates working semantic code using n-gram analysis or generative grammars seemed like too formidable a task, so I borrowed my approach from Raymond Queneau’s “One Hundred Thousand Billion Poems”: fourteen sets of ten lines each, from which the reader is supposed to select and assemble a sonnet. The reader must select one and only one line from each set and do so in the order they’re presented. It is a constrained system that mimics the constraints of the sonnet form itself, and despite these constraints, it still produces an astounding number of outputs.

My Homework Generator works in a similar way. When the code is run, it chooses one “line” from each of four sets and assembles them into semantically correct Java that can be compiled and run to munge any text that’s fed into it. Unlike the reader of Queneau’s sonnet, the Homework Generator doesn’t have to proceed in any particular order through the sets, nor does it necessarily have to pick a line; it can simply skip a set if it chooses. Both the decision to select from a set and the order of selection affect the final result. My combinatory math is a little rusty, but if there are three groups of three and one group of one, at least one must always be chosen, and order matters, then there are, I believe the technical term is, a shitload of combinations. This is how it works:
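For what it’s worth, the shitload can be pinned down. Taking the description at face value (three sets of three lines and one set of one, each chosen set contributes exactly one line, at least one set is chosen, and the order in which the sets are applied matters), the count is small enough to enumerate:

```java
// Counts ordered selections: choose a nonempty subset of the four sets,
// an order for the chosen sets, and one line from each chosen set.
// Set sizes follow the description above: {3, 3, 3, 1}.
public class HomeworkCount {
    static long factorial(int n) {
        long f = 1;
        for (int i = 2; i <= n; i++) f *= i;
        return f;
    }

    public static long count(int[] sizes) {
        int n = sizes.length;
        long total = 0;
        // Iterate over all nonempty subsets via a bitmask.
        for (int mask = 1; mask < (1 << n); mask++) {
            long product = 1;
            int chosen = 0;
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    product *= sizes[i];
                    chosen++;
                }
            }
            total += product * factorial(chosen); // orderings of the chosen sets
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(count(new int[]{3, 3, 3, 1})); // prints 1054
    }
}
```

Under that reading, the technical term works out to 1,054 distinct filter programs; other readings of the constraints will give different counts.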


Because the program generates not text output but code that, when run, will generate text output, debugging was tricky. After I compiled the Homework Generator, it produced code that in turn needed compiling. Because of the large number of backslashes and quotes in the code, I found myself thinking like a compiler, going through the code and adding escape characters to make sure that the twice-compiled code would still produce the results I wanted. It took some doing, but it works!
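The escaping problem is easiest to see in miniature. Here’s a toy sketch (EscapeDemo is a hypothetical name, not part of the Homework Generator) of why a program that emits Java source has to double up its backslashes and quotes:

```java
// Why code that generates code needs double escaping: a backslash that
// must appear in the *generated* source has to be written escaped in
// the *generating* source.
public class EscapeDemo {
    // Turns a string into a Java string literal: wraps it in quotes and
    // escapes any backslashes and quotes inside it.
    public static String toJavaLiteral(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '\\' || c == '"') out.append('\\');
            out.append(c);
        }
        return out.append('"').toString();
    }

    public static void main(String[] args) {
        // The regex \W+ (already one escape deep in this source) gains a
        // second layer of escaping on its way into the emitted program.
        String generated = "String[] words = line.split(" + toJavaLiteral("\\W+") + ");";
        System.out.println(generated);
        // prints: String[] words = line.split("\\W+");
    }
}
```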

The final code is here. Below are examples of the descriptions the code generates followed by actual generated code followed by the results of using it to munge Robert Frost’s “Stopping By Woods On A Snowy Evening.”

Note: Adam Parrish is my instructor for the course.

The text filter I created for this week’s class will whisper the text you give it by decapitalizing one line before searching the internet for all my correspondence with Adam Parrish. The resulting text is a post-modern limerick about words better left unsaid.

The text filter I created for this week’s class will destroy the text you give it by erasing every four-letter word before removing every fourth character from the contents of Adam Parrish’s desk drawer in the Residents’ office. The resulting text is an Oulipo poem about algorithms.

The text filter I created for this week’s class will repurpose the text you give it by truncating each line that contains the word ‘the’ before searching the internet for Adam Parrish’s phone number. The resulting text is a reasonable substitute for grammar.


import com.decontextualize.a2z.TextFilter;
import java.util.ArrayList;

public class myHomework extends TextFilter {
    private ArrayList<String> mF = new ArrayList<String>();
    public static void main(String[] args) {
        new myHomework().run();
    }
    public void eachLine(String line) {
        // Randomly uppercase longer words, then store the rebuilt line.
        String[] words = line.split("\\W+");
        line = "";
        for (String w : words) {
            if (w.length() > (int)(Math.random() * 8)) {
                w = w.toUpperCase();
            }
            line += w + " ";
        }
        mF.add(line);
    }
    public void end() {
        for (int i = 0; i < mF.size(); i++) {
            System.out.println(mF.get(i));
        }
    }
}
import com.decontextualize.a2z.TextFilter;
import java.util.ArrayList;

public class myHomework extends TextFilter {
  private ArrayList<String> mF = new ArrayList<String>();
  public static void main(String[] args) {
    new myHomework().run();
  }
  public void eachLine(String line) {
    line = line.replaceAll("[,;:]", "!");
    line = line.replaceAll("\\.", "?");
    line += " and like";
    line = line.toLowerCase();
    // Drop every four-letter word, then store the line for shuffling.
    String[] words = line.split("\\W+");
    String kept = "";
    for (String w : words) {
      if (w.length() != 4) kept += w + " ";
    }
    mF.add(kept);
  }
  public void end() {
    // Print the stored lines in random order.
    while (mF.size() > 0) {
      int randomIndex = (int)(Math.random() * mF.size());
      System.out.println(mF.remove(randomIndex));
    }
  }
}
Try to figure out what code produced them.



4nd 70 90 1
70 45k 1F 15
My mu57 7H1NK 17
0f w1nd 4ND D0WNY
W00D5 1 7h1nk 1 kn0w
H15 4
70 W47CH H15 W00D5 F1LL UP W17H 5n0w

6u7 1 70
W00D5 d4rk 4ND
70 570p W17H0U7 4
4nd 70 90 1
0nly 50UND 5
w1ll n07 570PP1N9
H15 15 1n 7h0u9h
570PP1N9 6y W00D5 0n 4 5n0wy
w00d5 4ND


To ask if there is some mistake? and like
The only other sound’s the sweep and like
Between the woods and frozen lake and like
And miles to go before I sleep! and like
My little horse must think it queer and like
and like
Whose woods these are I think I know? and like
Stopping By Woods On A Snowy Evening and like
But I have promises to keep! and like
His house is in the village though! and like
To stop without a farmhouse near and like
He will not see me stopping here and like
The woods are lovely! dark and deep? and like
The darkest evening of the year? and like
To watch his woods fill up with snow? and like
Of easy wind and downy flake? and like
He gives his harness bells a shake and like
And miles to go before I sleep? and like

Markov Rhymes

Continuing with the hip-hop theme, I thought it would be interesting to subject my painstakingly assembled compendium of Notorious BIG lyrics to our Markov filter from last week. Given Biggie’s particular way with words, I was curious to see whether the filter would generate new lines in his voice. The results, which lack his clever turn of phrase but retain his bravado, are below.

I tried to get the filter to accept arguments on the command line, but that was a no go, so I manually changed the order of the n-grams used to generate the following snippets of rhymes never sung. Orders below 4 produced nonsense, while orders above 7 produced ArrayIndexOutOfBounds exceptions (I suspect there are lines in my source text that max out at 7 characters—”uhh uhh”).

UPDATE: The problems mentioned above are fixed. I was calling a constructor that took arguments before those arguments had been passed in. I eliminated the arguments from the constructor and wrote a setter function, and now I can pass in the order of the n-grams I want and their maximum length. The code is here and here.
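The n-gram model itself is compact. Here’s a minimal character-level sketch (my own MarkovChain class, not the course’s filter code): it records which character follows every n-character window in the source, then generates by repeated lookup and sampling:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Character-level Markov chain: records which character follows each
// n-character window, then generates new text by repeated sampling.
public class MarkovChain {
    private final int order;
    private final Map<String, List<Character>> model = new HashMap<>();
    private final Random random = new Random();

    public MarkovChain(int order) {
        this.order = order;
    }

    public void train(String text) {
        for (int i = 0; i + order < text.length(); i++) {
            String gram = text.substring(i, i + order);
            model.computeIfAbsent(gram, k -> new ArrayList<>())
                 .add(text.charAt(i + order));
        }
    }

    // Assumes the seed is at least `order` characters long; shorter
    // input is exactly what produces out-of-bounds exceptions.
    public String generate(String seed, int maxLength) {
        StringBuilder out = new StringBuilder(seed);
        while (out.length() < maxLength) {
            String gram = out.substring(out.length() - order);
            List<Character> followers = model.get(gram);
            if (followers == null) break; // dead end: n-gram never seen
            out.append(followers.get(random.nextInt(followers.size())));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        MarkovChain m = new MarkovChain(5);
        m.train("uhh uhh uhh uhh");
        System.out.println(m.generate("uhh u", 20));
    }
}
```

Lower orders give the model more choices per window (nonsense); higher orders reproduce the source almost verbatim, until the windows outgrow the shortest lines entirely.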

I love the drug connotations of the n-grams after each title.

Leave cars with me? (5-grams)

You ass assumptions, lead led to dumpin
Sticks and Biggie Biggie Smalls the tripping of a superstar
Tell the crew run the right shit, out TV’s
no mo’ richest rest fo’ sho’ (YEAHH)
I know yo’ asshole!
Your crew, flipping,
And I just love process of this on the right wit’cha
(can I get a hundred shots
Got a nut
Shouldn’t have the show up my skirt but I state “You know we do.
So anyway I don’t take em all of this one,
pass that weed i got a bag bitch I left the liquor

And I just speakin (7-grams)

Had to re-up; see no more
Niggaz got to feel one, caps I got more Mack than Cortex singers:
“They pray..” 4X through, but I’m up in the phone call,
it couldn’t hit me on the rawess niggaz spit be counterfeit, robbery,
I’m the right one, pass that weed i got to “See-three-P.O.’s”
With my rocks
Fifty dollar and a half
Pendejo’s, I show you got to die, if I go to sleep safe, not to hear me
I wanna hear right boo [truuuueee]

I wanna get witch get wit’chou” (4-grams)

I fuck your way I don’t pass with my friend his wack
Thugs and pop-pa”
My moms and you see battle steam-and-Heavy rock with the kid’s why y’all
me scared the ransom no light,
I’m big speak all your burn me.
Always why they wit’cha
(can bullshit the fuckin hoes do mean stuck around flow up this than
Biggie gonna brick do’, in you in home, they broke, and stack of me,
talk your game, talk you see me one, pass witch)
Passthat rah rah rah shots that weekend

A Rap Concordance


Drawing on the brilliant hip hop powerpoint that made the rounds some time ago, I built a Processing sketch that builds a concordance of hip hop lyrics (in this case, Biggie Smalls’s, though any text can be fed in) and creates a series of appropriate data visualizations.

The code is here and here. It’s inelegantly hard coded in order to display nicely, but if my goal were to generalize the program to work for any hip hop lyrics, I’d build a nice interface that allowed you to select the words to be compared.
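Underneath any such visualization sits a plain concordance: a map from each word to the lines where it appears. A minimal sketch (class and method names are mine, not the Processing sketch’s):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Builds a concordance: maps each word to the line numbers on which it
// appears. Frequencies and line positions are the raw data behind the
// charts.
public class Concordance {
    private final Map<String, List<Integer>> occurrences = new LinkedHashMap<>();

    public void addLine(int lineNumber, String line) {
        for (String w : line.toLowerCase().split("\\W+")) {
            if (w.isEmpty()) continue;
            occurrences.computeIfAbsent(w, k -> new ArrayList<>()).add(lineNumber);
        }
    }

    public int frequency(String word) {
        return linesFor(word).size();
    }

    public List<Integer> linesFor(String word) {
        return occurrences.getOrDefault(word.toLowerCase(), Collections.emptyList());
    }

    public static void main(String[] args) {
        Concordance c = new Concordance();
        c.addLine(1, "Spread love, it's the Brooklyn way");
        c.addLine(2, "It was all a dream");
        System.out.println(c.frequency("the")); // 1
        System.out.println(c.linesFor("it"));   // [1, 2]
    }
}
```

Comparing rappers would then just mean building one Concordance per artist and charting the frequencies of a shared word list side by side.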

It would be cool to assemble concordances of various different rappers’ lyrics and compare the preferred pastimes of, say, Snoop and Biggie. I can imagine the categories being something along the lines of smokin’, blastin’, chillin’, raising hands in the aiyah, etc.


For A to Z this week, I wanted to use regular expressions to replace selected words in a text. I started out with the idea of cliché algebra: using clichéd metaphors à la—time is money, business is war, bigger is better, less is more, knowledge is power, seeing is believing—to replace all occurrences of each word with its metaphorical equal. I tried it on a couple of texts, but none of the words occur commonly enough in close combination to produce a noticeable effect, even on a list of headlines, and I wanted my filter to work on any text, not just on a specifically designed one.

I also played with the idea of censorship (removing all four letter words and replacing them indiscriminately with “sugar”) before settling on the Shakespearator—a relatively arbitrary set of rules to make a text seem Elizabethan. It works well on all texts but especially on texts with a profusion of second person pronouns. I ran the script for Goodfellas through it, which was less funny than it should have been, so I tried it on a bunch of other decidedly un-Elizabethan texts. Below are two of my favorites.

From the Adobe website:

Installation instructions for MacOSX and MacPPC

Installation of Adobe Flash Player mayst require administrativest access to thine PC, which is normally provid’d by thine IT department. For the installation to succe’d, thou wilt be ask’d to closeth all open browser windoweth during the installation.

Clicketh the download link to begin installation. If a dialog box appearest, followeth the instructions to save the installer to thine desktop. Save the Installer to thine desktop, and wait for it to download completely. An Installer icon wilt appearest upon thine desktop. Double-clicketh upon it.

Read and clicketh through the dialog boxes. Thou wilt be prompt’d to closeth all open browser windoweth to continue with the installation. When the Install button appearest, clicketh it to install Adobe Flash Player into thine browser’s plug-ins folder. Thou canst verify the version thou hath install’d by visiting the About Flash Player page.

And, even better, from How to Touch a Woman to Drive Her Wild:

Sensual touching is an art that thou shouldst definitely spend some time mastering — because it wilt be incredibly rewarding to both thou and the woman in thine life.

Touch her more. However much thou art already touching thine girlfriend, wife, or lover…thou canst dost it more often. I canst’t emphasize enough howeth much of an emotional connection and bond canst be form’d by this simple action. Women link many feelings of sexuality, love, and trust with the sensations that art arous’d in them when a man putteth his hands upon her…

Howeth’s that for the simplest tip ever?

Try it out. I promise that ’tis as effectivest as ’tis simple.

… When thou art putting thine hand upon her, whether thou art caressing or squeezing…or petting or holding or any other kind of touching…Look into her eyes as thou art doing it. Thou mayst think, hecketh, I already look at her when I touch her…But just try this — try being awarest of intentionally holding her eye contact as thou touch her.
I think thou wilt find that it maketh a very big difference.

Try touching her in new ways… Just placeth thine hand upon her shoulder, the backeth of her necketh, her thigh, arm, or hand… Let her feel thine masculine strength, but don’t, obviously, hurt her. If thou dost this righteth, she shouldst feel the tenderness and protectiveness behind thine touch.
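For the record, Shakespearator-style rules are easy to sketch as chained regex replacements. These particular rules are approximated from the output above, not the actual rule set:

```java
// A few Shakespearator-style rules as chained regex replacements,
// approximated from the sample output (the real rule set is the
// author's own).
public class Shakespearator {
    public static String elizabethify(String text) {
        return text
            .replaceAll("\\byou\\b", "thou")   // subject pronoun
            .replaceAll("\\bYou\\b", "Thou")
            .replaceAll("\\byour\\b", "thine") // possessive
            .replaceAll("\\bare\\b", "art")
            .replaceAll("\\bClick\\b", "Clicketh")
            .replaceAll("ed\\b", "'d");        // crude past-tense marker
    }

    public static void main(String[] args) {
        System.out.println(elizabethify("Click the link you downloaded to your desktop."));
    }
}
```

The word boundaries (\b) keep “you” from matching inside “your”; the crude word-final “ed” rule is what produces forms like “install’d” (and happily mangles plenty of innocent words too).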
