April 18, 2024

The Mechanic Muse: What Is Distant Reading?

Franco Moretti has a solution: don’t read them. Moretti is not a satirist. He’s an Italian literary scholar and the founder of the Stanford Literary Lab, which opened last year, published its maiden pamphlet in January and followed up with another last month. The first pamphlet asks whether computers can recognize literary genres, and the second uses network theory to re-envision plots.

As its name suggests, the Lit Lab tackles literary problems by scientific means: hypothesis-testing, computational modeling, quantitative analysis. Similar efforts are currently proliferating under the broad rubric of “digital humanities,” but Moretti’s approach is among the more radical. He advocates what he terms “distant reading”: understanding literature not by studying particular texts, but by aggregating and analyzing massive amounts of data.

We need distant reading, Moretti argues, because its opposite, close reading, can’t uncover the true scope and nature of literature. Let’s say you pick up a copy of “Jude the Obscure,” become obsessed with Victorian fiction and somehow manage to make your way through all 200-odd books generally considered part of that canon. Moretti would say: So what? As many as 60,000 other novels were published in 19th-century England, to say nothing of other times and places. You might know your George Eliot from your George Meredith, but you won’t have learned anything meaningful about literature, because your sample size is absurdly small. Since no feasible amount of reading can fix that, what’s called for is a change not in scale but in strategy. To understand literature, Moretti argues, we must stop reading books.

The Lit Lab seeks to put this controversial theory into practice (or, more aptly, this practice into practice, since distant reading is less a theory than a method). In its January pamphlet, for instance, the team fed 30 novels identified by genre into two computer programs, which were then asked to recognize the genre of six additional works. Both programs succeeded — one using grammatical and semantic signals, the other using word frequency. At first glance, that’s only medium-interesting, since people can do this, too; computers pass the genre test, but fail the “So what?” test. It turns out, though, that people and computers identify genres via very different features. People recognize, say, Gothic literature based on castles, revenants, brooding atmospheres, and the greater frequency of words like “tremble” and “ruin.” Computers recognize Gothic literature based on the greater frequency of words like . . . “the.” Now, that’s interesting. It suggests that genres “possess distinctive features at every possible scale of analysis.” More important for the Lit Lab, it suggests that there are formal aspects of literature that people, unaided, cannot detect.
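To make the word-frequency idea concrete, here is a minimal sketch of how a program might sort texts by genre using nothing but the rate of common words like “the.” It is not the Lit Lab’s software, which this article does not describe in detail. The word list, the two genre labels, the sample sentences and the nearest-average rule are all assumptions chosen for illustration.

```python
from collections import Counter
import math

# Hypothetical feature list: a handful of common function words.
# The pamphlet's actual features are not given in the article.
FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in"]


def profile(text):
    """Return the relative frequency of each function word in the text."""
    words = text.lower().split()
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def train(labeled_texts):
    """Average the profiles of the training texts, genre by genre."""
    by_genre = {}
    for genre, text in labeled_texts:
        by_genre.setdefault(genre, []).append(profile(text))
    return {
        genre: [sum(column) / len(profiles) for column in zip(*profiles)]
        for genre, profiles in by_genre.items()
    }


def classify(model, text):
    """Pick the genre whose average profile is closest to this text's."""
    p = profile(text)
    return min(model, key=lambda genre: math.dist(p, model[genre]))


# Invented sample sentences standing in for whole novels (illustration only).
TRAINING = [
    ("gothic", "the ruin of the castle and the shadow of the tower"),
    ("gothic", "the ghost of the abbey in the dark of the night"),
    ("pastoral", "a shepherd sang to a maiden and a lamb came to a brook"),
    ("pastoral", "a farmer went to a meadow to tend a flock"),
]

model = train(TRAINING)
guess = classify(model, "the door of the crypt and the sound of the bell")
```

Note that the sketch never looks at “castle” or “ghost.” It counts only the small words, which is the kind of signal a human reader would be unlikely to notice unaided.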

The lab’s newest paper seeks to detect these hidden aspects in plots (primarily in “Hamlet”) by transforming them into networks. To do so, Moretti, the sole author, turns characters into nodes (“vertices” in network theory) and their verbal exchanges into connections (“edges”). A lot goes by the wayside in this transformation, including the content of those exchanges and all of Hamlet’s soliloquies (i.e., all interior experience); the plot, so to speak, thins. But Moretti claims his networks “make visible specific ‘regions’ within the plot” and enable experimentation. (What happens to “Hamlet” if you remove Horatio?)
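The character-network idea, and the Horatio experiment in particular, can be sketched in a few lines. This is an illustration under stated assumptions, not Moretti’s data or method: the list of exchanges is a small, deliberately incomplete hand-picked sample, so any fragmentation it shows is a property of the toy list and not a finding about the play. The function names are hypothetical.

```python
from collections import defaultdict

# A partial, hand-picked sample of characters who speak to each other.
# Many exchanges in the play are omitted (an assumption for illustration).
EXCHANGES = [
    ("Hamlet", "Horatio"), ("Hamlet", "Claudius"), ("Hamlet", "Gertrude"),
    ("Hamlet", "Ophelia"), ("Hamlet", "Polonius"), ("Hamlet", "Laertes"),
    ("Claudius", "Gertrude"), ("Claudius", "Polonius"),
    ("Claudius", "Laertes"), ("Polonius", "Ophelia"),
    ("Horatio", "Marcellus"), ("Horatio", "Bernardo"),
    ("Marcellus", "Bernardo"),
]


def build_network(edges):
    """Characters become nodes; each verbal exchange becomes an edge."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def remove_character(graph, name):
    """Return a copy of the network with one character and their edges cut."""
    return {c: nbrs - {name} for c, nbrs in graph.items() if c != name}


def components(graph):
    """Group characters into clusters that remain connected to each other."""
    seen, clusters = set(), []
    for start in graph:
        if start in seen:
            continue
        stack, cluster = [start], set()
        while stack:
            node = stack.pop()
            if node in cluster:
                continue
            cluster.add(node)
            stack.extend(graph[node] - cluster)
        seen |= cluster
        clusters.append(cluster)
    return clusters


network = build_network(EXCHANGES)
without_horatio = remove_character(network, "Horatio")
clusters_before = components(network)
clusters_after = components(without_horatio)
```

Comparing `clusters_before` with `clusters_after` is the experiment in miniature: in this toy sample, the network holds together as one cluster until Horatio is removed.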

Kathryn Schulz is the author of “Being Wrong: Adventures in the Margin of Error.”

Article source: http://www.nytimes.com/2011/06/26/books/review/the-mechanic-muse-what-is-distant-reading.html?partner=rss&emc=rss