Gollum is a pretty vulnerable guy.
###TLDR: IBM Watson crunches the numbers on LOTR characters. Tells us who is self-conscious, who is neurotic, and whether book Aragorn or movie Aragorn is more alpha. Bonus: Watson breaks down the personality of the average LOTR tweeter.
Watson’s User Modeling technology (try the demo!) is a slick piece of software that can estimate personality traits from blocks of text produced by an individual. I (ab)use it to analyze and compare various LOTR characters’ personalities in Tolkien’s novels and Jackson’s films.
(Thanks to Roshni Cooper for valuable feedback and suggestions.)
The first graphic compares characters to one another on a variety of traits. This analysis is based on the lines they speak in the LOTR trilogy and the first two Hobbit films (not counting the Russian version). Some of the less interesting traits and characters are filtered out, along with those characters who have insufficient data for a particular trait. Additional technical details below.
You can see in this blowup that, unsurprisingly, Bilbo and Gollum are diametric opposites so far as self-consciousness goes. More interestingly, Thorin isn’t too far off from Gollum. He may have better personal hygiene, but Watson confirms for us that he doesn’t exactly suffer from social anxiety.
####Some additional tidbits from the graphic:
- Gollum is stupid, friendly, and — this one is interesting — surprisingly vulnerable.
- Gimli and Aragorn sit at opposite ends of the happy/sad divide.
- Despite his relatively sparse dialog, Legolas displays some extraordinarily strong elf-like characteristics: guarded, fearless, distrustful, calm, and stable.
- As compared to Frodo, Sam tends to be slightly friendlier and slightly less depressed. He also has a stronger tendency towards cooperation.
###Jackson vs. Tolkien
This one compares Tolkien’s characters to their film versions. Traits in red have significantly increased emphasis in the films, and traits in blue have significantly increased emphasis in the books. Again, characters and traits with insufficient supporting data have been filtered out, as have some less interesting characters. Additional technical details below.
Readers have noted in the past that book Gandalf is all confidence and power as compared to his film counterpart. Watson confirms that his character was indeed “nerfed” by Jackson: going from book to film, extraversion and assertiveness go down while cautiousness and neuroticism go up.
####A few more observations:
- Book Denethor’s more regal qualities are diminished in the films, while his moodiness and anxiety are emphasized.
- Legolas has an artistic, eloquent streak to him in the books. This is muted by Jackson, who prefers a competent and forthright Legolas.
- Aragorn is indeed less alpha in the films than in the books.
##Background Fluff (and another figure)
The Battle of the Five Armies is now in wide release, and Peter Jackson’s interpretation of Middle-Earth appears to be coming to an end, for better or for worse. (Actually, maybe not.) While many of the discussions about LOTR these days tend to focus on the plot elements, on the lore, or on Jackson’s love of CGI, I am more interested in seeing what he did with the characters.
Over the course of Jackson’s six films and Tolkien’s many writings, LOTR fans have interpreted the characters’ personalities in various ways. Additionally, some very strong opinions have been formed about the changes introduced by Jackson, Walsh, Boyens, del Toro, and Sinclair in the various live action LOTR films. While many of these analyses focus on plot and lore deviations by the filmmakers, there has also been some attention on changes to the underlying characters. A common observation, for instance, is that film-Gandalf is significantly more likeable whereas book-Gandalf is significantly more all-knowing. In other cases the comparison is less clear. For instance, does book Aragorn have more self-doubt, or less? Debate often rages in the forums. Just how alpha is Aragorn?
So what does the data have to say about this? And there is most definitely data: we have easy access to every line of dialog spoken by every character in every book and every film.
(For those who haven’t yet seen it, know that this isn’t the first time someone has taken a data-driven approach to LOTR. Emil Johansson’s LOTRproject is jaw-dropping for both its scope and its aesthetics.)
####The technology
Enter IBM’s Watson user modeling services. After all, who better to ask about understanding people than Watson? Trained on large corpuses of text data and correlated with personality information, user modeling can estimate an individual’s personality traits based on a collection of text he/she wrote. You can actually play around with it yourself. Be prepared for a surprise! It can often identify personality characteristics that even humans have trouble noticing.
To illustrate, I harvested two years of LOTR-related Twitter data (thanks to Twitter’s new complete public index!) and estimated the average personality spectrum of a person who tweets about LOTR. Here’s what Watson has to say (all numbers are percentiles against the general Twitter population).
Underlined in red are the three most emphasized personality facets (imagination, authority-challenging, and intellect) and the most prominent need (self-expression). Imagination, intellect, and self-expression each have obvious links to the LOTR fandom, and require no explanation. It’s cool that the personality analytics machinery is able to pick this out. The need to challenge authority may come as a surprise, but it is worth noting that there is a historical link between LOTR and rebelliousness.
####Extracting character data
It is not particularly difficult to extract character dialog from any of the films. Film scripts and transcripts are readily available online for both the original trilogy and the first and second Hobbit films. As a byproduct of rigid screenplay structuring, you can pull dialog and identify speakers with pretty straightforward regular expressions.
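Here is a minimal sketch of that kind of regular-expression extraction. It assumes a transcript where each spoken line looks like `SPEAKER: text`; real scripts vary in layout, so the pattern, the `extract_dialog` name, and the placeholder sample lines are all illustrative rather than what was actually run for this post.

```python
import re
from collections import defaultdict

# Assumed transcript layout: an upper-case speaker name, a colon, then the line.
LINE_RE = re.compile(r"^([A-Z][A-Z' .-]*?)\s*:\s*(.+)$")


def extract_dialog(transcript):
    """Group spoken lines by speaker; anything not matching the pattern is skipped."""
    lines_by_speaker = defaultdict(list)
    for raw in transcript.splitlines():
        match = LINE_RE.match(raw.strip())
        if match:
            speaker, text = match.groups()
            lines_by_speaker[speaker.title()].append(text.strip())
    return dict(lines_by_speaker)


# Invented placeholder transcript, not actual film dialog.
sample = """\
[Scene: placeholder stage direction]
GANDALF: First placeholder line.
FRODO: Second placeholder line.
GANDALF: Third placeholder line.
"""
dialog = extract_dialog(sample)
```

Stage directions and scene headings fall through because they do not start with an upper-case name followed by a colon. A different script format would need a different pattern.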
Tolkien’s books are just as readily available. However, novels aren’t as clearly structured as screenplays, and this makes for a more complex dialog extraction task. Thankfully, Tolkien follows the discipline of having at most one character speaking in any given paragraph. He follows several other rules as well: in what appears to be more than 95% of cases, the speaker is explicitly identified with a phrase such as “Bombadil said” or “Bombadil whispered.”
It is very much possible to estimate the remaining 5% of cases by identifying the most recently mentioned individual prior to a particular dialog. However, this process can introduce misattributions, and so I simply discard the 5%. Throwing out a few lines for a character does technically raise the noise floor, but it is not nearly as potentially catastrophic as assigning lines from one character to another. I’m essentially erring on the side of perfect precision over perfect recall.
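The precision-first approach can be sketched roughly as follows. This is an illustration under stated assumptions, not the code behind the post: it assumes paragraphs are separated by blank lines, dialog sits inside curly double quotes, and attribution uses a small hand-picked verb list. The function names, verb list, and sample text are hypothetical.

```python
import re
from collections import defaultdict

# Assumed, non-exhaustive list of speech verbs; extend as needed.
SPEECH_VERBS = ("said", "whispered", "cried", "asked", "answered")
# Assumes dialog is wrapped in curly double quotes; adjust for other editions.
QUOTE_RE = re.compile(r"“([^”]*)”")


def attribute_paragraph(paragraph, characters):
    """Return (speaker, spoken_text), or None when the paragraph has no
    dialog or no explicit attribution."""
    quotes = QUOTE_RE.findall(paragraph)
    if not quotes:
        return None
    # Look for the attribution only in the narration outside the quotes.
    narration = QUOTE_RE.sub(" ", paragraph)
    names = "|".join(re.escape(name) for name in characters)
    verbs = "|".join(SPEECH_VERBS)
    pattern = re.compile(
        rf"\b(?:(?P<before>{names})\s+(?:{verbs})|(?:{verbs})\s+(?P<after>{names}))\b"
    )
    match = pattern.search(narration)
    if match is None:
        return None  # unattributed: discard rather than guess
    speaker = match.group("before") or match.group("after")
    return speaker, " ".join(q.strip() for q in quotes)


def extract_book_dialog(text, characters):
    """Collect attributed dialog per speaker, one entry per paragraph."""
    dialog = defaultdict(list)
    for paragraph in re.split(r"\n\s*\n", text):
        result = attribute_paragraph(paragraph, characters)
        if result is not None:
            speaker, spoken = result
            dialog[speaker].append(spoken)
    return dict(dialog)


# Invented placeholder text, not Tolkien's prose.
sample = (
    "“Placeholder line one,” said Bombadil. “Placeholder line two.”\n\n"
    "“Placeholder line three.”\n\n"
    "Frodo whispered, “Placeholder line four.”"
)
book_dialog = extract_book_dialog(sample, ["Bombadil", "Frodo"])
```

The second paragraph has no explicit speaker, so it is dropped instead of being assigned to whoever spoke last. That is the trade of recall for precision described above.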
Given a collection of text produced by an individual, Watson User Modeling produces personality scores for each of a few dozen personality “facets.” These scores, however, are only meaningful in relation to the personality scores from other individuals. For instance, the percentage-valued scores produced by the demo are percentiles against a background population of individuals.
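To make the idea of a percentile against a background population concrete, here is a small sketch. It is not how Watson computes its numbers, which I don’t have visibility into. The `percentile_against` helper and the background scores are made up for illustration.

```python
from bisect import bisect_right


def percentile_against(score, background):
    """Percentage of the background population scoring at or below `score`."""
    ordered = sorted(background)
    return 100.0 * bisect_right(ordered, score) / len(ordered)


# Hypothetical raw facet scores for a background population of ten people.
background = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
pct = percentile_against(0.35, background)
```

With this made-up background, a raw score of 0.35 sits above three of the ten individuals, so it lands at the 30th percentile. The same raw score would mean something quite different against a different population.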
As such, instead of reporting the raw scores, I report the ranking of characters for each personality trait. This effectively normalizes both for distributional differences between traits (e.g. the depression facet appears to have much more of a spread than neuroticism), and general stylistic differences between dialog written for film vs novels (e.g. characters on screen tend to be more extraverted).
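A rough sketch of turning raw scores into per-trait rankings might look like this. The scores are invented, the `rank_characters` name is hypothetical, and ties are broken arbitrarily by sort order. I also assume an ascending convention (rank 1 is the lowest score), so that a larger rank means a more strongly expressed trait.

```python
def rank_characters(scores):
    """Map {character: {trait: score}} to {trait: {character: rank}}.

    Assumed convention: rank 1 is the lowest score, so a larger rank
    means the trait is more strongly expressed.
    """
    traits = {trait for per_char in scores.values() for trait in per_char}
    ranks = {}
    for trait in traits:
        # Skip characters with no score for this trait (insufficient data).
        scored = [(name, s[trait]) for name, s in scores.items() if trait in s]
        scored.sort(key=lambda pair: pair[1])
        ranks[trait] = {name: i for i, (name, _) in enumerate(scored, start=1)}
    return ranks


# Hypothetical raw scores, purely for illustration.
raw_scores = {
    "Frodo": {"cheerfulness": 0.40, "cooperation": 0.55},
    "Sam": {"cheerfulness": 0.62, "cooperation": 0.71},
    "Gollum": {"cheerfulness": 0.15},
}
ranks = rank_characters(raw_scores)
```

Because only the ordering survives, a trait with a wide spread and a trait with a narrow one end up on the same footing.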
Only four characters are reported for each trait. This serves two purposes. First, there is oftentimes both a lack of evidence-for and a lack of evidence-against a particular trait for a particular character. The estimated score in such cases is less reliable. Secondly, and perhaps more importantly, the results are surprisingly hard to digest in their full form. The rankings I report are in some sense the most interesting and legitimate insights contained in Watson’s raw data.
####Jackson vs. Tolkien
In comparing the film to the novels, it is tempting to simply compare for each character the raw personality scores produced in each setting. This is not without danger, however. There are systemic differences in style of writing between screenplays and novels. As noted above, characters on film may tend to score higher in extraversion than those on the page. Additionally, there exist fundamental differences in Tolkien’s style of writing as compared with that of Jackson et al. Many of these differences (e.g. the extraversion shift) should not be recorded as character changes, because audience members mentally compensate for a global shift of this sort.
The solution, as with the character rankings, is to measure personality facets as a ranking amongst characters. This has a normalizing effect: if Sam is the most cheerful character in the books and the second least cheerful character in the films, this indicates a shift in his cheerfulness as perceived by a reader/viewer.
I only report those personality traits whose ranking shifts significantly between page and screen for a given character. More specifically, let \( C \) be a character, let \( x_C^T\) be the character’s book rank for trait \( T \), and let \( y_C^T\) be the character’s film rank for trait \( T \). I report \( T \) as being more emphasized in the movies (red) if \( y_C^T - x_C^T > \theta \), and being more emphasized in the books (blue) if \( y_C^T - x_C^T < -\theta \), where \( \theta \) is a threshold parameter (set to 2 in this experiment). As with the character spectrum, the traits that are reported are further filtered by noise considerations and interest.
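The thresholding rule above translates fairly directly into code. In this sketch, `classify_shifts` and the rank values are hypothetical. It assumes the ranks for one character have already been computed, with a larger rank meaning a more strongly expressed trait, and uses the same threshold of 2.

```python
def classify_shifts(book_ranks, film_ranks, theta=2):
    """Label each trait as emphasized in the 'film' or the 'book'.

    book_ranks and film_ranks map trait -> rank for a single character.
    Traits whose rank difference stays within theta are left out.
    """
    shifts = {}
    for trait in book_ranks.keys() & film_ranks.keys():
        delta = film_ranks[trait] - book_ranks[trait]  # y_C^T - x_C^T
        if delta > theta:
            shifts[trait] = "film"  # red in the graphic
        elif delta < -theta:
            shifts[trait] = "book"  # blue in the graphic
    return shifts


# Hypothetical ranks for one character, purely for illustration.
book = {"assertiveness": 7, "cautiousness": 2, "neuroticism": 3, "intellect": 5}
film = {"assertiveness": 3, "cautiousness": 6, "neuroticism": 5, "intellect": 5}
shifts = classify_shifts(book, film)
```

Note that the comparison is strict: a shift of exactly 2 ranks (neuroticism in this made-up example) does not clear the threshold and is not reported.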