Magic, evil, and alternate universes.

Sci-Fi Fantasy Visualization No. 1

This is a visualization of relationships between some of the biggest and most popular science fiction and fantasy universes: Star Wars, Star Trek, Doctor Who, Harry Potter, Chronicles of Narnia, Lord of the Rings, The Golden Compass, Superman, and X-Men. The spatial position shows the “similarity” between the universes based on nine variables, the size represents how familiar the people surveyed were with each universe, and the color categorizes the universes based on their distance to three “underlying variables” derived by k-means cluster analysis.

This project started out with me musing about how these different universes treat the issue of “good versus evil,” and ended up growing into a full-blown statistical exploration of the similarities and differences between these worlds. The graph above is one of the prettier ones, but I ended up generating over half a dozen different tables and graphs, which you can see in the article below.

If that sounds interesting to you, then keep reading: I will share my statistical journey with you in the sections that follow. Section 1 describes how I got the idea, section 2 describes data collection, section 3 describes the data analysis, and section 4 describes my conclusions.

Even if you don’t care that much, you should scroll down and look at the pretty graphs.  Some of them are quite cool.

 

Section 1: Introduction and Background

I was watching Harry Potter and the Goblet of Fire for the 97th time the other day, after having just finished watching Star Wars Episode III for the 75th time, and it occurred to me that one of the things that Harry Potter has in common with Star Wars is a very concrete and unambiguous distinction between Good and Evil. Sure, characters can cross over the boundary from one to the other, and back again: but there is never any gray area. You are never asked to wonder whether Voldemort’s vision of society might have its own internal logic, or whether the proponents of the Dark Side of the Force have a valid point to make.

This kind of unambiguous good-versus-evil theme is also big in Lord of the Rings and the Chronicles of Narnia. These are epics that revolve around the struggle between incarnations of “good” and “evil” as absolutes.

This is very different from, for example, Star Trek or X-Men, where ethical dilemma and ethical relativism are basic engines that drive many of the plot lines. There are good guys and bad guys, but don’t the bad guys have a right to their point of view? Aren’t the good guys sometimes coerced into doing things that seem bad, in order to achieve a good end? Good and evil are central themes, but they exist side-by-side with a constant questioning of these boundaries.

Then there are some universes that are just less focused on good versus evil altogether. In both Doctor Who and Superman, for example, there are definitely “good guys” and “bad guys,” but the real driving forces behind the conflicts are more often themes like human frailty and situational conflict. Even the “supervillains” of these universes, such as Lex Luthor or the Daleks, are not presented as absolute “moral archetypes”; instead, they are merely flawed products of their own history and circumstances.

Or at least, that’s how it seems to me.

And that got me thinking about other similarities and differences between these different universes in general. Some seem obviously very similar, and others very different. I thought it might be fun to try to visualize the “space” that contains these universes, and their positions within that space.

In order to do that, you need to have some kind of measurement. You have to have something to base numbers on, if your goal is to come up with some kind of graph. So my first task was to come up with a set of questions that represent features or characteristics of each universe that might be important factors in evaluating how similar or different they are from one another.

 

Section 2: Data Collection

I came up with 9 questions to capture some of the ways in which these Science Fiction and Fantasy universes might be either similar or different:

1 ) Is there magic, that is specifically presented as magic rather than science?

2 ) Is there “unknown science”, that is specifically presented as something technological or having a scientific explanation (despite it being not part of our real/current science or technology)?

3 ) Are there mythological creatures (in the sense of known, familiar mythological creatures, not things from outer space)?

4 ) Are there aliens from outer space (even if space travel isn’t in the plot, are there characters who are known to have originated in outer space)?

5 ) Is there space travel (even if it doesn’t happen as part of the “current” plot line, are there things in the story explained by implicit space travel)?

6 ) Are there parallel universes (this can include either actual movement between parallel universes as part of the plot, or implied alternate universes)?

7 ) Is there unambiguous good versus evil (without any question of which is which)?

8 ) Is there cultural or ethical relativism (i.e. are you ever asked to see “bad guys” or unfamiliar creatures or people from their own point of view)?

9 ) Is there a grounding in the present day or near future (in other words, does the story have any contact with a familiar, real-world earth setting as we know it now)?

Now, you might have come up with a different list. You might have added some things, removed others, or had something totally different. You should feel free to do your own analysis if you would like to; in fact, it would be interesting to compare the results.  But, this is the list that I came up with.

Of course, I have my own answers to these questions that I think are the right answers. But some people might disagree with my answers. So I thought the most fair thing to do would be to create a survey, and collect answers from a population of people. That way, I can use the percentage of people who answer “yes” to each question, about each universe, as a real empirical measurement.

I created a survey on Surveymonkey.com and got responses from 20 people. It’s not a random sample, of course: it’s a sample of people who follow me on Twitter or Facebook who care enough about Science Fiction and Fantasy to want to answer questions about it.  But as long as I can assume that the answers given by those people represent the beliefs and feelings of science fiction/fantasy fans “at large”, then a percentage of “yes” answers is meaningful.

Specifically, the percentage of “yes” answers represents how strongly people associate that characteristic (defined by the question) with that universe.

For example, if 100% of the people surveyed say that Chronicles of Narnia contains magic, then that means that there is a strong perceived association between Chronicles of Narnia and magic.  On the flip side, if 6% of the people surveyed say that Harry Potter contains aliens from outer space, then there is a weak association between Harry Potter and aliens.  Finally,  if 66% of the people surveyed say that X-Men contains unambiguous good-versus-evil, then that means that there is only a moderate association between X-Men and unambiguous good-versus-evil.

Now, there are some hardcore purists who will say: “What if the people surveyed are just wrong and don’t know what they are talking about?”

This is a valid question, at least on the face of it. For example, only 63% of the people surveyed said that The Golden Compass involved parallel universes, even though anyone listening to the introduction of the movie knows that the entire thing is presented as taking place in a parallel universe. On the surface, it seems like the correct answer should be 100% “yes” and that’s that.

What the 63% result tells us, though, is that when people are watching The Golden Compass, the parallel universe aspect of the plot simply isn’t as salient (for whatever reason). This is a perfectly valid psychological result of the survey.

Similarly, only 6% of the respondents said that Chronicles of Narnia involves aliens from outer space. Technically, in the movie The Voyage of the Dawn Treader, a character who is a star (i.e. an entity from space) comes down and talks to the main characters in the adventure. Does this count? Perhaps that’s what the 6% were thinking of when saying “yes.” But whether or not that “technically” is an example of aliens from outer space in the Chronicles of Narnia, the fact remains that aliens are not a well-known, salient feature of that universe.

So that is the way to interpret these measurements: What is the psychological association people have between the universe and the characteristic involved? If that is the question, then whether the answer “technically” should be “no” or “yes” is not the point.

In the actual survey, I also included “Not Sure” as a possible response.  That way, I got two tables of measurements as results:

 

“YES” as a percentage of “YES” or “NO” answers (excluding “NOT SURE”)

Percent Yes Answers

“NOT SURE” as a percentage of all answers
Percent Not Sure

The thing that jumps out about the second table is that people knew a lot less about Doctor Who and The Golden Compass than the other universes. Silly Americans. This may or may not have impacted other variables, but it’s something to keep in mind when looking at those particular two points on any of the graphs below. It’s possible that they will have more room for error, because they are less well-known.
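For the curious, here is a minimal Python sketch of how the two percentages defined above can be computed for one question about one universe. The function name and the sample answers are made up for illustration; they are not the survey export or the real responses.

```python
from collections import Counter

def summarize(responses):
    """Return (percent_yes, percent_not_sure) for one question about one universe.

    percent_yes is taken over the YES/NO answers only (NOT SURE excluded);
    percent_not_sure is taken over all answers.
    """
    counts = Counter(responses)
    yes, no, unsure = counts["yes"], counts["no"], counts["not sure"]
    decided = yes + no
    percent_yes = 100.0 * yes / decided if decided else float("nan")
    percent_not_sure = 100.0 * unsure / len(responses) if responses else float("nan")
    return percent_yes, percent_not_sure

# Hypothetical answers from 20 respondents (NOT the real survey data):
# 12 of the 16 decided answers are "yes", and 4 of all 20 answers are "not sure".
example = ["yes"] * 12 + ["no"] * 4 + ["not sure"] * 4
pct_yes, pct_not_sure = summarize(example)
print(f"{pct_yes:.0f}% yes, {pct_not_sure:.0f}% not sure")
```

Keeping “not sure” out of the first percentage means that an unfamiliar universe doesn’t automatically look like a “no” on every question.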

So, these are the raw data tables. Let’s look at some analyses and graphs.

 

Section 3: Data Analysis

The first and most fun simple statistic to look at is the pair-wise correlation between the different universes. Simply put: which universes are most similar to one another, which are most opposite from each other, and which are in-between?

Correlation Between Universes

The results are pretty much what you would expect. Lord of the Rings, Harry Potter, and Chronicles of Narnia are all clustered together as very similar. The most opposite universes, on the other hand, are Star Trek and Harry Potter. Of course, this isn’t surprising: one is a magical universe that takes place on Earth in the present day and is a struggle between Good and Evil; the other is a far-future outer space universe riddled with aliens and moral ambiguity.

Interestingly, although Star Wars and Star Trek are very similar, and Star Trek and Doctor Who are very similar, Star Wars and Doctor Who are not very similar.  This is a good illustration of one of the limitations of this simple “pair-wise correlation”: there is no transitivity.  You can have Star Wars and Star Trek being similar for one reason (say, for example, they both have aliens) and Star Trek and Doctor Who being similar for a different reason (say, for example, they both have ethical relativism as a theme), but that doesn’t necessarily mean there is any overlap between Star Wars and Doctor Who.
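To make the “no transitivity” point concrete, here is a small Python sketch. It assumes the similarity measure is a plain Pearson correlation between two universes’ percent-yes profiles, which is an assumption on my part for this example. The three universes A, B, and C and their numbers are invented, not taken from the survey.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

# Hypothetical percent-yes profiles on four questions (invented numbers).
universe_a = [90, 90, 10, 10]
universe_b = [90, 50, 50, 10]
universe_c = [90, 10, 90, 10]

print("A vs B:", round(pearson(universe_a, universe_b), 2))
print("B vs C:", round(pearson(universe_b, universe_c), 2))
print("A vs C:", round(pearson(universe_a, universe_c), 2))
```

In this toy example, B sits halfway between A and C, so it correlates fairly strongly with both, while A and C share nothing with each other beyond their answers to the first and last questions.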

While we’re on the topic of variables, and the different ways that the universes can be similar or different, let’s look at the relationship between some of the variables themselves. Intuitively, it’s obvious that some of the 9 questions are closely related and others aren’t.  For example, one might imagine that universes that have magic are also more likely to have mythological creatures, or universes that have space aliens are more likely to have space travel.

In a way, the questions that I asked can be seen as just measurements that are designed to get at some kind of underlying characteristic of the universe.  When two questions are strongly related (e.g. magic and mythological creatures), they can be thought of as two ways of measuring the same underlying characteristic.

Statistically, one way to discover the underlying “reality” that my questions are measuring is to do a k-means cluster analysis on the variables. What this does is group the variables into conceptual “clusters”:  when multiple variables are clustered together, it means there is some kind of underlying property that they are all tapping into.  Because they are tapping into the same underlying property, the answers are often correlated: if the answer to one question in the cluster is “yes” then it means there’s a good chance the answers to the other questions in the same cluster are “yes” as well.
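If you would like to see the mechanics, here is a minimal k-means sketch in plain Python. Everything in it is an assumption for illustration: the numbers are invented (six variables measured on four made-up universes), and the deterministic “farthest point” starting centers are my choice for the example so that the result is repeatable. A statistics package would typically use random starting points instead.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, max_iter=100):
    """Group points into k clusters; returns (labels, centers)."""
    # Deterministic start: first point, then repeatedly the point that is
    # farthest from all centers chosen so far (an illustrative choice).
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(farthest))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break  # nothing changed, so the clustering has converged
        labels = new_labels
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Hypothetical percent-yes values: one row per variable, one column per universe.
variables = {
    "magic":                  [95, 90, 10, 5],
    "mythological creatures": [90, 85, 5, 10],
    "unknown science":        [5, 10, 95, 90],
    "space travel":           [10, 5, 90, 95],
    "parallel universes":     [50, 55, 45, 50],
    "relativism":             [45, 50, 55, 50],
}
labels, centers = kmeans(list(variables.values()), k=3)
for name, label in zip(variables, labels):
    print(f"cluster {label}: {name}")
```

Variables whose answers rise and fall together across universes land in the same cluster, which is the “traveling together” idea described above.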

Variable Cluster Analysis

The cluster analysis identified three main clusters for the variables in our data.  This graph shows how closely associated each variable is with each of the three clusters.

Cluster 1 (red line) is closely affiliated with magic, mythological creatures, and unambiguous good and evil. It makes sense that these variables would “travel together,” as it were.  The underlying property being measured here is the classic magical good-versus-evil paradigm. The universes that are closest to this cluster-center are Chronicles of Narnia, Harry Potter, and Lord of the Rings. Let’s call this cluster the “fairytale archetype.”  It’s a classic magical morality tale.

Cluster 2 (blue line) is closely affiliated with unknown science, aliens from space, and space travel that takes place in the present day or near future. This cluster can be thought of as the classic human-oriented hard Science Fiction tale:  this is the Earth-based, technology-centered morally ambiguous science fiction. The universes closest to this cluster-center are Doctor Who and Superman. Let’s call this cluster the “cultural science-fiction archetype.” It’s the classic high-tech and aliens social commentary.

Cluster 3 (green line) appeared as a third category that I didn’t expect. It represents a kind of hybrid or cross-over between the other two: it is closely affiliated with unknown science and unambiguous good and evil, yet also has a pretty high score for cultural relativism. The underlying characteristic of this cluster is a third kind of tale that can be told: science fiction that is so far removed from the present day or known technology that it becomes “magical science fiction,” and retains some of the focus on good and evil that its “fantasy” counterpart has. The universes closest to this cluster-center are The Golden Compass, Star Wars and Star Trek. Let’s call this cluster the “space-opera archetype.” It’s the classic tale of humans trying to find life, love and meaning in a universe so technologically different from ours that it borders on magic.

Weirdly, when analyzed in terms of these clusters, X-Men didn’t end up being particularly close to any of them. In fact, the actual k-means cluster analysis ended up putting X-Men in its own cluster, far away from everything else, that was not very strongly related to any of the questions but weakly related to all of them.  I’m not sure what it means, but it’s kind of interesting.

Now I have to remind you that these clusters don’t actually tell you how similar or different each of the individual universes is.  This cluster analysis was done on the questions.  The point of this analysis is to identify some basic underlying archetypes or core conceptual items (“meme-clusters” if you like that terminology) that the 9 questions were giving us information about. We were able to discover three underlying concepts that can be used to evaluate these universes: how “fairy-tale-like” the universe is, how “science-fictiony” it is, and how “space-opera-y” it is.

To get a better feel for the similarities and differences between the actual universes (rather than the variables), we can use Principal Component Analysis.

Principal Component Analysis is a way of visualizing data using geometry. When we describe one of the science fiction or fantasy universes that we are looking at based on the % YES answers to 9 questions, we can think about that universe as a “point” in a 9-dimensional space: each question is a dimension, and the position of the universe along that dimension is the % of people who answered “YES” for that question.

But it’s really tough to visualize 9-dimensional space. Plus, these dimensions are not completely independent (or using geometry terms, they are not orthogonal).  As we have seen with the cluster analysis, some of the questions are more closely related to each other than others.

So what Principal Component Analysis does is try to find a way to project an image of the points in the 9-dimensional space (where each point is one of the “universes” we were asking about) down to something that we can handle: for example, a 2-dimensional space. That’s something we can actually draw a picture of and understand better.

To do this, we first need a way of projecting the 9 “question dimensions” onto a 2-dimensional plane that will preserve as much information as possible.  This is where the heavy-hitting math comes into play. It’s like looking at your shadow when the sun is in different positions. You are a three-dimensional thing, and your shadow is two-dimensional: it is a projection from a higher-dimensional space onto a lower-dimensional space. That is also what Principal Component Analysis is doing.

When the sun is hitting you from the side, your shadow looks basically like the outline of a person; when the sun is coming straight down from above, your shadow is nothing but a small blob.  So how much information is “preserved” in the lower-dimensional projection depends on that angle of the projection: the relationship between the angle of the projection (sun) and the space that it is being projected onto (the ground).
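The shadow analogy can be turned into numbers with a short Python sketch. It is deliberately simplified: instead of a real shadow on the ground, it assumes the shadow falls on a screen that directly faces the light (an orthogonal projection, which is the kind of projection PCA uses). The stick-figure coordinates are invented, and “information” is measured as the share of the total variance that survives the projection.

```python
from math import sqrt

def variance_kept(points, light):
    """Share of total variance that survives projecting along `light`.

    Projecting along the light direction throws away exactly the spread
    of the points in that direction and keeps everything else.
    """
    n = len(points)
    norm = sqrt(sum(c * c for c in light))
    direction = [c / norm for c in light]
    mean = [sum(col) / n for col in zip(*points)]
    centered = [[a - m for a, m in zip(p, mean)] for p in points]
    total = sum(sum(c * c for c in p) for p in centered)
    lost = sum(sum(c * u for c, u in zip(p, direction)) ** 2 for p in centered)
    return 1.0 - lost / total

# A hypothetical stick figure, in meters: (x = left/right, y = front/back, z = height).
body = [
    (0.0, 0.0, 1.8),                            # head
    (-0.25, 0.0, 1.5), (0.25, 0.0, 1.5),        # shoulders
    (-0.4, 0.05, 0.9), (0.4, 0.05, 0.9),        # hands
    (-0.15, 0.0, 1.0), (0.15, 0.0, 1.0),        # hips
    (-0.1, 0.1, 0.0), (0.1, 0.1, 0.0),          # feet
]

side = variance_kept(body, light=(0.0, 1.0, 0.0))      # light shining horizontally, front to back
overhead = variance_kept(body, light=(0.0, 0.0, 1.0))  # light shining straight down
print(f"horizontal light keeps {side:.0%}, overhead light keeps {overhead:.0%}")
```

With the light shining horizontally, almost nothing is lost because a person is thin front-to-back; with the light overhead, the height (where most of the spread is) gets flattened away.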

Principal Component Analysis is a mathematical way of finding the best angle to project our 9-dimensional “question space” down to 2 dimensions, which we can make a nice handy chart out of and visualize.

Got it? So once we run the numbers, the first result to look at is the “angle information”: that is, what is the relationship between each of our 9 “question dimensions” and the two dimensions on our chart.

PCA Dimensions

Above is a map of the “best projection” that it is possible to get of our 9-dimensional question space. Based on the responses that we got, 65.23% of the information is captured by the horizontal axis and 16.39% of the information is captured by the vertical axis.  Some information is still lost (the two add up to 81.62%), in much the same way that your shadow can’t give you 100% of the information about what your full three-dimensional body is shaped like. But, it’s pretty good.
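Here is a rough Python sketch of where percentages like these come from. In PCA terms they are the share of the total variance explained by each axis. The sketch uses invented numbers (five made-up universes, four questions) and a simple power-iteration routine that I chose so the example needs no libraries; a statistics package would normally do this step with a proper eigenvalue solver.

```python
from math import sqrt

def center(rows):
    """Subtract each column's mean so the data cloud is centered at the origin."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def covariance(centered):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def top_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector, found by power iteration."""
    d = len(matrix)
    start = [float(i + 1) for i in range(d)]  # arbitrary non-symmetric start
    norm = sqrt(sum(c * c for c in start))
    vec = [c / norm for c in start]
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sqrt(sum(c * c for c in nxt))
        if norm == 0.0:
            break  # no variance left in this matrix
        vec = [c / norm for c in nxt]
    # Rayleigh quotient: how much variance lies along `vec`.
    value = sum(v * sum(m * w for m, w in zip(row, vec))
                for v, row in zip(vec, matrix))
    return value, vec

def pca(rows, n_components=2):
    """Return (explained-variance ratios, projected coordinates)."""
    centered = center(rows)
    cov = covariance(centered)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    axes, ratios = [], []
    for _ in range(n_components):
        value, vec = top_eigenpair(cov)
        axes.append(vec)
        ratios.append(value / total_variance)
        # Deflation: remove the axis just found so the next pass finds the next-best one.
        cov = [[cov[i][j] - value * vec[i] * vec[j] for j in range(d)]
               for i in range(d)]
    coords = [[sum(c * a for c, a in zip(row, axis)) for axis in axes]
              for row in centered]
    return ratios, coords

# Hypothetical percent-yes values: one row per universe, columns are
# magic, mythological creatures, unknown science, space travel.
universes = [
    [95, 90, 5, 10],
    [90, 95, 10, 5],
    [10, 5, 95, 90],
    [5, 20, 90, 95],
    [50, 70, 60, 30],
]
ratios, coords = pca(universes)
print("share of variance per axis:", [f"{r:.1%}" for r in ratios])
print("2-D coordinates:", [[round(c, 1) for c in point] for point in coords])
```

The two ratios play the same role as the 65.23% and 16.39% above, and the coordinates are what you would plot as points on the chart.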

You can also see in this graph the relationships between some of the original 9 dimensions. As you might expect from a “geometric” interpretation of these questions, two questions that are polar opposites of one another tend to be 180 degrees from each other: for example, Cultural Relativism vs. Unambiguous Good and Evil. Similarly, questions that tend to address the same conceptual cluster (described by the earlier analysis) will tend to group together here as well: for example, Unknown Science, Space Aliens, and Space Travel.

What’s a little bit interesting is some of the relationships that you wouldn’t necessarily expect. For example, there is a moderate relationship between universes that invoke ethical relativism and universes that are grounded in the present day and involve parallel universes. Immediately, examples can be brought to mind for why this is only a moderate relationship: on the one hand, you have Star Trek and Doctor Who, which are very morally relativistic and invoke parallel universes; on the other hand, you have Chronicles of Narnia, which takes place in a parallel universe but is more of an “absolute good-and-evil” fairy tale.

This ambiguous status of parallel universes could also be seen in the Cluster Analysis graph, above. As you can see there, the Parallel Universes question taps at least partially into all three of our conceptual cluster “archetypes”: fairy-tale, science-fiction, and space-opera.  Unlike some questions that are extremely high for some clusters and almost absent for others (e.g. “Magic” or “Space Travel”), parallel universes are a concept that can be invoked by any of these story-types.

So that’s neat.  Now that we’ve defined a mapping between the 9-dimensional space and our 2-dimensional projection, it’s time to plot some points!

PCA Science Fiction and Fantasy Universes

This gives us a visual depiction of the “multidimensional similarity” between these different universes, at least as they can best be shown on a two-dimensional graph.

Some people say that using Principal Component Analysis is just as much an art as a science.  The science part is the math: the number-crunching that produces this plot.  The art part is the interpretation.  We now have an X and a Y axis that represent the “best” (mathematically) two dimensions for understanding the differences between these universes. But what do X and Y represent?

Just looking at the graph, some things come to mind.  For one thing, there is an obvious grouping along the Y-axis, with more science-fiction-y things at the top and more fantasy-ish things at the bottom.  There are very clearly two distinct groups here. So although all of the advocates of terms like “speculative fiction” are correct in saying that science fiction and fantasy are overlapping categories that exist on a continuum, we can also see from these data that big popular franchises tend to fall into one of two “groups” along that continuum.

The interpretation of the X-axis is a little more ambiguous. It’s tempting to say that this dimension is related to a “relativism / good-and-evil” dimension, with the more “absolute good-and-evil” examples of each genre appearing to the left of their respective group (e.g. Star Wars, Lord of the Rings) and the more “ethical conflict-based” examples of each genre appearing to the right (e.g. X-Men, Harry Potter).  But some of the “middle points” don’t really seem to conform to this interpretation, so I’m a little hesitant about it.

Of course, one of the problems with Principal Component Analysis is that there is absolutely nothing in the mathematics that says there needs to be a common-sense conceptual interpretation of the axes you get in your resulting graph. The computation is just finding the best mathematical fit: it’s possible that there is nothing that “makes sense” as a semantic interpretation at all.

Oh well, it’s still pretty.

Finally, just to add some flair to the thing, I took the above graph and added two more dimensions: the size of the blob represents how well-known the universe is, at least to the people who responded to my survey; the color groups the universes based on which of the clusters the universe was closest to in the cluster analysis. So now we are able to integrate all of our best results into one graph:

Sci-Fi Fantasy Visualization No. 1

Section 4: Conclusions

All in all, this was a fun little exploration.  Here are some of the conclusions that I think are worth highlighting:

1) Although everyone talks about “Good and Evil” versus “Ethical Relativism” as one of  the big distinguishing characteristics between Star Wars and Star Trek, that difference is relatively small when looking at a “bigger picture” of these two universes among other science fiction and fantasy universes. In the end, -Wars and -Trek still are pretty close together.

2) Well-known and common-sense archetypes like the “Fairytale Fantasy”, the “Cultural Sci Fi”, and the “Space Opera Sci-Fi” really do correspond to “meme clusters” that appear in the major popular speculative fiction universes (the cluster analysis).

3) The big popular speculative fiction universes really can be grouped based on a “Sci Fi versus Fantasy” axis (the Y-axis in the principal component analysis).

4) Star Trek and Harry Potter are complete opposites (the pair-wise correlation list).

5) Americans don’t know as much as they should about The Golden Compass and Doctor Who. Shame on you.