Mongo
Bending Unit
|
|
|
« on: 09-01-2010 22:58 »
« Last Edit on: 09-01-2010 23:03 »
|
|
There have been a lot of electrons spilled over the issue of how the current production run (6ACV01 and up) matches up with the original production run (1ACV01 to 4ACV18). This is impossible to definitively answer, since everybody has their own subjective opinion about each separate episode. However, I think that there are sufficient episodes in the new run to allow for at least a statistical comparison.
What I have done is enter every episode of the original 72 into a spreadsheet, with a rough personal rating for each one:
4 = outstanding episode
3 = very good episode
2 = typical episode
1 = poor episode
0 = terrible episode
The numbers themselves matter only as labels for the rating; they could just as easily have been 6, 7, 8, 9 and 10 without affecting the final result (since I am only using the numbers to compare the original and new production runs). Obviously, my own personal ratings are unlikely to match those of anyone else, but on average they should be fairly close. In particular, I rated each of these episodes after having seen all 83 episodes so far of the new and old production runs (minus the movie episodes), so the new run is "calibrated" against the old run -- I was directly comparing the new episodes against the old ones.
Once I had all the episodes entered into the spreadsheet, I took a running average of the episode ratings, in blocks of 5 episodes. This is what I found.
For the 5-episode averages, the best 2 blocks of 5 consecutive (by production code) episodes of the original run had a score of 16 = 3.2/episode:
3ACV01 Amazon Women in the Mood
3ACV02 Parasites Lost
3ACV03 A Tale of Two Santas
3ACV04 The Luck of the Fryish
3ACV05 The Birdbot of Ice-Catraz

4ACV08 Crimes of the Hot
4ACV09 Teenage Mutant Leela's Hurdles
4ACV10 The Why of Fry
4ACV11 Where No Fan Has Gone Before
4ACV12 The Sting
There were two blocks of 5 consecutive episodes that scored at 15 = 3.0/episode:
1ACV12 When Aliens Attack
1ACV13 Fry and the Slurm Factory
2ACV01 I Second That Emotion
2ACV02 Brannigan Begin Again
2ACV03 A Head in the Polls

2ACV13 Bender Gets Made
2ACV14 Mother's Day
2ACV15 The Problem With Popplers
2ACV16 Anthology of Interest I
2ACV17 War is the H-Word
By comparison, the best 5-episode block of the new run also scored as a 15 = 3.0/episode:
6ACV06 Lethal Inspection
6ACV07 The Late Philip J. Fry
6ACV08 That Darn Katz!
6ACV09 A Clockwork Origin
6ACV10 The Prisoner of Benda
The 2 worst 5-episode blocks from the original run scored at 6 = 1.2/episode:
3ACV08 That's Lobstertainment!
3ACV09 The Cyber House Rules
3ACV10 Where the Buggalo Roam
3ACV11 Insane in the Mainframe
3ACV12 The Route of All Evil

4ACV02 Leela's Homeworld
4ACV03 Love and Rocket
4ACV04 Less Than Hero
4ACV05 A Taste of Freedom
4ACV06 Bender Should Not Be Allowed on Television
The worst 5-episode block of the new run also scored at 6 = 1.2/episode:
6ACV01 Rebirth
6ACV02 In-A-Gadda-Da-Leela
6ACV03 Attack of the Killer App
6ACV04 Proposition Infinity
6ACV05 The Duh-Vinci Code
It is unfortunate that the worst 5 consecutive episodes of the new run also happened to be the first 5 episodes we saw. It gave the impression that the new episodes were considerably worse than the average episode from the original run (which was true of those five, but the episodes that followed were a lot better).
However, the most important conclusion I came to is that the broadcast episodes of the new production run are fully equivalent to the episodes of the original production run, with about the same average quality of, and amount of variation between, episodes.
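For anyone who wants to repeat the 5-episode block comparison with their own ratings, here is a minimal sketch in Python. The `window_sums` helper and the `example` ratings are made up for illustration; they are not the spreadsheet data from this post.

```python
from typing import List, Tuple


def window_sums(ratings: List[int], size: int = 5) -> List[Tuple[int, int]]:
    """Return (start_index, total) for every run of `size` consecutive ratings."""
    return [
        (i, sum(ratings[i:i + size]))
        for i in range(len(ratings) - size + 1)
    ]


# Hypothetical 0-4 ratings listed in production-code order
# (placeholder values, NOT the actual ratings discussed in this thread).
example = [2, 3, 4, 3, 3, 3, 1, 0, 2, 2]

sums = window_sums(example)
best_start, best_total = max(sums, key=lambda pair: pair[1])
worst_start, worst_total = min(sums, key=lambda pair: pair[1])

print("best block starts at", best_start, "total", best_total, "avg", best_total / 5)
print("worst block starts at", worst_start, "total", worst_total, "avg", worst_total / 5)
```

Running the same function separately over the original-run list and the new-run list gives the best and worst block totals to compare.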
|
|
|
|
|
|
|
|
|
KyleG
Poppler
|
|
|
« Reply #5 on: 09-03-2010 04:16 »
« Last Edit on: 09-03-2010 04:30 »
|
|
Mongo, might you post your list/rankings/ratings raw data? Your statistical analysis is interesting, but a more rigorous analysis could be done using the Mann-Whitney U test. This test is used to analyze two populations to see if they can be treated as the same population. Basically, here, one population is pre-cancellation eps. The other is revival eps. By providing each a rating, they can be ranked. Then an analysis can be run on the ranking to determine if the two populations are actually one population. In effect, whether old vs. new matters for quality of show. I could easily do it with your raw data to work with (I guess episode number + rating, for each episode). I'm too lazy to rate them all myself.
|
|
|
|
|
|
KyleG
Poppler
|
|
@Mongo Thanks. Here's my work, which concludes that there isn't evidence to suggest the pre/post cancellation distinction is meaningful for judging episode quality.
Null hypothesis: Pre- and post-cancellation episodes of Futurama are not statistically different in 0-4 rating quality based on Mongo's ratings.
Alternative hypothesis: Pre- and post-cancellation episodes of Futurama are statistically different in rating.
Now, because this data is ordinal and we can't assume it is normally distributed, we will use a non-parametric test, the Mann-Whitney U test.
n1=72 (size of pre-cancellation population)
n2=11 (size of post-cancellation population)
U=420.5
alpha=.05 (two-tailed)
The score/rating distributions in the two groups do not differ significantly.
Now, technically what the Mann-Whitney U test is telling us is that there is not enough evidence to suggest that the pre- and post-cancellation episodes are of different quality. The test is not telling us they are of the same quality.
But for our purposes, I think it's safe to say they're of pretty much the same quality. Based on Mongo's ratings, of course.
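For reference, a rough sketch of how U and its z-score can be computed by hand in Python. The function name `mann_whitney_u` and the two sample lists are illustrative assumptions, not the ratings used above. Ties get averaged ranks, and the z-score uses the usual tie-corrected normal approximation, which matters here because a 0-4 scale produces a lot of ties.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def mann_whitney_u(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Return (U for group a, z-score) using midranks and a tie-corrected variance."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    counts = Counter(list(a) + list(b))

    # Midrank of each distinct value: the average of the rank positions it occupies.
    rank = {}
    seen = 0
    for value in sorted(counts):
        t = counts[value]
        rank[value] = seen + (t + 1) / 2
        seen += t

    rank_sum_a = sum(rank[x] for x in a)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2

    mean_u = n1 * n2 / 2
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u_a - mean_u) / math.sqrt(var_u) if var_u > 0 else 0.0
    return u_a, z


# Hypothetical ratings (placeholders, NOT the data from this thread).
old_run = [4, 3, 3, 2, 2, 2, 1, 0]
new_run = [3, 2, 2, 1]

u, z = mann_whitney_u(old_run, new_run)
print("U =", u, "z =", round(z, 3))
```

With the group sizes quoted above, U would be expected to land around 72 × 11 / 2 = 396 if the two runs were interchangeable, so the reported U = 420.5 sits close to that.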
|
|
|
|
|
|
KyleG
Poppler
|
|
I'll clarify a bit more what the results are saying. Basically, we're "confirming" there is not enough evidence to show the pre/post cancellation episodes are "different." This is not the same thing as having enough evidence to show they are the same.
It's like trying to work out how old you are. There is a huge difference between having enough evidence to know you are not 90 (meaning you could be <=89 years old, including 26) and having enough evidence to know you are 26 (meaning you are not 90, and are also definitely 26).
Still, it's the best I know how to do here.
|
|
|
|
|
Veritas
Crustacean
|
|
I think it's entirely possible for the new episodes to be stylistically different rather than quantitatively different - they're both good, but in different ways.
|
|
|
|
|
speedracer
Bending Unit
|
|
|
« Reply #11 on: 09-04-2010 05:00 »
« Last Edit on: 09-04-2010 05:02 »
|
|
I played around with the Mann-Whitney test a little bit here. Punch in n_A = 72 and n_B = 12, input scores for each episode in the table, then crank it. If |z| > 2, then you can be pretty sure that there's a difference in the quality of the two sets of episodes (assuming that I understand this correctly). The sample size for the second set (12 episodes) is so small that it's really unlikely that anyone would ever be able to firmly say that there's a difference, though -- just playing around with some sample values, it looks like you'd have to think that 4 or 5 of the 12 new episodes are legit contenders for Worst Episode Evar in order to confidently say that the new run is worse.
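To see where the "z of about 2" rule of thumb comes from, here is a small sketch that turns a z-score into a two-sided p-value under the normal approximation. The helper name `two_sided_p` is just for illustration.

```python
import math


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z-score under the standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))


print(round(two_sided_p(2.0), 4))   # 0.0455
print(round(two_sided_p(1.96), 4))  # 0.05
```

So a z of 2 in either direction corresponds to roughly the usual 5% two-tailed cutoff.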
|
|
|
|
|
coldangel
DOOP Secretary
|
|
I can help you out with this. I'm essentially a demigod, to the extent that I'm so far above the rest of humanity as to be basically a new species (homo superior), so my opinion on any matter can be taken as indisputable fact. The new episodes are easily as good as anything that's gone before. There. Discussion finished.
|
|
|
|
|
|
|
|
coldangel
DOOP Secretary
|
|
Oh, real mature tnuk. I bet you giggle when you hear the word titmouse.
|
|
|
|
|
|
|