Thursday, July 24, 2014

Fixing the Gap between Sectionals

In my first post, I want to examine the perceptions of the gap between the Class 1A Sectionals.  One of my passions is cross country.  In cross country, each team has 7 runners, and the place that you finish in the race is the number of points that you score for your team.  The team with the fewest points wins.  Although each team has 7 runners, only the top 5 count towards the team total.  The data set that I will look at today is taken from the Illinois High School Association website (ihsa.org).  I will examine the last 3 years' worth of data from the Class 1A State Meet.
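For readers who like to see the scoring rule spelled out, here is a minimal Python sketch of it as described above.  The team names and finishing places are made up for illustration, the function name `team_score` is my own, and the sketch ignores tie-breakers and any other details of the official rules.

```python
def team_score(places, scorers=5):
    """Add up the best `scorers` finishing places for one team."""
    if len(places) < scorers:
        raise ValueError("a team needs at least 5 finishers to score")
    # Lower places are better, so sort ascending and keep the first five.
    return sum(sorted(places)[:scorers])


# Hypothetical finishing places for two made-up teams (not real results).
meet = {
    "Team X": [1, 4, 6, 10, 15, 20, 30],
    "Team Y": [2, 3, 5, 12, 17, 18, 19],
}

scores = {team: team_score(places) for team, places in meet.items()}
winner = min(scores, key=scores.get)  # fewest points wins

for team, points in scores.items():
    print(team, points)
print("Winner:", winner)
```

Only the first five finishers on each team contribute, so the sixth and seventh runners do not change the totals here.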

Now, for a little background on the State Meet itself.  In order to qualify for the State Meet, teams must finish as a top 5 team from one of 5 sectionals in the state.  The sectionals are held in different regions of the state: South, Central, West, North-Northwest, and the Chicago/Suburbs.  The State Meet is held every year at Detweiller Park in Peoria, Illinois.

The common thought is that the Southern sectional is often the weakest.  I want to examine whether this is the case, and whether there is any reason to change the current structure of how teams qualify for the field of 25 at the State Series.

The Data:  The 2011, 2012, and 2013 results from the IHSA Cross Country State Meet.

The Nitty Gritty:  25 teams with 7 runners apiece (for the most part).  Athletes who qualified as individuals were removed from this study.  I used the statistical programming software, R, to analyze this data.

For the most part, I want to keep this as simple as possible by providing visualizations, and let you decide if change is needed.

Let's start out by looking at a boxplot, separated by each region and their performance at the state meet.  The y-axis is the time in seconds that runners took to complete the 3 mile course.

This provides us with a bit of a baseline and insight into what we need to look into further.  We can see that the Central sectional appears to be pretty good, followed closely by the North/Northwest and West sectionals.  The other two sectionals appear to have slower times.  It's a bit hard to tell given the size of the image, but the Central region also seems to be less spread out.  This would indicate that the teams are better, because in order to have a good team, your fourth and fifth runners must place high.

Now, let's take a look at the 1-5 runners and compare them from region to region.  We will use a dotchart to examine the place of each runner at the state meet.

Already, this plot raises several red flags as to the separation between regions.  First, we can see that the Central has the best "Number One" runners.  The concern is that the Chi/Suburbs and the South have a large number of runners who failed to finish in the top 80 places of the state meet.  We can see this trend continue when looking through the other runners:





The dominance of the Central Sectional can be seen throughout these plots, while the South does not appear to have the same quality as the rest of the field.  Let's examine a few numbers to see just how big this gap is. Below are the team places from the last 3 years:

Central: 1, 1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 11, 11, 13
Chi/Suburbs: 5, 9, 10, 13, 14, 15, 17, 18, 20, 21, 22, 23, 23, 24, 24, 25
North/Northwest: 2, 4, 7, 9, 10, 12, 12, 12, 13, 14, 14, 15, 16, 16
South: 7, 16, 17, 18, 19, 19, 19, 20, 20, 22, 22, 23, 24, 25, 25
West: 1, 2, 5, 6, 7, 8, 8, 9, 10, 11, 15, 17, 18, 21, 21


The results above aren't shocking given what we've seen thus far; however, the gap is truly apparent in the team places.  In the last 3 years, the Central region's worst finish is 13th, while the South has finished in the top 15 only one time.  In fact, the Central region has won 6 of the 9 trophies awarded at the meet.  I've picked on the South sectional a lot, but the Chicago/Suburbs sectional has performed poorly as well.  On average, this sectional produces one top 10 team per year, but the rest of its teams perform similarly to those from the South Sectional.
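If you want to check these counts yourself, the short Python sketch below tallies them from the team places listed above.  It assumes that a "trophy" means a top 3 finish (which is what 9 trophies over 3 years implies); the function and variable names are my own.

```python
# Team places at the State Meet, 2011-2013, copied from the lists above.
team_places = {
    "Central": [1, 1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 11, 11, 13],
    "Chi/Suburbs": [5, 9, 10, 13, 14, 15, 17, 18, 20, 21, 22, 23, 23, 24, 24, 25],
    "North/Northwest": [2, 4, 7, 9, 10, 12, 12, 12, 13, 14, 14, 15, 16, 16],
    "South": [7, 16, 17, 18, 19, 19, 19, 20, 20, 22, 22, 23, 24, 25, 25],
    "West": [1, 2, 5, 6, 7, 8, 8, 9, 10, 11, 15, 17, 18, 21, 21],
}


def summarize(places):
    """Count how often a region's teams reached each cutoff."""
    return {
        "worst": max(places),                      # largest place = worst finish
        "trophies": sum(p <= 3 for p in places),   # assumption: trophy = top 3
        "top10": sum(p <= 10 for p in places),
        "top15": sum(p <= 15 for p in places),
    }


summary = {region: summarize(places) for region, places in team_places.items()}

for region, stats in summary.items():
    print(region, stats)
```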

Just how much faster is the Central compared to other regions?  We can use a Tukey test to see whether there are significant differences between the groups, and generate a confidence interval to see how large the differences are.  Results are below.  For those who aren't statistically minded, simply focus on the first two columns:




I feel that I need to break this down a bit for you.  There are five columns, and I will go into some detail on what each one means.

Column 1: This is telling us the regions that are being compared.  In row 1, we see Chi/Suburbs - Central.

Column 2: This is the mean difference between the two groups.  So the 56.27 you see in row 1 means that the average time for all of the runners from the Central region is about 56 seconds faster than the average time for all of the runners from the Chi/Suburbs region.  A negative value indicates that the first region listed is the quicker region.  Example: North/Northwest - Chi/Suburbs has a value of -30.  This indicates that the group mean of North/Northwest minus the group mean of Chi/Suburbs is negative 30 seconds, which means the North/Northwest is 30 seconds faster.
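To make the sign convention in Column 2 concrete, here is a small Python sketch.  The region names and times are made up purely for illustration (they are not IHSA data), and the helper `mean_difference` is my own.  It only shows the "first region minus second region" subtraction; it does not perform the Tukey test or produce the bounds and p-values in the other columns.

```python
from statistics import mean

# Made-up 3 mile times in seconds, for illustration only (not IHSA data).
times = {
    "Region A": [930, 945, 960, 975],
    "Region B": [980, 990, 1010, 1020],
}


def mean_difference(first, second):
    """Mean time of `first` minus mean time of `second`, in seconds."""
    return mean(times[first]) - mean(times[second])


# Listed as "Region A - Region B", just like the rows of the table.
diff = mean_difference("Region A", "Region B")

# A negative difference means the first region listed has the lower
# (faster) average time.
faster = "Region A" if diff < 0 else "Region B"

print("Region A - Region B:", diff)
print("Faster region:", faster)
```

Swapping the order of the two regions flips the sign but not the size of the difference, which is why each pair only needs to appear once in the table.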

----------------------------
For the Statistically Curious:

Columns 3 and 4: Similar to Column 2, however these are giving us bounds on what the difference is between the two regions.  Since this data is only from 3 years, we are assuming this is a representative sample of ALL state meets.  Obviously the structure has changed in recent years so this is not perfect.  However, looking at row 1, the data tells us that the Central region is between 36 and 76 seconds faster than the Chi/Suburbs region.

Column 5: This is a p-value computed under the assumption that the two regions are equal.  In statistics, a p-value less than .05 is commonly treated as significant.  Essentially, this means that if these two regions were truly equal, there would be close to a 0 percent chance of obtaining results at least this extreme.  Doesn't make sense?  I'll try to break it down.

Let's assume that Tiger Woods and I go to a golf course, and I assume that we are equal golfers.  Obviously, we are not.  We golf 18 holes.  He shoots a 65 and I shoot a 110.  Our null hypothesis is that we are equal golfers.  Our p-value is the chance of a gap at least as large as 65 versus 110 occurring, assuming we are equal.  This chance is going to be really small, since two equal golfers would not end up 45 strokes apart in one round.  Let's say that the chance of that happening (the p-value) is .01.  Since it is less than .05, we can say that there is a significant difference between the two golfers.
----------------------------

The results speak for themselves really.  A runner from the Central region is on average about a full minute faster than a runner from the South and Chi/Suburbs sectionals.  The West and North/Northwest sectionals are roughly equal to each other, with each averaging about 25 seconds slower than the Central across all of its runners.

The clear question is not if there is a gap between Sectionals, but if you are okay with the gap.  In my opinion, the current system is flawed.  I did not perform any "mathematically" sound statistical analysis on this section, but you can take a look at the Central and West Sectional results from year to year and see that numerous teams each year are being denied the opportunity to race at the State meet.  In my eyes, some type of change is needed.

Possible Solution
Utilizing the same Sectional structure, the top 4 teams automatically qualify.
A committee will then decide the 5 final spots to comprise the field of 25.
A team may not be selected if a team that places ahead of them in the Sectional is not.
Example:

Team A finishes 8th
Team B finishes 7th

Team A may not be selected, unless Team B is selected as well.  This would prevent teams from resting key runners at the Sectional.
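The ordering rule is simple enough to check mechanically.  Below is a rough Python sketch of how a committee's picks from a single sectional could be validated; the function name `is_valid_selection`, the team names, and the data layout are all hypothetical, and the sketch assumes there are no ties in the sectional places.

```python
def is_valid_selection(sectional_places, selected, auto_qualifiers=4):
    """Check the committee's at-large picks from one sectional.

    sectional_places: dict mapping team name -> finishing place.
    selected: set of teams the committee picked beyond the automatic 4.
    """
    # Teams in finishing order, best place first.
    ordered = sorted(sectional_places, key=sectional_places.get)
    # Everyone after the automatic qualifiers is eligible for an at-large spot.
    at_large_pool = ordered[auto_qualifiers:]
    # The picks are valid only if they are exactly the next-best teams in
    # order: nobody may be skipped over in favor of a lower-placing team.
    return set(at_large_pool[:len(selected)]) == set(selected)


# Hypothetical sectional: "Team 1" finished 1st, ..., "Team 8" finished 8th.
places = {f"Team {i}": i for i in range(1, 9)}

print(is_valid_selection(places, {"Team 5", "Team 6"}))  # 5th and 6th: allowed
print(is_valid_selection(places, {"Team 8"}))            # skips 5th-7th: not allowed
```

A full version would also need to confirm that the committee's picks across all five sectionals add up to exactly 5 teams.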

I realize that some will argue that biases will occur, and teams will be unfairly left out of the State Meet.  I completely agree with you on this.  Each year, one or two teams will feel that they were "cheated" out of a spot.  My response is that under the current system, somewhere around 5 teams that deserve a spot are being left out of the State Meet.  I would rather have 1 or 2 teams left out subjectively than 5 left out objectively.



I hope you enjoyed reading this in the dog days of summer training.  If there are any other statistical questions you have related to high school running, please let me know.


6 comments:

  1. Very good. It may not do any good but I suggest you send this to IHSA. I have had probably four to five teams not make it out of central sectional that would have been top ten at state. Shelbyville, Robinson, Olney, Carlinville, St. Anthony, Cumberland and others all used to go south and got pulled up to Central sectional years ago. Sending them back would make a huge difference and something I have been advocating for years. Thanks again for nice read. Kevin Kramer-Shelbyville HS

  2. Yep, I agree with you that the disparity between sectionals grew when the reconfiguration of zones you are referring to above occurred. I think the system that I am proposing is something that helps to alleviate this issue. With the exception of the state meet, most schools don't want to have their students on a bus ride of more than an hour or 2, which is a perfectly good reason to have the sectionals divided up by region. However, the gap between sectionals in recent years has been large. This is a relatively small sample (3 years), but I don't think this is something that has varied much over time. Additionally, the system I am proposing is flexible enough to allow the best teams into the State meet, regardless of their geographic location.

    Thanks again for the comment Kevin. I truly appreciate it.

  3. Nice article, this has been an issue for a long time. I ran varsity for Urbana Uni High in 2004 and 2005 and we finished 6th and 7th in the Central sectional those years, feeling both times like we were good enough to have qualified for State. 2004 was especially egregious as the top 4 teams from our sectional finished 1-2-3-6 (!) at state and even the 5th qualifying team finished 14th out of 25. Obviously we weren't going to win if we had qualified, but as our team goal for 2 years was just to get back to the state meet for the first time in several seasons, it was tough to stomach being left out due to an overly stacked sectional.

  4. Nick, that is awesome. I hope you get college credit for that. Unfortunately the central sectionals in class A have been much faster than the rest of the state. In the 2011 Decatur sectional, the top 8 teams in that sectional would have all probably been in the top 10 at the state meet. I like the idea of doing something different. Great Work and thanks
