Now, for a little background on the State Meet itself. To qualify for the State Meet, a team must finish in the top 5 at one of 5 sectionals. The sectionals are held in different regions of the state: South, Central, West, North/Northwest, and Chicago/Suburbs. The State Meet is held every year at Detweiller Park in Peoria, Illinois.
The common belief is that the South Sectional is often the weakest. I want to examine whether this is the case, and whether there is any reason to change how teams qualify for the 25-team field at the State Meet.
The Data: Results from the 2011, 2012, and 2013 IHSA Cross Country State Meets.
The Nitty Gritty: 25 teams with 7 runners apiece (for the most part). Athletes who qualified as individuals were removed from this study. I used the statistical programming language R to analyze the data.
For the most part, I want to keep this as simple as possible by providing visualizations and letting you decide whether change is needed.
Let's start with a boxplot of State Meet performance, split by region. The y-axis is the time, in seconds, that each runner took to complete the 3-mile course.
This gives us a baseline and shows what we need to look into further. The Central Sectional appears to be the strongest, followed closely by North/Northwest and the West. The other two Sectionals have noticeably slower times. It's a bit hard to tell at this image size, but the Central region's times also appear less spread out. That suggests deeper teams, because a team's score is the sum of its top five finishers' places: to have a good team, your fourth and fifth runners must place high, not just your front runners.
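The point about fourth and fifth runners follows from how cross country is scored: a team's score is the sum of its top five finishers' places, and the lowest score wins. Here is a minimal Python sketch of that rule (the analysis itself was done in R). The `team_score` function and the example places are hypothetical illustrations, not meet data, and the sketch assumes places are counted among team runners only.

```python
def team_score(places: list[int]) -> int:
    """Standard cross country team score: the sum of the five best
    (lowest) places on a team. The 6th and 7th runners don't score."""
    if len(places) < 5:
        raise ValueError("a team needs at least five finishers to score")
    return sum(sorted(places)[:5])


# Hypothetical teams with identical top three runners.
packed = [5, 8, 12, 20, 25, 40, 60]      # 4th and 5th runners close behind
spread = [5, 8, 12, 90, 110, 130, 150]   # weak 4th and 5th runners

print(team_score(packed))  # 70
print(team_score(spread))  # 225
```

Both teams have the same front three, but the second team's weak fourth and fifth runners more than triple its score.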
Now, let's compare each team's number 1 through 5 runners from region to region. We will use a dotchart to show the place of each runner at the State Meet.
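To build a dotchart like this, each runner first needs a team position: 1 for a team's fastest finisher, 2 for the next, and so on. A minimal Python sketch of that step follows, assuming each record is a (team, region, place) tuple. The records and both function names are hypothetical stand-ins, not the actual results or the R code used here.

```python
from collections import defaultdict

# Hypothetical (team, region, place) records; placeholders, not real results.
RUNNERS = [
    ("Team A", "Central", 3), ("Team A", "Central", 15), ("Team A", "Central", 9),
    ("Team B", "South", 40), ("Team B", "South", 95), ("Team B", "South", 120),
]


def assign_team_positions(runners):
    """Return (team, region, place, position) tuples, where position 1 is
    the team's fastest finisher, 2 the next fastest, and so on."""
    by_team = defaultdict(list)
    for team, region, place in runners:
        by_team[team].append((place, region))
    ranked = []
    for team, entries in by_team.items():
        # Sorting by place puts the fastest runner first.
        for position, (place, region) in enumerate(sorted(entries), start=1):
            ranked.append((team, region, place, position))
    return ranked


def count_outside(ranked, position, cutoff=80):
    """Per region, count runners at one team position who finished
    outside the top `cutoff` places."""
    counts = defaultdict(int)
    for _team, region, place, pos in ranked:
        if pos == position and place > cutoff:
            counts[region] += 1
    return dict(counts)


ranked = assign_team_positions(RUNNERS)
print(count_outside(ranked, position=1))  # {}
print(count_outside(ranked, position=2))  # {'South': 1}
```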
Already, this plot raises several red flags about the separation between regions. First, we can see that the Central has the best "Number One" runners. The bigger concern is that the Chi/Suburbs and the South have a large number of runners who failed to finish in the top 80 places at the State Meet. We can see this trend continue when looking through the other runners:
The dominance of the Central Sectional can be seen throughout these plots, while the South does not appear to have the same quality as the rest of the field. Let's examine a few numbers to see just how big this gap is. Below are the team places from the last 3 years:
Central: 1, 1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 11, 11, 13
Chi/Suburbs: 5, 9, 10, 13, 14, 15, 17, 18, 20, 21, 22, 23, 23, 24, 24, 25
North/Northwest: 2, 4, 7, 9, 10, 12, 12, 12, 13, 14, 14, 15, 16, 16
South: 7, 16, 17, 18, 19, 19, 19, 20, 20, 22, 22, 23, 24, 25, 25
West: 1, 2, 5, 6, 7, 8, 8, 9, 10, 11, 15, 17, 18, 21, 21
None of this is surprising given what we've seen so far; however, the team places make the gap especially clear. In the last 3 years, the Central region's lowest finish is 13th, while the South has finished in the top 15 only once. In fact, the Central region has won 6 of the 9 trophies awarded at the meet. I've picked on the South Sectional a lot, but the Chicago/Suburbs Sectional has performed poorly as well. On average, it produces one top-10 team per year, while the rest of its teams perform similarly to the South's.
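The claims in the previous paragraph can be checked directly against the team places listed above. This Python sketch does that; the `summarize` helper is hypothetical, and it assumes trophies go to the top 3 teams each year, which matches 9 trophies over 3 years.

```python
# Team places, 2011-2013, copied from the lists above.
TEAM_PLACES = {
    "Central": [1, 1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 11, 11, 13],
    "Chi/Suburbs": [5, 9, 10, 13, 14, 15, 17, 18, 20, 21, 22, 23, 23, 24, 24, 25],
    "North/Northwest": [2, 4, 7, 9, 10, 12, 12, 12, 13, 14, 14, 15, 16, 16],
    "South": [7, 16, 17, 18, 19, 19, 19, 20, 20, 22, 22, 23, 24, 25, 25],
    "West": [1, 2, 5, 6, 7, 8, 8, 9, 10, 11, 15, 17, 18, 21, 21],
}


def summarize(places, years=3):
    """Headline numbers for one region's team places (lower is better)."""
    return {
        "trophies": sum(p <= 3 for p in places),  # assumes top 3 earn trophies
        "top10_per_year": sum(p <= 10 for p in places) / years,
        "top15": sum(p <= 15 for p in places),
        "worst": max(places),
    }


for region, places in TEAM_PLACES.items():
    print(region, summarize(places))
```

The Central row reports 6 trophies and a worst finish of 13, the South row a single top-15 finish, and the Chi/Suburbs row exactly one top-10 team per year.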
Just how much faster is the Central than the other regions? Tukey's Honest Significant Difference (HSD) test checks whether the differences between the groups are statistically significant and gives a confidence interval for how large each difference is. The results are below. If you aren't statistically minded, simply focus on the first two columns:
I should break this down a bit. There are five columns, and I will go into some detail on what each one means.
Column 1: This is telling us the regions that are being compared. In row 1, we see Chi/Suburbs - Central.
Column 2: This is the difference between the two groups' mean times, computed as the first region's mean minus the second region's mean. So the 56.27 you see in row 1 means that the Chi/Suburbs average is 56.27 seconds higher, i.e., the average time for all of the runners from the Central region is about 56 seconds faster than the average time for all of the runners from the Chi/Suburbs region. A negative value indicates that the first region listed is the faster one. Example: North/Northwest - Chi/Suburbs has a value of -30. The group mean of North/Northwest minus the group mean of Chi/Suburbs is -30 seconds, which means the North/Northwest is 30 seconds faster.
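The sign convention in Column 2 is easy to get backwards, so here is a small Python sketch of it. Because the runner times behind the table aren't reproduced in this post, it uses the team places listed earlier as stand-in data (lower is better, just like times), so its numbers will not match the table. It orders regions alphabetically and reports "later minus earlier", which is how the table's rows read (for example, Chi/Suburbs - Central). The `pairwise_differences` helper is hypothetical, and it computes only the mean differences, not Tukey's intervals or p-values.

```python
from itertools import combinations
from statistics import mean

# Stand-in data: team places from the lists above (lower is better).
GROUPS = {
    "Central": [1, 1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 11, 11, 13],
    "Chi/Suburbs": [5, 9, 10, 13, 14, 15, 17, 18, 20, 21, 22, 23, 23, 24, 24, 25],
    "North/Northwest": [2, 4, 7, 9, 10, 12, 12, 12, 13, 14, 14, 15, 16, 16],
}


def pairwise_differences(groups):
    """Return {"B-A": mean(B) - mean(A)} for every pair, with A before B
    alphabetically. A negative value means B (the first-named) is better."""
    names = sorted(groups)
    diffs = {}
    for a, b in combinations(names, 2):
        diffs[f"{b}-{a}"] = mean(groups[b]) - mean(groups[a])
    return diffs


for label, d in pairwise_differences(GROUPS).items():
    print(f"{label}: {d:+.2f}")
```

With this stand-in data, Chi/Suburbs-Central comes out positive (Central is better) and North/Northwest-Chi/Suburbs comes out negative (North/Northwest is better), the same pattern of signs as rows 1 and the -30 example in the table.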
----------------------------
For the Statistically Curious:
Columns 3 and 4: These are the lower and upper bounds of a confidence interval for the difference in Column 2. Since this data covers only 3 years, we are assuming it is a representative sample of ALL state meets. The qualifying structure has changed in recent years, so this assumption is not perfect. Still, looking at row 1, the data tells us that the Central region is between 36 and 76 seconds faster than the Chi/Suburbs region.
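Assuming the table came from a standard Tukey procedure for unequal group sizes (the Tukey-Kramer form, which is what R's `TukeyHSD` function computes), each interval in Columns 3 and 4 is built like this:

$$
(\bar{y}_i - \bar{y}_j) \pm \frac{q_{0.95;\,k,\,N-k}}{\sqrt{2}}\,\sqrt{\mathrm{MSE}\left(\frac{1}{n_i} + \frac{1}{n_j}\right)}
$$

Here $\bar{y}_i$ and $n_i$ are the mean time and number of runners for region $i$, $k = 5$ is the number of regions, $N$ is the total number of runners, $\mathrm{MSE}$ is the within-region mean squared error from a one-way ANOVA, and $q_{0.95;\,k,\,N-k}$ is the 95th percentile of the studentized range distribution (assuming the default 95% confidence level). Because the interval is symmetric around the difference, row 1's bounds of about 36 and 76 sit roughly 20 seconds on either side of 56.27.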
Column 5: This is a p-value computed under the assumption (the null hypothesis) that the two regions are equal. In statistics, a p-value below .05 is commonly considered significant. A p-value of essentially 0 means that if these two regions were truly equal, there would be almost no chance of obtaining results this far apart. Doesn't make sense? I'll try to break it down.
Let's say that Tiger Woods and I go to a golf course, and I assume that we are equal golfers. Obviously, we are not. We play 18 holes. He shoots a 65 and I shoot a 110. Our null hypothesis is that we are equal golfers. Our p-value is the chance of seeing a gap at least as large as 65 versus 110, assuming we are equal. This chance is going to be really small, since two equal golfers would rarely end up 45 strokes apart in one round. Let's say that the chance of that happening (the p-value) is .01. Since it is less than .05, we can say that there is a significant difference between the two golfers.
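The same idea can be simulated with the team places listed earlier. The Python sketch below is a permutation test, a different technique from the Tukey procedure behind the table, shown only because it makes the meaning of a p-value concrete: if region labels didn't matter, shuffling them should often produce a gap as large as the real one. The function name, the seed, and the choice of 10,000 shuffles are assumptions.

```python
import random

# Team places from the lists above (lower is better).
CENTRAL = [1, 1, 2, 3, 3, 3, 4, 4, 5, 6, 6, 8, 11, 11, 13]
SOUTH = [7, 16, 17, 18, 19, 19, 19, 20, 20, 22, 22, 23, 24, 25, 25]


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Under the null hypothesis the region labels don't matter, so shuffle the
    pooled values, split them back into groups of the original sizes, and
    count how often the shuffled gap is at least as large as the real one.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        gap = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if gap >= observed:
            hits += 1
    # Counting the observed split itself keeps the estimate above zero.
    return (hits + 1) / (n_shuffles + 1)


print(permutation_p_value(CENTRAL, SOUTH))
```

It should print a very small value, since almost no random split of these 30 places separates the two groups as cleanly as the real Central and South results do.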
----------------------------
The results speak for themselves. On average, a runner from the Central region is about a full minute faster than a runner from the South or Chi/Suburbs Sectionals. The West and North/Northwest Sectionals are roughly equal to each other when comparing means, and each is about 25 seconds slower than the Central when comparing the average times of all of the runners from each Sectional.
The real question is not whether there is a gap between Sectionals, but whether you are okay with it. In my opinion, the current system is flawed. I did not perform a formal statistical analysis for this part, but if you look at the Central and West Sectional results from year to year, you can see that numerous teams each year are being denied the opportunity to race at the State Meet. In my eyes, some type of change is needed.
Possible Solution
Using the same Sectional structure, the top 4 teams from each Sectional qualify automatically.
A committee then selects the final 5 teams to complete the field of 25.
A team may not be selected unless every team that placed ahead of it in the same Sectional is also selected.
Example:
Team A finishes 8th
Team B finishes 7th
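Team A may not be selected unless Team B is selected as well. Because a team that rests key runners and finishes low can only be picked if every team ahead of it is picked too, this would prevent teams from resting key runners at the Sectional.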
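The selection rules above can be expressed as a small validity check on a committee's picks. This is a minimal Python sketch; the function name, parameters, and data layout are hypothetical, chosen only to illustrate the proposal, and the example uses two made-up Sectionals rather than all five.

```python
def is_valid_selection(sectional_results, picks, auto_bids=4, at_large=5):
    """Check a committee's at-large picks against the proposed rules.

    sectional_results: {sectional name: [team names in finishing order]}
    picks: set of team names chosen for the at-large spots
    """
    if len(picks) != at_large:
        return False
    eligible = set()
    for order in sectional_results.values():
        # Only teams placing below the automatic bids are at-large candidates.
        remaining = order[auto_bids:]
        eligible.update(remaining)
        chosen = [team in picks for team in remaining]
        # A picked team may not sit behind a passed-over team from its Sectional.
        for ahead, behind in zip(chosen, chosen[1:]):
            if behind and not ahead:
                return False
    return picks <= eligible


results = {
    "Central": ["C1", "C2", "C3", "C4", "C5", "C6", "C7", "C8"],
    "West": ["W1", "W2", "W3", "W4", "W5", "W6", "W7", "W8"],
}
print(is_valid_selection(results, {"C5", "C6", "C7", "W5", "W6"}))  # True
# "C8" (8th, like Team A) picked while "C7" (7th, like Team B) is passed over:
print(is_valid_selection(results, {"C5", "C6", "C8", "W5", "W6"}))  # False
```

With all five Sectionals, 5 × 4 automatic bids plus 5 at-large picks gives the same field of 25 as today.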
I realize that some will argue that bias will creep in and teams will be unfairly left out of the State Meet. I completely agree with you on this. Each year, one or two teams will feel that they were "cheated" out of a spot. My response is that under the current system, somewhere around 5 teams that deserve a spot are being left out of the State Meet. I would rather have 1 or 2 teams left out subjectively than 5 deserving teams left out objectively.
I hope you enjoyed reading this in the dog days of summer training. If you have any other statistical questions related to high school running, please let me know.