Time to Toughen the SOLs?

Longwood professor Emily Williamson argued in 2012 that “SOL tests have helped Virginia to raise its standards but now it is time to raise them again.”

Prof. Williamson used accreditation to measure improvement.  In light of the wholesale and shifting manipulation of the accreditation process (e.g., the recent change that resulted in no Virginia school being denied accreditation), accreditation is a bogus measure. 

The SOL pass rates are less problematic and they give us a chance to test the hypothesis, especially as to whether raising standards would improve learning.  Let’s look at those data.

As of 2012, the average pass rates had, for the most part, risen consistently since 2006, the start date of the VDOE database (as accessible on the Web).


The History and Social Science pass rate dropped in 2011 in response to a revision of the tests.  The smaller (but, on the statewide scale, non-trivial) 2011 decreases in the writing, reading, and math pass rates probably resulted from the General Assembly’s crackdown on abuse of the VGLA to artificially boost pass rates.

Overall, however, the data were consistent with the Williamson hypothesis, at least as to earlier changes.  Then, in 2012 the Board of Education (taking the same view as Williamson?) installed new, tougher math tests; 2013 saw the introduction of new, more rigorous tests in English and Science.  Along with the earlier change in the history tests, this provided a chance to test the Williamson hypothesis going forward.


With each new test, the statewide pass rates fell, as would be expected.  The rates then bounced back.  Some of them. 

The math recovery from 2012 to 2015 might be explained by the Williamson hypothesis (although the huge drop in 2012 suggests that the earlier high pass rates didn’t represent a lot of learning of mathematics).  Reading, writing, and science enjoyed smaller recoveries.

These data do not tell us whether those pass rate increases reflect better learning or short-term improvement from teaching better aligned with the new tests.  The year’s delay in improvement of the science, writing, and reading numbers (three years for history) implies that there was a delay in adjusting the curricula and teaching to reflect the new tests.

The pass rate droops since 2016, however, falsify the notion of any long-term, test-driven improvement.  The Board of Education ran the test that Prof. Williamson suggested; the test failed.

Hmmm.  Let’s look at some division data to see if there is more to be learned regarding the Williamson hypothesis.

First, Richmond:


Looking back for a moment, the data give us a picture of the magnitude of Richmond’s VGLA cheating.  Here is a summary of Richmond’s pass rate gains from 2006 to 2010 and the decreases from 2010 to 2011.


The History & SS drop can be attributed, at least in part, to the new tests.  The other 2011 losses average 52% of the earlier gains.  After we juxtapose those losses with the earlier, spectacular increases in VGLA testing in Richmond, it is hard to escape the conclusion that something like half of those pre-2011 pass rate increases were the result of cheating, not better learning. 
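The "losses as a share of earlier gains" arithmetic can be sketched in a few lines of Python.  This is only an illustration: the pass rates below are made-up placeholders, not Richmond’s actual figures, and the function names and the choice of 2006, 2010, and 2011 as the comparison years are assumptions for the example.

```python
# Hypothetical pass rates (percent) by subject and year.
# These are placeholder values, NOT the actual Richmond data.
pass_rates = {
    "Reading": {2006: 70.0, 2010: 90.0, 2011: 80.0},
    "Writing": {2006: 60.0, 2010: 80.0, 2011: 68.0},
    "Math":    {2006: 50.0, 2010: 90.0, 2011: 70.0},
}


def loss_share_of_gain(rates, start=2006, peak=2010, after=2011):
    """Return the one-year loss as a fraction of the earlier gain.

    gain = rate at `peak` minus rate at `start`
    loss = rate at `peak` minus rate at `after`
    """
    gain = rates[peak] - rates[start]
    loss = rates[peak] - rates[after]
    if gain <= 0:
        raise ValueError("no earlier gain to compare the loss against")
    return loss / gain


def average_loss_share(table):
    """Average the loss-to-gain fraction across all subjects in `table`."""
    shares = [loss_share_of_gain(rates) for rates in table.values()]
    return sum(shares) / len(shares)


if __name__ == "__main__":
    for subject, rates in pass_rates.items():
        print(f"{subject}: {loss_share_of_gain(rates):.0%} of the gain was lost")
    print(f"Average: {average_loss_share(pass_rates):.0%}")
```

Averaging the per-subject ratios, rather than dividing total loss by total gain, weights each subject equally; which of the two the 52% figure reflects is not stated here.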

Then we have the performance after the arrival of the new tests.  Aside from the math and reading increases in the two or three years after introduction of the new tests, and the little science bump in the same period, the picture here is of declining performance, not of improvement. 

Thus, even if the Williamson notion of test-driven improvement were to hold in general, it does not predict Richmond’s performance.

For a nice contrast to Richmond, there is Hanover County:


Notice the scale: Same length as Richmond but shifted up ten points.

Without thrashing through the details, three things are clear here: It is hard to improve on a 90+% pass rate; there is no smoking gun in the 2011 numbers; and the new tests are not driving any kind of ongoing improvement.

Lynchburg shows a big 2011 drop in Hist. & SS, a huge 2012 math decrease, and little sign that the SOLs are driving recent improvement, aside from the delayed, short-term bounces in the math, reading, and science numbers.  Indeed, the bounces all came in 2015 (even history), which suggests procrastination in aligning the curricula with the new tests.


The peer jurisdictions hark back to the Richmond data, with only the writing pass rate in Hampton to indicate that SOL testing might be driving recent improvement.




OK.  What about the Big Guy, Fairfax?


There’s that pattern again: Improvement until the arrival of the new tests.  Short-term improvement after.  Then stagnation or declines.

We know that economically disadvantaged (“ED”) students perform differently on the SOLs than their more affluent peers (“Not ED”).  Let’s see how the new tests affected the ED and Not ED groups.


As expected, the ED students underperform. 

It looks like much of the 2011 drop was in the ED group.  As to performance on the new tests, nothing new here.

There’s lots of room to argue about the reason(s) but little as to the conclusion: The data prior to the new tests support the Williamson hypothesis; the data after the new tests falsify it, at least once the first couple of years have passed.

Thus, it looks like we’ve already enjoyed about all the test-driven improvement we’re going to get.  Now it’s time to figure out why the improvement is turning into decay.


Note added per a comment from the estimable Carol Wolf:

It is clear that performance, as measured by the SOLs, improved after imposition of the testing.  It is less clear whether the cause of the improvement was the standards set by the SOLs or the mere fact that we were measuring performance.

The traditional measures of education have been inputs: Teachers’ credentials, salaries, facilities, budgets, etc., etc.  But none of those tells us how much the kids have learned.  Although SOLs certainly are imperfect, they measure outputs and it may well be that the mere presence of the measuring instrument drove improvement.

Whatever was going on, it seems that the improvements have maxed out and perhaps even started to fade.  This is no reason to abandon the measurement of outputs.  Indeed, this is a powerful reason to refine the measurement (e.g., via the SGP that the Board of “Education” abandoned for no good reason), figure out what works to educate our children, and do more of it.