Gathering data to make tests fairer
One promising trend in testing is less a new trend than the opportunity to pursue an old one with better data. All of us who have been involved in research and testing have lamented the difficulty of getting data on the kids who take our tests. That is different from the data we collect while students are taking the test itself.
We are constantly either planning or conducting what we call validity research, where we develop and analyze evidence that tells us how well our tests are working, either as predictors of how kids will do a little further down the road or as estimates of how they’re doing currently on things that are important but that we can’t test directly.
We want to make sure that our tests are fairly assessing all the various demographic subgroups. All test publishers use differential item functioning analysis to identify questions that may discriminate against one group or another, but it requires a lot of data. Comparing boys and girls is an obvious choice because the data set is so big. But we only get data on which test-takers are boys or girls on about half of the assessments delivered, and we get even less information on characteristics such as who is an English Learner, who has a disability, or who is a member of a racial or ethnic subgroup.
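For readers who want a sense of what such an analysis involves, here is a minimal sketch of one widely used approach, the Mantel-Haenszel procedure. It is an illustration only, not a description of how Renaissance or any particular publisher runs its analyses. The function name, the record layout, and the "ref" and "focal" group labels are assumptions made for the example.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Estimate DIF for one item with the Mantel-Haenszel procedure.

    records: iterable of (group, matching_score, correct) tuples, where
    group is "ref" or "focal" (hypothetical labels), matching_score is the
    stratum used to compare students of similar ability (for example,
    total test score), and correct is True or False for the item studied.

    Returns (alpha_mh, mh_d_dif): the common odds ratio and the same
    value rescaled to the delta metric.
    """
    # One 2x2 table per score stratum:
    # [ref right, ref wrong, focal right, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        offset = 0 if group == "ref" else 2
        tables[score][offset + (0 if correct else 1)] += 1

    num = den = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        num += a * d / n  # ref right * focal wrong
        den += b * c / n  # ref wrong * focal right

    if num == 0 or den == 0:
        raise ValueError("too little data in the strata to estimate DIF")

    alpha = num / den
    # -2.35 * ln(alpha) puts the odds ratio on the delta scale;
    # negative values mean the item is harder for the focal group.
    return alpha, -2.35 * math.log(alpha)
```

An odds ratio near 1 (a delta value near 0) suggests that students of similar ability in the two groups answer the item correctly at similar rates. Because every score stratum needs enough students from both groups to fill its table, the sketch also hints at why this kind of analysis requires so much demographic data.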
It has historically been up to users to decide whether to share those data with us. Most schools choose not to. In part, that’s because the information is not directly helpful to them. It doesn’t help them assess students, so they don’t see much use in entering it.
To get those data, we have to go to each school or district individually and try to talk them into providing them. But this is protected personal information and schools are, quite reasonably, not eager to let those data go willy-nilly. Of course, we don’t need data that would identify students, just the demographic information, but the concern from schools is understandable and appropriate.
As schools see more and more utility in sharing data through secure systems with appropriate privacy protections in place, those data are becoming increasingly available. I’m optimistic that our recent acquisition of Schoolzilla will help us refine our assessments because the schools using it have already seen some benefit in sharing those data, and it’s already designed to separate the information we need from anything identifying the student personally.
Everyone’s trying to crack this nut, so I’m looking forward to seeing the progress we’ll make on it at Renaissance and in the field more broadly.
The future of assessment
Schools typically test kids at the beginning of the year to screen who’s high, who’s low, and who ought to get special treatment, and then at the end of the year to determine who learned and who didn’t. More frequent but less time-consuming assessments throughout the year can help guide differentiation and instruction. In cases that require frequent progress-monitoring, our Star Assessments can be used monthly or even as often as weekly, although three or four assessments throughout the year should be enough to help teachers make decisions about individual students’ instruction. The trend toward using assessments to guide instruction is pretty well developed here at Renaissance, but it continues to grow around the country. I think that is the direction that things will increasingly go in the coming years.
Kids will be grouped, and students will be treated similarly within their group, but differently across groups, in an effort to bring everyone to the same point of competency. It probably won’t fully succeed in bringing every student to the point where we’d like to see them (nothing that has been tried before has fully worked), but I think it’s a much more promising approach that we should pursue vigorously in the near future.
Another approach that I think we’re going to see more of in the long term is embedded assessments. These are tests that are folded into instruction so thoroughly as to be indistinguishable from it. In theory, students won’t even know they’re being assessed, and the results should be available to inform instruction almost immediately.
This concept is new enough that it needs to be validated more. There will be some surprises (both happy and disappointing) as it’s developed and refined, but we’re likely to see a great deal of evolution on embedded assessments.
Increasingly, artificial intelligence (AI) applications are in development or used in educational settings, whether for assessment or instruction or a combination of the two. As with embedded assessments, the field still has a lot of shaking out to do, but it’s promising.
Again, I am optimistic while maintaining a healthy skepticism about changes like this. When new technologies come along, they need to be explored just as computer-adaptive testing (CAT) was explored. CAT succeeded. A lot of other technologies have fallen by the wayside for one reason or another, and we’ll see that happen again with some of the possibilities that AI will open for us.
We need to be appropriately conscious of the fact that most exciting innovations will fall short for one reason or another. But that’s “old guy talk” that we don’t want to hear from the younger researchers exploring these new avenues. We need to encourage those innovators so they pursue the work that will uncover technologies and methodologies as effective as CAT.