How to determine cheating on the SAT statistically

Thread Starter

Raymond Genovese

Joined Mar 5, 2016
1,653
I read this news report today https://www.cnn.com/2019/01/02/us/florida-girl-sat-controversy/index.html

News reports frequently suck because they tell you so little about what happened and, instead, stoke your emotions. This one is certainly no exception, and now a high-powered lawyer has taken it on. Basically (going only by what I am reading, not by anything else I know), a young lady took the SAT and got a 900. Subsequently she got a tutor, studied a lot more, took a prep course and the like. She took the SAT again and got a 1230.

The testing people are holding her results and are not "validating" them, at least so far (I put validation in quotes because they have their own definition of what validation means and the article doesn't say what it is). They are not saying that the increase is the reason; instead (again from the article) they are saying...

"We are writing to you because based on a preliminary review, there appears to be substantial evidence that your scores ... are invalid," it said. "Our preliminary concerns are based on substantial agreement between your answers on one or more scored sections of the test and those of other test takers. The anomalies noted above raise concerns about the validity of your scores."

This got me thinking about how one could statistically evaluate the likelihood of cheating by copying answers from others. At least that is what it sounds like they are saying, or are they saying something else?

I think, but don't know, that the answer key (that is, the test form) is not the same for everybody, right? I mean, your neighbor is using a different form of the test, so simply copying their answers would not work, right? And if they are using the same form, then how would it be possible to know who copied from whom?

When I used multiple choice tests we always received statistics on the test results. Some of these were very useful to me in designing or redesigning test questions. One of my favorite measures was the point-biserial correlation between how well students did on an item and how they did on the total test. I would use this to see how well the question discriminated. IOW, getting a high total test score should be related to getting the item correct: high total scorers get the item right at a high frequency, and the opposite for low total scorers. Graphically, a slope near 1 was a good discriminating item and a slope near 0 was not.
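For anyone who has not met the point-biserial measure, here is a minimal sketch of how it could be computed for one item. The function name and the sample data are made up for illustration, and it assumes the simplest version, where the total score still includes the item being checked (test reports often use a corrected total instead).

```python
import math


def point_biserial(item_correct, total_scores):
    """Correlation between a 0/1 item result and the total test score.

    item_correct: list of 1 (right) or 0 (wrong), one entry per student.
    total_scores: list of total test scores, in the same student order.
    Returns NaN when the item or the totals have no variation.
    """
    n = len(item_correct)
    if n != len(total_scores) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 students")

    right = [s for c, s in zip(item_correct, total_scores) if c]
    wrong = [s for c, s in zip(item_correct, total_scores) if not c]
    if not right or not wrong:
        return float("nan")  # everyone got it right, or everyone got it wrong

    mean_all = sum(total_scores) / n
    variance = sum((s - mean_all) ** 2 for s in total_scores) / n  # population variance
    if variance == 0:
        return float("nan")

    p = len(right) / n                 # proportion answering correctly
    mean_right = sum(right) / len(right)
    mean_wrong = sum(wrong) / len(wrong)
    return (mean_right - mean_wrong) / math.sqrt(variance) * math.sqrt(p * (1 - p))


# Hypothetical class of six students.
totals = [90, 85, 80, 60, 55, 50]
good_item = [1, 1, 1, 0, 0, 0]   # only the high scorers got it right
poor_item = [1, 0, 0, 1, 0, 1]   # right answers unrelated to total score

print(point_biserial(good_item, totals))
print(point_biserial(poor_item, totals))
```

A value near 1 means the item separates high and low scorers well; a value near 0 (or negative) suggests the item is not discriminating and may need rewriting.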

Are they going further, with complicated pattern evaluations that make use of some of these measures, or something else?

I am posting to learn what others think, not about the equity of the decision or treatment of the student (I do have an immediate gut reaction but want to hold off for a while), but rather to generate discussion of possible ways one would approach such an evaluation. Personally, I doubt whether the preliminary lack of validation will hold up unless there is some easily understandable and documented incident that took place (and has not yet been mentioned and would open up another can of worms).

I am going to follow this if I can because I want to know what they looked at to arrive at the "preliminary conclusion or concern".
 

WBahn

Joined Mar 31, 2012
32,823
That's almost the exact wording used back in the 1980s to initially invalidate the AP Calculus test results for Jaime Escalante's calculus students (documented more-or-less accurately in Stand and Deliver).

I don't know the exact algorithm used -- I imagine there are many -- but essentially if there's a question that almost everyone gets correct but two students not only get wrong but put the same wrong answer, then that counts against them in this regard. If that happens enough times, then it raises a flag. Similarly, if they get the same questions correct that almost no one answers correctly, that adds to the "evidence" against them (although I think this is generally considered a weaker indicator).
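To make the idea concrete, here is a rough sketch of one way such a flag could be scored. This is not the testing service's algorithm, which is not public in this thread; the function name, the weighting (negative log of how common the wrong choice is) and the sample data are all assumptions chosen for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def shared_wrong_scores(responses, key):
    """Score every pair of test takers by the wrong answers they share.

    responses: dict mapping a test-taker name to a list of chosen options.
    key: list of correct options, one per question.
    A shared wrong answer counts for more when few people chose it.
    """
    n_people = len(responses)
    n_items = len(key)
    # How often each option was chosen on each question.
    counts = [Counter(r[i] for r in responses.values()) for i in range(n_items)]

    scores = {}
    for a, b in combinations(sorted(responses), 2):
        score = 0.0
        for i in range(n_items):
            choice = responses[a][i]
            if choice == responses[b][i] and choice != key[i]:
                # Rare wrong answers are more surprising to share than common ones.
                score += -math.log(counts[i][choice] / n_people)
        scores[(a, b)] = score
    return scores


# Hypothetical five-question test taken by six people.
key = list("ABCDA")
responses = {
    "s1": list("ABCDA"),
    "s2": list("ABCDA"),
    "s3": list("ABCDA"),
    "s4": list("DBDDC"),  # same three wrong answers as s5
    "s5": list("DBDDC"),
    "s6": list("ABCDB"),  # one wrong answer that nobody else chose
}

scores = shared_wrong_scores(responses, key)
most_similar = max(scores, key=scores.get)
print(most_similar, scores[most_similar])
```

A real analysis would also need a threshold for how high a score is unusual, for example by comparing against the scores of pairs who could not have had contact. As the rest of this post argues, a high score by itself does not distinguish copying from a shared tutor or prep course.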

I don't think these are looking so much for someone leaning over and copying answers from their neighbor's test -- the testing process attempts to make that sufficiently difficult -- but rather for students getting advance information about the question pool on the exam.

I don't know if this has ever happened with the SAT, but one thing that fraternities used to be notorious for was building up files of questions asked by professors on exams. When the exam wasn't going to be returned, they would assign each frat member a given question to memorize. So if you had ten people taking an exam that had five questions, two people would be told to memorize question number one, and so on. Each would write the question down as soon as they walked out of the exam and bring it back to the frat house.

In Escalante's case, and possibly in this case, the issue was that Escalante had taught this group of students using a very rote approach, and there were some things that he didn't teach them particularly well. So they all made the same kinds of mistakes when they took the exam, and that got flagged. I wouldn't be surprised to discover that an intensive prep course might result in the same kind of situation, particularly in terms of the specific content areas that did not get improved and thus would tend to stand out as unusual common weaknesses.
 

Thread Starter

Raymond Genovese

Joined Mar 5, 2016
1,653
Interesting stuff. I think if it is pursued in court, they will have to reveal something about their analysis, and I (and many other people, I would think) would find that interesting.

As a side note....some years ago, a "kid" doing a summer internship in a neighboring lab had received a 1600 on the SAT (perfect score). He was not the first, but at the time it was a very rare event and he had a newspaper clipping to show people. He was right to be proud. He was a decent fellow as I recall, but I remember one time he was being very critical of his mother and had twice said that she was stupid. The second time he said she was stupid, I said "Doesn't that make you son of stupid?" It was the last time I heard him be so critical of either of his parents....
 