This is one way this process has been validated, from the submitted article: "The responses did indeed help predict which classes would have the most test-score improvement at the end of the year. In math, for example, the teachers rated most highly by students delivered the equivalent of about six more months of learning than teachers with the lowest ratings. (By comparison, teachers who get a master's degree—one of the few ways to earn a pay raise in most schools—delivered about one more month of learning per year than teachers without one.)
. . . .
"The survey did not ask Do you like your teacher? Is your teacher nice? This wasn’t a popularity contest. The survey mostly asked questions about what students saw, day in and day out.
"Of the 36 items included in the Gates Foundation study, the five that most correlated with student learning were very straightforward:
1. Students in this class treat the teacher with respect.
2. My classmates behave the way my teacher wants them to.
3. Our class stays busy and doesn’t waste time.
4. In this class, we learn a lot almost every day.
5. In this class, we learn to correct our mistakes."
Here is earlier reporting (10 December 2010) from the New York Times about the same issue: http://www.nytimes.com/2010/12/11/education/11education.html
Here is the website of Ronald Ferguson's research project at Harvard: http://tripodproject.wpengine.com/about/our-team/
And here are some links about the project from the National Center for Teacher Effectiveness: http://www.gse.harvard.edu/ncte/news/NCTE_Conference_Using_S...
LAST EDIT: I'm amazed at how many of the comments in this thread appear to be about issues thoroughly discussed in the submitted article, but unresponsive to what the submitted article said. On this kind of issue, it's an especially good practice to read the fine article before assuming what is being discussed. We all know about school, but specific proposals for school reform have specific details that make some worse than others, and can be empirically tested.
>I'm amazed at how many of the comments in this thread appear to be about issues thoroughly discussed in the submitted article, but unresponsive to what the submitted article said.
I'm not, unfortunately. It seems like most people read a headline and perhaps a paragraph or two, then activate their pre-existing beliefs about whatever the subject happens to be, and move on from there. That's certainly been my experience with commenters on my blog, anyway, and it's been my experience in observing online communities and in reading student papers.
Basing teachers' pay and job security on surveys from students seems like a good idea, especially given the numbers mentioned in the article. One problem is that it might give too much power to students.
I was a dick back in high school. The hacker I was back then would have figured out exactly how the testing and metrics were set up (public information) and organized a union of students to manipulate it. I can't do much with standardized test scores; they reflect on me. But a teacher quality survey? That's just a weapon.
Things like this make me wish that we had some kind of Hacker in Chief, to figure out how to circumvent new systems before they get implemented.
>One problem is that it might give too much power to students.
This is discussed extensively, midway through the article, in approximately four paragraphs:
Students were better than trained adult observers at evaluating teachers. This wasn’t because they were smarter but because they had months to form an opinion, as opposed to 30 minutes. And there were dozens of them, as opposed to a single principal. Even if one kid had a grudge against a teacher or just blew off the survey, his response alone couldn’t sway the average.
“There are some students, knuckleheads who will just mess the survey up and not take it seriously,” Ferguson says, “but they are very rare.” Students who don’t read the questions might give the same response to every item. But when Ferguson recently examined 199,000 surveys, he found that less than one-half of 1 percent of students did so in the first 10 questions. Kids, he believes, find the questions interesting, so they tend to pay attention. And the “right” answer is not always apparent, so even kids who want to skew the results would not necessarily know how to do it.
Even young children can evaluate their teachers with relative accuracy, to Kane’s surprise. In fact, the only thing that the researchers found to better predict a teacher’s test-score gains was … past test-score gains. But in addition to being loathed by teachers, those data are fickle. A teacher could be ranked as highly effective one year according to students’ test gains and as ineffective the next, partly because of changes in class makeup that have little to do with her own performance—say, getting assigned the school’s two biggest hooligans or meanest mean girls.
Student surveys, on the other hand, are far less volatile. Kids’ answers for a given teacher remained similar, Ferguson found, from class to class and from fall to spring. And more important, the questions led to revelations that test scores did not: Above and beyond academic skills, what was it really like to spend a year in this classroom? Did you work harder in this classroom than you did anywhere else? The answers to these questions matter to a student for years to come, long after she forgets the quadratic equation.
It would really vary across classes and groups. I've worked a lot with kids in academic settings (schools) and semi-academic ones (scientific summer camps), and I've had groups with a real awareness of what was best for them.
For example, in one programming summer camp I led, kids came to me at the end of the 10 days and told me that I was their favorite educator and a great director because I was tough but fair and expected them to learn and progress, while some other educator was lame because, even though he was super nice and laid back, they didn't learn anything with him.
Of course, during the camp itself, I overheard more than one kid telling another that I was super mean and terrible. But in the end, most of them were vocal about how much they appreciated me.
Kids in general tend to be very aware of and appreciate fairness, consistency, and holding them up to standards and pushing them to get better.
Cases like that are not the exception, but sadly not the majority either (the specific example I cited involved French "gifted" kids aged 14 to 18). So it would really depend.
"But a teacher quality survey? That's just a weapon."
Only if it's used as part of a (simplistic) algorithm.
And besides, why would students want to fire the good teachers? That doesn't really make sense. (As someone who was sent to the principal's office on a semi-regular basis.)
It seems very similar to up- and downvotes. You'd want to know how to filter out the noise, but you're going to get a lot of good feedback from them. For instance, as much as I loved bad teachers in HS who let me get away with not doing anything, I never would've told anyone those teachers were "good" -- it was just that my immediate desire to slack off in high school outweighed my desire to do work on my own in spite of having a bad teacher who didn't require it. But grading that teacher as bad wouldn't take any special effort, so I would've done it.
Right. The effectiveness of student reviews needs to be studied in a context where the kids know the reviews are important. Might be a different result.
Modify the weight of a vote based on how far the student's votes deviate from the norm over time. A historically positive or negative student would have their vote neutered.
This doesn't prevent one class from ganging up to downvote a teacher on a single occasion, but if that happens that semester's score for the teacher will appear as an anomaly compared to previous scores.
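One way to sketch that weighting scheme in code (the function names and the particular down-weighting formula here are hypothetical illustrations, not anything from the article):

```python
# Sketch: down-weight raters whose historical ratings deviate far from the
# population norm, so a habitually extreme student sways the average less.
from statistics import mean, pstdev

def rater_weight(history, all_scores):
    """Weight in (0, 1]: shrinks as a student's average past rating drifts
    from the population average, measured in population std devs."""
    mu = mean(all_scores)
    sigma = pstdev(all_scores) or 1.0  # guard against a zero-variance pool
    deviation = abs(mean(history) - mu) / sigma
    return 1.0 / (1.0 + deviation)    # 1.0 for a typical rater, toward 0 for extremes

def weighted_score(ratings, histories, all_scores):
    """A teacher's score as a weighted mean of this term's ratings."""
    weights = [rater_weight(h, all_scores) for h in histories]
    return sum(w * r for w, r in zip(weights, ratings)) / sum(weights)
```

A class-wide pile-on would still shift one term's number, but as the comment notes, it would stand out as an anomaly against the teacher's prior terms.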
I know more about student evaluations at the university level than at the grade levels discussed in this article. Here is an excellent overview of some of that research: http://home.sprynet.com/~owl1/sef.htm
The message one takes away from that is that (i) yes, student evaluations are a good predictor of some objective properties of a class (and other measures don't even achieve that much), but those properties aren't what teachers should be optimizing. I'd grant that (ii) it does seem worthwhile for students to see that their interests make a difference to what happens in the classroom. I'd also grant that (iii) some classroom situations may be so bad that optimizing student satisfaction may, even if not educationally ideal, still be a big improvement. And for all I know, this may be widely true at the pre-university level; but on the other hand, for all I know, giving these evaluations a big institutional role at the pre-university level could also be counter-productive...the evidence cited in the article hardly enables us to say. Any deliberation about giving student evaluations an institutionalized role should take the evidence behind (i) seriously.
One promising message from the research reported in the Atlantic article is that (iv) the specific tests being discussed have been designed in ways that seem novel and especially revealing. But the article mixed that together with an indiscriminate enthusiasm for student evaluations quite generally. And I think many people will read this and say, "Duh, that's a no brainer." Yes, it is a brainer! These kinds of policy issues aren't settleable from the armchair. Even if we cleared all the political hurdles and made someone the educational policy dictator, he or she isn't going to be able to tell just from the armchair what the results of rolling out one policy rather than another is going to be. So I get frustrated with articles like this one, that report some interesting evidence but mix it together with the kind of insensitivity to the details exhibited in comments like "That research had shown something remarkable: if you asked kids the right questions, they could identify, with uncanny accuracy, their most—and least—effective teachers. The point was so obvious, it was almost embarrassing."
Neither does this inspire confidence: "Some studies...have shown that professors inflate grades to get good reviews. So far, grades don’t seem to significantly influence responses to Ferguson’s survey: students who receive A’s rate teachers only about 10 percent higher than D students do, on average." I hope that readers of this site don't need an explanation of why the clause after the colon is only barely relevant to whether grades get inflated because that leads to better evaluations. It's almost irrelevant. In the first place, teachers needn't be aware of the cited fact; they may experience grade-inflation pressures differently. Also, the cited fact is compatible with the majority of current A-getters scoring their teachers in ways that are largely insensitive to the grades they get, but a minority of current A-getters and a majority of current B-getters being extremely responsive to the grades they get. The cited fact is just not what we need to know.
Expanding the final example: say I currently give 20 students As, 15 of them score me at 1.05, 5 of them are so happy to get As they score me at 1.65. I give my 80 other students Fs, and they score me at 1.0. Then on average my A students will score me at 1.2, and my F students will score me at 1.0, and my score for the whole class will be 1.04. But suppose I were to give 10 more F students As, and they would also be so happy that they'd also score me at 1.65. By doing so, I'd increase my average score from 1.04 to 1.105; for all we've been told, that may be several standard deviations across other teachers, and may translate to substantially higher salary and job security for me.
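The arithmetic in that hypothetical checks out; here it is worked through directly (all numbers are the commenter's invented scenario, not real survey data):

```python
# Baseline: 20 A students (15 rate the teacher 1.05, 5 rate 1.65),
# and 80 F students who all rate the teacher 1.0.
a_scores = [1.05] * 15 + [1.65] * 5
f_scores = [1.0] * 80

avg = lambda xs: sum(xs) / len(xs)
print(round(avg(a_scores), 4))             # 1.2   (average among A students)
print(round(avg(f_scores), 4))             # 1.0   (average among F students)
print(round(avg(a_scores + f_scores), 4))  # 1.04  (class-wide average)

# Inflate: convert 10 F students to As; they too respond with 1.65.
inflated = a_scores + [1.65] * 10 + [1.0] * 70
print(round(avg(inflated), 4))             # 1.105 (class-wide average after inflating)
```

So a 10-point-of-class-size shift in grading moves the class-wide survey average by 0.065, which is exactly the kind of subgroup effect the quoted "10 percent" statistic cannot rule out.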
I am worried about adding another metric to the way that we measure performance in schools. It's already been shown that there are problems with the standardized tests. Not that they are terrible things, but students in the US tend to disagree with their proliferation. (Perhaps they would feel differently if there were fewer tests.)
Having students grade the teachers is, I think, the same way. One of the problems is that students don't know what makes a good teacher. The surveys can steer students in that direction (e.g., "I feel challenged but not overwhelmed in this classroom"), but if we look for too much insight from the students, I think we will be misdirected, just as we are misdirected when we pay too much attention to standardized tests.
My fear is that some schools would start to look at these performance measurements as golden bullets, sort of the way that we've started to look at standardized tests as golden bullets. Most of my jr. high years were geared toward getting perfect scores on the state exams. My junior and senior years of high school were almost 100% geared around AP tests and the ACT (a few teachers went outside that scope, but that was a teacher decision, not an administrative one).
The ACT and the AP tests have their place. And I think that student evaluations of teachers have their place as well. Both are very useful when applied appropriately.
I just don't want to see the system (d)evolve in such a way that too much emphasis is placed on empirical data.
"One of the problems is that students don't know what makes a good teacher."
Students don't need to know what makes a good teacher. They only need to be able to assess whether they learned, and whether they had fun in the process. Whether they learned enough is pretty much an orthogonal issue, and one that can and will be dealt with through standardized testing. Students also don't really need to give any thought to a teacher's specific methods in order to offer useful information.
As with any self-sustaining system (politics, the economy, ecology), you need a feedback loop to form among the parties with conflicting interests. If the feedback loop is broken, the system deteriorates and ultimately falls apart. If it's inefficient, the system tends to be inefficient as well. (Examples: the Soviet-style planned economy, rabbits in Australia, etc.)
This kind of thing (teachers grade students, students grade teachers) could improve the efficiency of the feedback loop and thus efficiency of the education system as a whole.
University grading is seldom second-guessed with wide-scale standardized testing, which is the perfect vehicle for counteracting the grade inflation incentive. In primary schools, if the data shows your students love you but they keep flunking, you won't get a bonus, you'll get demoted to gym teacher.
Some kids would be excellent at grading teachers. Unfortunately some would just have a grudge, or just have a power trip and try to sabotage teachers. I think on average they would be a good reflection of how well-liked the teacher is, but maybe not how effective they were at teaching.
For example the tough math teacher would get a bad evaluation, but the cream-puff teacher who never assigned homework and gave everyone A's would get a good evaluation.
>For example the tough math teacher would get a bad evaluation, but the cream-puff teacher who never assigned homework and gave everyone A's would get a good evaluation.
I always hear this objection, and it's usually brought up to discourage even experimental data gathering. I don't buy it at all, and I think it says more about the objectors' mentality than about the students'.
This survey was part of the larger study funded by Bill Gates. The claim of "prediction" is not supported by the evidence: the correlations with test scores were low, and in any case, this is another example of reifying test scores and stupid concepts such as "a month's worth of learning," as if all learning were the same regardless of the subject, the grade level, the prior experience of the kids, and so on. The very low correlations with test scores are not surprising; they are a by-product of the survey questions, which were designed to check whether the "good" teacher gets kids to comply with rules, defer to the teacher's authority, think of learning as not making mistakes (and, for the sake of Gates, stay on task all the time). Perfect conditioning for students being taught that education is a matter of doing well on the fill-in-the-bubble tests that Gates and this researcher seem to value as the single best measure of great education. See this and the links within it. http://voices.washingtonpost.com/answer-sheet/guest-bloggers...
If the student evaluations correlate strongly with the standardized tests' measures of student progress, then:
a) What's the purpose of having both?
b) If the evals are preferred over the tests, will "good" teachers continue to teach to a predictable, standardized curriculum?
c) Is the correlation additional evidence in favor of "differential compensation", that is, a compensation program based at least in part on exam scores?
d) Even if the information supplied is similar, doesn't this extra test/survey administration detract from instructional time? Is the information gleaned sufficient to compensate for the loss of instructional time?
e) Atlanta (Georgia, USA) is still reeling from a years-long cheating scandal. If such evaluations become "high stakes" (and there will likely be a push to do so, despite likely union opposition), won't these results be exploitable as well? (And perhaps even more so, through campaigning, social engineering, etc?)
What's the purpose of looking at multiple polls when you're trying to predict the outcome of the upcoming election? More evidence gives you higher confidence and lower margin of error. And as the article says, these student surveys provide clean, stable data that doesn't fluctuate very much from year to year and doesn't require much correction for race and family income.
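The polling analogy is just the standard-error effect: averaging more independent ratings tightens the estimate roughly as 1/sqrt(n). A quick simulation (with entirely made-up numbers, for illustration only) shows it:

```python
# Simulate: the spread of a class-average rating shrinks as the number of
# independent student raters grows. All parameters here are invented.
import random
from statistics import mean, pstdev

random.seed(0)
TRUE_QUALITY = 3.5  # hypothetical "true" rating of a teacher on a 1-5 scale

def class_average(n_students):
    # Each student's rating = truth + individual noise, clipped to the scale.
    return mean(min(5.0, max(1.0, random.gauss(TRUE_QUALITY, 1.0)))
                for _ in range(n_students))

for n in (5, 30, 120):
    estimates = [class_average(n) for _ in range(2000)]
    print(n, round(pstdev(estimates), 3))  # spread falls as n grows
```

A single principal's 30-minute observation is one noisy sample; thirty students over a year are thirty (correlated but informative) samples, which is the article's point about why the averages are stable.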
These surveys take on the order of 10-15 minutes. That's nothing compared to a battery of standardized tests. They wouldn't have to be very informative at all in order to be worth the small sacrifice of instructional time, and if they're the second-best predictor of class achievement, then they're certainly worth the time (if the results are actually used).
I don't think there's going to be much movement to stop paying attention to standardized tests and curriculums, since these surveys don't measure the same thing - roughly speaking, the standardized tests seek to measure how much was learned, and these surveys add a dimension of why.