Action Area 6: Create More Responsive Learning Feedback Systems

Education, people often say, easily falls prey to purported cure-all fads: singular interventions which are supposed to have miraculous, general effects. Among the fads of recent times, test-driven accountability stands out as perhaps the most widely promoted and most extensively tried educational solution.

Test-based accountability has reached new heights since the enactment of No Child Left Behind, and its limitations are now well documented.263 For logistical reasons, tests are ideally no more than a few hours long and machine readable. The logistics of their form are such that they tend to measure discrete knowledge items distilled to clear-cut and isolable facts and aphorisms drawn from theories, and specifically items that can be adjudged right or wrong. These may not be the best things to be measuring in an era when the questions are at times complex and ambiguous, facts contestable, and theories open to interpretation. Testing what is readily testable has also produced a narrowing of the curriculum—to the receptive capacities of reading comprehension, for instance, more than the productive capacities of writing, which are harder and more expensive to test (an irony in times when more people are writing, and writing more).

Today’s tests, moreover, mostly require memory work, when we live in an era of at-your-fingertips mnemonics, where you can readily reach for empirical answers, definitions of concepts, and representations of causality. They are individualistic when cognition is increasingly distributed and intelligence collective (learning organizations, knowledge management, communities of practice). They have a cognitivist ‘what’s-in-your-head’ bias when learning today is a matter of melding conceptualization with practical demonstration, analysis with application, experience with theorization. They represent an end-of-the-production-line, quality-control model of evaluation when they should more usefully provide constant, formative feedback oriented to continuous learner improvement. They represent a taxation model of accountability, where activity is measured in increments of evenly spaced time and outcomes can be reduced to a numerical bottom line—when, in fact, much of the most important learning occurs in shorter and longer timeframes and the outcomes are utilities, not numbers or grades. Indeed, the tests we have today, more than anything else, test for the tricks and tropes of tests: how well you can play this strange, other-worldly game of second-guessing the answers that will give you the best score.

Our testing and accountability system is in need of radical reform. Here are a few things we could work on:

We need to test more for learning, in addition to accountability. Testing should help learners and make them want to learn more, instead of serving primarily as a punitive or rewarding end-of-program measure. In recent years we have focused on summative assessment for accountability at the expense of formative assessment. We do little formative assessment at all, and rarely in a way that is directly useful to, and encouraging for, learners.

We need assessment which provides day-to-day information, useful to and immediately useable by learners and teachers—feedback loops integrated into curriculum, curriculum designed to incorporate incremental assessment where there are no learning activities without feedback systematically built in.

We need to measure complex performances (a scientific experiment, making a short video, reporting on a community controversy) as readily as we measure things we think we can break into discrete, unambiguous facts and definitive theoretical aphorisms.

We need assessments which are ‘open book’, measuring not what you know but how you find out.

We need ways to measure collaborative learning, capturing not just the performance of the whole group but also providing multiple perspectives upon, and evidence of, differential individual contribution.

We need to abandon the sporting logic of results distribution in which there have to be losers in order for there to be winners, where necessarily, by fiat of statistical distribution, some must fail in order for others (relatively) to succeed. What if we were to set our objectives at universal success, or personalized achievement of customized learning outcomes?

We need to change the motivational structure of assessments, whether that be the learner’s anxieties and fears of failing, or a school’s failure to meet adequate yearly progress targets.

We need to move away from judgmental feedback (most peremptorily reduced to a grade) to constructive feedback on learning work—from the idea that intellectual works are ever finished to the idea that they can always be considered works-in-progress, and that feedback can always be used for clarification, improvement, elaboration, or extension.

We need to provide digital portfolio spaces which show what a learner actually did. This can speak for itself to those who want or need to know, and much more powerfully at times than bald grades.

We need to create learning and assessment environments which in their core design provide multiple sources of assessment feedback—self, peer, teacher, critical friend, parent—with metamoderation loops and reviewer ratings that reward useful feedback by weighting raters. Today’s social networking technologies make this easily achievable—in fact, rating and commentary are an integral part of the new, participatory media. This adds sociability to assessment for learning, in contrast to the lonely isolation of the traditional test. It also creates an additional, lateral plane of evaluation rather than a unidirectional vertical one—the audience of one, the traditional teacher-judge. And it creates a sense that students are working in and for a knowledge and learning community.
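To make the weighting mechanism concrete, here is a minimal sketch in Python of how metamoderation ratings might adjust the influence of different feedback sources over time. Everything in it, the role names, the update rule, the numbers, is a hypothetical illustration of the principle, not a specification of any particular system.

```python
"""A toy model of multi-source assessment feedback with metamoderation:
raters whose feedback is judged useful gain weight in the aggregate score.
All roles, rates, and scores below are hypothetical."""

from dataclasses import dataclass


@dataclass
class Rater:
    role: str            # e.g., "self", "peer", "teacher", "parent"
    weight: float = 1.0  # influence on the aggregate rubric score

    def metamoderate(self, usefulness: float, rate: float = 0.2) -> None:
        """Nudge this rater's weight toward a usefulness judgment in [0, 2]."""
        self.weight += rate * (usefulness - self.weight)


def aggregate(scored: list[tuple[Rater, float]]) -> float:
    """Weighted mean of rubric scores from multiple feedback sources."""
    total = sum(r.weight for r, _ in scored)
    return sum(r.weight * s for r, s in scored) / total


teacher, peer, parent = Rater("teacher"), Rater("peer"), Rater("parent")
peer.metamoderate(usefulness=1.6)    # peer feedback was rated very helpful
parent.metamoderate(usefulness=0.5)  # parent feedback was rated less helpful

print(f"{aggregate([(teacher, 4.0), (peer, 3.0), (parent, 5.0)]):.2f}")  # 3.93
```

The design choice worth noting is that no source is silenced: a consistently unhelpful rater’s comments still register, but count for less in the aggregate.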

We need to automate some aspects of feedback and assessment by developing the as-yet-barely-realized potentials of learners working in digital spaces, including natural language processing mechanisms, domain-defined semantic tagging, and machine learning algorithms which read the noisy data of learning engagement in progressively more intelligent ways.
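As a gesture toward what such automation might look like at its very simplest, the sketch below runs a rule-based pass over a draft and returns suggestions rather than a grade. A real system would use trained natural language processing models; the surface features and thresholds here are purely illustrative assumptions.

```python
"""A deliberately simple stand-in for machine formative feedback on writing.
Real systems would use natural language processing and machine learning;
this only shows the shape of the feedback loop. Thresholds are hypothetical."""

import re


def formative_feedback(draft: str) -> list[str]:
    """Return as-you-go suggestions for a draft, never a score."""
    sentences = [s for s in re.split(r"[.!?]+\s*", draft) if s]
    words = draft.split()
    notes = []

    overlong = [s for s in sentences if len(s.split()) > 25]
    if overlong:
        notes.append(f"{len(overlong)} sentence(s) exceed 25 words; "
                     "consider splitting them for clarity.")
    if words and len({w.lower() for w in words}) / len(words) < 0.4:
        notes.append("Word choice is repetitive; try varying vocabulary.")
    if not notes:
        notes.append("No surface issues found; ask a peer reviewer to respond.")
    return notes


for note in formative_feedback("The experiment worked. The experiment was long."):
    print(note)
```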

We need a new psychometrics, capable of reading multiple sources of data on student learning—social and automated—and of interpreting these across time (individual or cohort progress) and between demographically definable groups.
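One way to picture what such a psychometrics would compute, under wholly invented data: estimate each learner’s growth over time, then compare mean growth between groups. The sketch below uses ordinary least-squares slopes (Python 3.10’s statistics.linear_regression); a production system would of course use proper longitudinal models.

```python
"""A toy version of longitudinal, group-comparative analysis: per-student
growth slopes over assessment occasions, averaged by demographic group.
All students, scores, and group labels are invented for illustration."""

from statistics import linear_regression, mean

scores = {          # student -> scores at evenly spaced occasions
    "s1": [52, 58, 63, 70],
    "s2": [48, 50, 51, 55],
    "s3": [60, 64, 71, 78],
}
group = {"s1": "A", "s2": "B", "s3": "A"}  # demographic group membership


def growth(series: list[float]) -> float:
    """Growth rate as the least-squares slope of score on occasion."""
    slope, _intercept = linear_regression(list(range(len(series))), series)
    return slope


by_group: dict[str, list[float]] = {}
for student, series in scores.items():
    by_group.setdefault(group[student], []).append(growth(series))

for g, slopes in sorted(by_group.items()):
    print(f"group {g}: mean growth {mean(slopes):.2f} points per occasion")
```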

If we were to do any or all of these things, we would in fact be doing more testing than ever, but doing it better and more usefully. Our education would be more accountable and more transparent.

— Mary Kalantzis and Bill Cope

Action Items

Action Item 6.1: Create Better Tests

The standardized tests of today are by and large abysmally poor, measuring things which are mechanical and superficial. Their limitations become daily more glaring in comparison with the emerging needs of the knowledge society and economy. Simply stated, we need to invest in a Testing Things That Matter Program: research and development leading to the creation of a new generation of tests addressing twenty-first-century capacities.

Action Item 6.2: Integrate Assessment for Learning into Curriculum and Instruction

Nearly all testing resources take the form of end-of-production-line quality control or end-of-year, taxation-style returns on educational investment. It is time to rebalance our testing agenda to give at least equivalent emphasis to testing for learning: diagnostic testing, formative testing, and testing that gives learners immediate, specific, and useable feedback. The Assess-To-Learn Program would invest in the development of formative assessments, a glaring near-absence in the current testing industry.

Action Item 6.3: Develop Non-test Assessments

We need to supplement conventional tests with a balanced mix of other forms of assessment, including learner portfolios, self- and peer-assessment, group assessment, and task-based performance assessment. The Learning Outcomes Portfolio Initiative would support research and development programs in which the work a learner has actually done demonstrates that learner’s capacities, rather than a bare numerical or pass/fail grade standing in for them.

Action Item 6.4: Develop Assessment Technologies

We have barely begun to realize the potentials of the new technologies in assessment: as-you-go formative assessment, peer assessment via social networking technologies, and content assessment based on rubric construction and semantic web tagging. The Assess-As-You-Go Program would support the development of digital environments which incorporate human and machine feedback loops.

Action Item 6.5: Develop a More Comprehensive, Reliable and Diagnostically Useful Psychometrics

Data collection and analysis technologies are now available which would allow longitudinal tracking of individual student progress, and continuous, rigorous, and consistent tracking on the basis of risk-group demographics. These need to be explored urgently, for the same reasons that argue for integrated medical records: the resource efficiency and the effectiveness of educational investment. The Comprehensive Educational Records Program would support the development of multidimensional online student and cohort records.

Supporting Evidence

Supporting Evidence 6.1: A Short History of Testing

The United States has long been a leader in educational assessment. Horace Mann called for standardized testing as early as the 1840s, and numerous American scholars drove the creation of educational standards and tests of both learning and innate intelligence.264 In a society that aimed to foster a meritocracy, tests provided a way for the upwardly mobile to demonstrate their abilities independent of their race or social class. In concert with other efforts of the Progressives, the emergence of testing gave the field of education a level of scientific rigor and fostered the perception that schools could be managed through professional judgment rather than politics and ideology.265

In the latter half of the 20th century, the space race, the Coleman Report, and other high-profile issues raised concerns about inefficient bureaucracies and created pressure on schools to improve their results. Stagnant or declining educational outcomes and declining perceptions of public education in the 1970s eventually culminated in the A Nation at Risk report in 1983. Following A Nation at Risk, political attention increasingly shifted from educational inputs to outputs, and standardized testing began to expand rapidly. Business leaders, many of whom were implementing high-performance accountability systems in their own organizations, were particularly influential with state politicians in the late 1980s and early 1990s.266 These ideas found considerable traction in economically struggling Southern states, and governors of two of these states soon ascended to the White House, eventually leading to the implementation of No Child Left Behind in 2002.

The growth of standardized testing, particularly for accountability purposes, has largely been driven by politicians and central leadership. Not surprisingly, the current approach to testing largely serves the purposes of these actors. Rather than providing formative feedback to improve ongoing instruction, standardized testing under NCLB provides summative measures that are not available until students have already moved on to the next grade.267 Although formative assessment has not received much attention among politicians, many schools and districts have implemented periodic benchmark testing in order to continuously inform their curriculum and instructional practices.

— Peter Weitzel

Supporting Evidence 6.2: No Child Left Behind and its Critics

To receive federal funding under No Child Left Behind, states are required to comply with the following: (a) have academic content standards, (b) administer standards-based assessments in reading and mathematics in grades 3 through 8, (c) employ a single statewide accountability system that measures and reports adequate yearly progress of all schools, (d) identify schools for improvement or corrective action, and (e) require teachers to be highly qualified in their subject area. Proponents of the program suggest that students are making progress, the achievement gap is closing, and that the accountability system has resulted in under-performing schools being closed.

At least 20 states and many school districts protested the act; the National Education Association brought a lawsuit against the federal government; the Congressional Black Caucus introduced a bill to place a moratorium on high-stakes testing; and the Harvard Civil Rights Project has warned that the law threatens to increase the drop-out rate for students of color.268 Educators have pointed out conflicts between government and business in the implementation of NCLB, especially in relation to Reading First.269 Surveys and polls by several professional organizations have found that while teachers support the act’s basic premises, they identify the following problems: (a) results from a statewide high-stakes test are poor measures of school performance, (b) teaching to the tests is detrimental, (c) growth models that track students are better indicators than percentages of students who pass mandated tests, (d) the emphasis on reading and math to judge school performance has led to less emphasis on other subjects, (e) reporting disaggregated scores does not help improve schools, and (f) NCLB has lowered teacher morale.270

Research on NCLB has shown that plan changes negotiated with individual states have eroded consistency. Accountability depends on which statistical techniques are used, which subgroups are included in the system, and how adequate yearly progress (AYP) is calculated.271 Some researchers have found that the narrowing of the achievement gap has disappeared in the wake of NCLB.272 Several unintended consequences have occurred, including large numbers of students of color ending up with the least qualified teachers, an unrealistic standard of 100% of students being proficient by 2014, and the adoption by some states of norm-referenced tests, which by design define 50% of students as below average. Schools that serve English language learners and special needs students are penalized if they cannot meet targets; when English language learners become proficient, they are transferred out of that group.273

The enactment of NCLB has exacerbated the differences in both learning and motivation among students who attend schools with differing resources.274 Narrowing of the curriculum has occurred as well: social studies, science, and art have been cut to devote more time to subjects that are tested, and teachers in low-income schools have tended to spend less time on writing instruction since NCLB.275 Teachers in struggling schools have decreased their efforts to assist students.276 Teachers’ tasks have increased in number and scope as their instructional roles have come to be increasingly controlled by expectations and assessment. Teachers’ pedagogical and personal relationships with students have suffered as they experience high levels of stress in the current policy environment.277 The increased use of scripted reading programs that often accompany district implementations of NCLB has limited the curriculum and teachers’ ability to address the needs of diverse students.278

While many problems have surfaced in relation to NCLB, several professional groups have made recommendations for improving it. For example, the National Council of Teachers of English has recommended changes to the measures of adequate yearly progress, support for high-quality teachers and for English language learners, and support for 21st-century literacies before reauthorization of the Elementary and Secondary Education Act.279

— Sarah J. McCarthey

Supporting Evidence 6.3: Improving Assessment

That all assessment can be aimed at teaching and learning has great appeal. To date, however, there are significant barriers to designing assessments that can be used for such wide-ranging purposes as accountability and improvement. Assessments used for accountability, based on well-known, robust psychometrics, are designed to be efficient. While there is wide agreement that multiple measures (e.g., short answer, essay) would improve large-scale accountability assessment, there is tension between broadening what is assessed and increasing costs.280 Incorporating universal design into assessment is another proposed improvement. The idea is to develop assessments so that individuals will not need accommodations. For example, assessments would be developed without time constraints, so that no accommodations (e.g., increased testing time) would be required.281

Multiple measures and universal design would be notable improvements to large-scale assessments. Nevertheless, to improve teaching and learning, new assessment designs and foundations are required, including knowledge about the processes and patterns which represent the ways people think and act. These kinds of assessments are designed to structure situations which evoke evidence about students’ thinking and acting in terms of such patterns.282 The authors of the seminal report Knowing What Students Know, commissioned by the National Research Council, lay the groundwork for new theoretical and technical foundations for this kind of revitalized assessment, designed to improve teaching and learning.283

Future assessment designs afford the opportunity to develop balanced assessment systems across classroom, district, state, national, and international levels. In addition, they could be aligned and mutually reinforcing.284 One argument is that a balanced and coherent system is a nested system of assessments exhibiting multiple features: 1) multiple measures; 2) alignment of standards, assessments, curriculum, and instruction within and across different levels of the system (school, district, and state); and 3) multiple assessments with timely reporting.285 Early prototypes like the BEAR system285 and the STARS system286 could be classified as multilevel assessment systems. Another report, from the U.K., also presents guidelines for building a multilevel assessment system.287

The advantages of multilevel assessment systems can only be realized by incorporating information and communication technology (e.g., web browsers, word processors) into assessment design to support the design of complex, dynamic, and interactive tasks. These tasks can expand the range of knowledge, skills, and cognitive processes that can be assessed. Large-scale assessments are exploring these possibilities to gain evidence of student content knowledge and reasoning, although regulatory, economic, and logistic issues under NCLB present potential constraints. At the classroom level, technological systems such as DIAGNOSER, which delivers continuous formative assessment and feedback, and ASSISTments, which provides a comprehensive intelligent tutoring system, show promising results in validation studies.285

— Katherine E. Ryan

Supporting Evidence 6.4: The Value of Formative Assessment

In a landmark 1998 research review examining 250 articles, researchers concluded that formative assessment can improve student learning, particularly for lower-achieving students, more effectively than other interventions (effect sizes between 0.4 and 0.7).288 Formative assessment is believed to help develop metacognitive skills and to enhance motivation differentially for these students.289 Through formative assessments, learning sequences can be properly designed, instruction modified during the course of learning, and programs refined to be more effective in promoting student learning goals.290
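By way of orientation, and as our gloss rather than the review’s: effect sizes of this kind are standardized mean differences on the pattern of Cohen’s d,

$$ d = \frac{\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}}{s_{\text{pooled}}} $$

so values between 0.4 and 0.7 indicate gains of roughly half a standard deviation of the outcome measure, substantial by the usual benchmarks for educational interventions.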

A theory of formative assessment based on cognitive and sociocultural learning provides the foundation for “assessment for learning” in addition to “assessment of learning.”291 Within this kind of formative assessment framework, which directly links assessment to how students learn, assessment forms and tasks are integrated with the kinds of learning to be achieved. Students are presented with a broad range of assessment forms (e.g., performance assessment, portfolios) and assessment tasks (e.g., constructed response, selected response) to show what they know and can do.

Recent pedagogical theory is aligned with this notion of formative assessment. That is, assessment serves an important cognitive and social function in teaching and learning; feedback is essential to enable students and teachers to understand their learning goals, compare the actual level of performance to the desired level, and engage in effective actions to reduce the gap.292 Multiple assessment forms and tasks are essential to this kind of pedagogy: instruction is adjusted to better meet students’ needs by using evidence about student learning.

There is nearly unequivocal agreement about the value of formative assessment based on cognitive and sociocultural learning theories and pedagogical theory. Nevertheless, it has been notably difficult to initiate and implement successfully in schools on a large scale. The time to develop, manage, and mark such assessments, and teacher knowledge of these assessment practices, are identified barriers. Technology is increasingly recognized as a critical means for developing and implementing this kind of formative assessment in schools. To date there are a few exploratory projects that illustrate the power of technology for re-imagining formative assessment in schools.285 To achieve the promise of these prototypes, a large, well-theorized research and development program is required.

— Katherine E. Ryan

Supporting Evidence 6.5: Imagining a Time When the Exercise Book and the Test are Not Separate

The Assess-As-You-Go Writing Assistant is a project proposal of the College of Education at the University of Illinois. This will be an online writing environment which, via a combination of tagging, social networking, and natural language processing technologies, gives learners constant feedback on their writing in the form of on-demand, as-you-go formative assessment. The Writing Assistant will also track individual learner progress, the progress of cohorts, and the progress of individuals in relation to cohorts—thus providing summative assessment data which meets teacher, school, parent, and community accountability requirements at least as rigorously as—and we would hope potentially more rigorously than—today’s summative assessments.
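Though the proposal does not specify an architecture, one can imagine the underlying record-keeping along the following lines: a single log of feedback events that is formative for the writer in the moment and aggregates into summative, cohort-level evidence. Field names and values here are hypothetical, not the Writing Assistant’s actual design.

```python
"""A sketch of one event log serving two purposes: each record carries a
formative comment for the writer, while the same records aggregate into
summative cohort evidence. Purely illustrative."""

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AssessmentEvent:
    learner: str
    cohort: str
    source: str      # "machine", "peer", "teacher", or "self"
    criterion: str   # e.g., "evidence", "organization"
    score: float     # rubric score at the moment of feedback
    comment: str     # the formative part: what to do next


def cohort_summary(events: list[AssessmentEvent]) -> dict[str, float]:
    """Summative view: mean rubric score per cohort from the formative log."""
    pooled: dict[str, list[float]] = defaultdict(list)
    for e in events:
        pooled[e.cohort].append(e.score)
    return {c: sum(v) / len(v) for c, v in pooled.items()}


log = [
    AssessmentEvent("ana", "grade7", "peer", "evidence", 3.0,
                    "Cite the field-study data in paragraph two."),
    AssessmentEvent("ana", "grade7", "machine", "organization", 4.0,
                    "Strong topic sentences; the conclusion restates rather than synthesizes."),
    AssessmentEvent("ben", "grade7", "teacher", "evidence", 2.5,
                    "Distinguish observation from interpretation."),
]
print(cohort_summary(log))  # {'grade7': 3.1666666666666665}
```

The point of the sketch is the reversal of priority: the formative comment is the primary artifact, and the summative number becomes a by-product of it.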

Success in the proposed approach could have a revolutionary impact on the education system. Those conceiving this project can imagine a classroom in the not-too-distant future where all assessment is strongly, clearly, and reciprocally connected to learning; where teachers do not need to teach to the test; where distinct tests are an increasingly redundant mechanism for corroborating the outcomes of learning that have already been more systematically assessed; and where the distinction between assessment for instruction and assessment for accountability is no more than an analytical one.

The intervention would not require any adjustments to normal school schedules beyond what might normally be allowed for group or private writing time on a computer. Students would need access to computers connected to the Internet. This could occur in class time if students had laptops or were working in a computer lab, or even if the classroom had a small number of computers that could be shared so several students could be writing at any particular time. Alternatively, the writing could occur out of class time in the library or at home.

For the English/language arts class, for instance, the Writing Assistant might be used for any and all forms of writing, though the emphasis for the purposes of prototyping and feasibility analysis in this project will be informative and persuasive writing. The Writing Assistant could be used in any other subject areas as well; however, for this project, just one other discipline area will be selected for testing: informational and persuasive report writing in science—for instance, a research report which links facts and interpretation by synthesizing data and reviewing perspectives across a topic and recommends a course of action, a report of an experiment, or a report of a field study.

The process of outcome assessment would remain rigorous—potentially more rigorous, in fact, than what one can achieve even with the broadest battery of tests currently available to the education community. Such a classroom would be much more effective than current classrooms, improving both education outcomes and teacher satisfaction. We believe this would be a revolutionary step in education. Given our current, state-of-the-art computer science and pedagogical understandings, this revolution is for the first time achievable in the near future.

— Mary Kalantzis and Bill Cope