POLICY ANALYSIS: Recommendations
In order for contemporary assessment methods to capture the breadth of knowledge possessed by both individual students and large groups of students, policymakers must incorporate some form of authentic assessment. This would necessitate the use of alternative assessment strategies at scale, a number of which are currently implemented in states including Delaware, Idaho, Indiana, Kentucky, Maryland, Michigan, Missouri, North Carolina, Tennessee, New York, and California. Each state defines its policy differently and employs a different type of authentic assessment. Methods employed by these states include the evaluation of work samples, parental input, individualized education plan (IEP) analysis, checklists (often developmental or behavioral in nature), student schedules, peer input, photographic or video documentation, letters written to the reviewers by students, work resumes, profiles of strengths and abilities, report cards, performance event results, and student interview data or oral testing results (Warlick & Olsen, 2009). The combination of various types of authentic assessment comprises a student's portfolio, the results of which are used in conjunction with standardized test scores both to make important individual decisions surrounding a student's academic career, from promotion to graduation, and to report on whole grade levels of students for NCLB purposes.
The states’ complementary use of traditional and authentic assessment strategies exemplifies the fact that, while standardized testing is a cornerstone of American assessment, the two approaches are not mutually exclusive and can and should be used together. Current policy across the nation can look to many of these states as models. Kentucky, considered a pioneer in the development of alternative assessment policy, mandated the use of performance-based assessment for all students in 1990 as part of the Kentucky Education Reform Act (KERA) (Warlick & Olsen, 1999). This assessment system is aligned with academic expectations implemented in meaningful contexts, meaning assessment occurs within the learning environment rather than being far removed from the classroom, as is often the case with standardized testing. The goal is to move from summative to formative testing, or at least to include the latter alongside the former in assessing student competency. Kentucky utilizes aggregate performance and standard assessment scores for individual students as well as groups of students, reflecting a more in-depth look at knowledge and skills in a particular curriculum area (Johnson & Arnold, 2007). Testing spans curriculum areas, for both mainstream and special education students, including reading, mathematics, science, social studies, writing, arts and humanities, and practical living/vocational skills.
Our Recommendation
Reflecting on what is currently in place in these states, we suggest that all states begin implementing forms of authentic assessment that complement (and in some cases replace) their current use of standardized testing. In line with the NCLB statute that requires testing at every grade, we suggest that portfolio assessment be integrated into the testing of all children, third through twelfth grade. It should also incorporate the NCLB stratification of scoring by proficiency level (advanced, proficient, satisfactory, developing, and minimal) currently in place for typical writing and math assessments. This would ensure that aggregate scores from both types of assessment reflect policy already in place. This testing should take place several times a year, again moving from summative to formative testing so that assessment becomes part of the natural learning environment. This will benefit students by making their academic standing clear at multiple points throughout the year. However, questions remain about the viability of portfolios as national policy and whether these strategies can truly be implemented at scale across the fifty states.
Problems associated with the scalability of portfolio assessment are twofold: content and process (Johnson & Arnold, 2007, p. 29). That is, research is mixed as to whether portfolio criteria are sufficiently aligned to curriculum standards and whether the process of assessment adequately measures competency in a given content area. Ensuring that both tasks are met would involve consensus among three important bodies: policymakers at the federal level, state policymakers, and school-level staff, namely teachers and administrators.
The federal government's role in addressing content should be upholding accountability. This would necessitate that the federal government set clear but flexible standards for the content portfolios should assess, as many currently call for in the common standards movement. However, as we have learned from NCLB, setting standards is not enough. The federal government must provide adequate funding so that states have the ability to meet those standards. In contrast to the limited support it has provided under NCLB, the federal government should offer financial incentives for adopting best practices such as those currently in place in Kentucky. State-level policymakers should use those standards as guidelines for creating content and process rubrics. That is, state legislators must concurrently take into consideration what the federal government believes every child should know and what their state believes is important for its children to learn in the classroom, e.g., state and local cultural history. They should also address process in creating rubrics for how those content areas will be assessed, choosing which elements of the six types of authentic assessment will define the portfolio. Lastly, school administrators and staff will decide how these standards and rubrics will be implemented at the school level. This means they will need to align teaching with federal standards and state rubrics and decide which authentic assessment strategies best fit their student population in meeting the portfolio rubrics (e.g., oral testing or exhibition as satisfying the performance-based rubric). The vertical alignment of goals across the three bodies will ensure that accountability is met for students and their families.
A study by Warlick and Olsen (1991) provides examples of what this could look like from the nine states that currently conduct alternative assessments. In most cases, states have authority in setting portfolio content and methodology rubrics. For example, Delaware developed the Content Standards and Curricula Frameworks for English/Language Arts, Mathematics, Social Studies, and Science (1999). These frameworks define what every student in the state should know in each academic discipline. A second body, the Delaware Student Testing Program, is an accountability body that designs the methods of assessment, a combination of traditional and authentic methods that includes multiple-choice items, open-response items, and performance tasks, and disseminates information about these methods to parents, students, and the public. The same is true of Kentucky, where content rubrics are developed by the KY Department of Education based on national standards (such as ADP, NCTE, and NAEP) and where methodology recommendations come from Kentucky’s NTAPPA, or National Technical Advisory Panel for Assessment and Accountability (Cindy Parker, personal communication, November 11, 2009). This trend held across the remaining states, in which one or two task forces had been created to oversee content areas and methodology. According to NCLB, these content areas must include English/Language Arts and Mathematics, but the majority of the states go beyond these two, often adding Social Studies and Science to the core. In terms of methodology, most states chose to include some form of sample work in addition to performance tasks (oral testing, problem solving, etc.) to constitute the portfolio.
Therefore, we suggest that each state develop policymaking bodies to oversee implementation (one for setting content standards and a second for setting methodology standards) or, if a body is already in place for traditional testing methods, that this body begin to set standards for the new authentic assessment, possibly looking to existing policymaking structures as models for doing so.
States, however, cannot ensure successful implementation without school-level buy-in. When states set standards, districts, administrators, and teachers must create ways to meet those standards that fit their unique populations of students and resources. Therefore, decision-making bodies at the school level must be created. We suggest that these decisions take place in teacher forums, in which teachers and administrators engage in inquiry about how best to meet federal and state standards. This would necessitate a reevaluation of the way professional development is currently employed. Just as policymakers concern themselves with content and process, so too should teacher training. Research shows that what gets tested gets taught (Johnson & Arnold, 2007); therefore, teachers would need to address pedagogical changes that reflect what the federal government and states deem important to test and decide what their school-level conception of authentic assessment strategies would include. This would likely necessitate the use of sample work and performance tasks as the base of the portfolio, but could include supplementary methods such as video recording, student letters to the teacher about perceived process, or parental or peer input. Schools would need to decide how to augment state-mandated methodology in order to capture a more holistic picture of their students. Research shows that this type of teacher training has the added benefit of improving instruction, making pedagogy more student-centered, clarifying instructional program objectives, and equipping teachers to teach an increasingly diverse student body (NIREL, 1999).
Criticisms: Authentic Assessment
A major criticism of authentic assessment, however, is the ambiguity surrounding how exactly these methods should be evaluated and scored to ensure comparability and reliability across schools, districts, and states. Some suggest that the subjective nature of scoring jeopardizes the scalability of authentic assessment because bias will likely cloud every score. Generally speaking, teachers would first need to participate in professional development scoring training aligned with both state-mandated scoring rubrics, to ensure comparability within a state, and federally mandated standards, to ensure national comparison of state cohorts. This would likely bring external district and state officials into the schools to conduct such training. To ensure reliability, however, portfolios would need to be scored by a number of different people and at different levels. In Kentucky, scores are determined at the school level using a double-blind scoring method: each scorer applies criteria from an analytic rubric to each piece in the portfolio, the scores for the pieces are summed, and the two scorers' totals are averaged to produce an overall rating (Cindy Parker, personal communication, November 11, 2009). Schools report those scores to the KY Department of Education, and a percentage of selected schools is audited each year using the same process schools use to determine scores. Auditors are usually ScATT (Scoring Accuracy and Assurance Team) members or trained scorers. Similar models are used in other states, such as Maryland, where portfolios are evaluated at a number of different levels to ensure inter-rater reliability (Warlick & Olsen, 1999). They are evaluated first at the school level by teachers; then all portfolios are evaluated by small multi-district scoring teams comprised of teachers from across districts; finally, a second team at the state level evaluates a random selection.
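The double-blind averaging described above can be sketched in a few lines of code. This is purely an illustration of the arithmetic, not the Kentucky Department of Education's actual scoring software; the piece names, rubric dimensions, and point values are hypothetical.

```python
# Illustrative sketch of double-blind portfolio scoring (hypothetical data,
# not KDE's actual system). Each scorer independently rates every piece on
# each analytic-rubric dimension; a piece's score is the sum of its ratings,
# and the overall rating averages the two scorers' totals.

def portfolio_rating(scorer_a, scorer_b):
    """Each argument maps a portfolio piece to its rubric-dimension ratings."""
    total_a = sum(sum(ratings) for ratings in scorer_a.values())
    total_b = sum(sum(ratings) for ratings in scorer_b.values())
    return (total_a + total_b) / 2  # average the two independent totals

# Hypothetical ratings for a three-piece portfolio on a 4-point rubric:
scorer_a = {"essay": [3, 4, 3], "lab_report": [2, 3, 3], "oral_exam": [4, 3, 4]}
scorer_b = {"essay": [3, 3, 3], "lab_report": [3, 3, 2], "oral_exam": [4, 4, 4]}

print(portfolio_rating(scorer_a, scorer_b))
```

Because each scorer works independently and only the totals are combined, an auditor can rerun the same procedure on the same portfolio and compare results directly.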
We suggest that states look to these examples in creating vertical teams of scorers and in deciding which teams will handle which scores. In terms of how the scores will be aggregated and reported, a federal standard will likely be necessary. Vermont, for example, employs a qualitative, written assessment as well as a quantitative score, which is an average score across five subsections (NIREL, 1999). These would then need to be aggregated with standardized test scores by determining what percentage of the larger score the portfolio will comprise. In Kentucky, this percentage is 7.25 percent of a school’s total score (Cindy Parker, personal communication, November 11, 2009).
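The weighting arithmetic can be made concrete with a short sketch using Kentucky's 7.25 percent figure. Note the assumption, made here only for simplicity, that the remaining weight goes entirely to standardized test results; actual accountability formulas include other components.

```python
# Illustrative aggregation of a portfolio score into a school's total
# accountability score. The 7.25% weight is Kentucky's reported figure;
# assigning all remaining weight to standardized tests is a simplifying
# assumption for this sketch, not actual KDE policy.

PORTFOLIO_WEIGHT = 0.0725  # portfolio share of the total score

def total_score(portfolio_score, standardized_score):
    """Combine two component scores, both on the same 0-100 scale."""
    return (PORTFOLIO_WEIGHT * portfolio_score
            + (1 - PORTFOLIO_WEIGHT) * standardized_score)

# A strong portfolio (80) nudges up a weaker test result (70):
print(total_score(80.0, 70.0))
```

The small weight means portfolios inform, rather than dominate, the aggregate score, which matches the complementary role this recommendation envisions for them.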
A second, and likely the most common, criticism of authentic assessment is that it is costly in both money and time. The question is: do the costs associated with authentic assessment outweigh the benefit of having a much more accurate and comprehensive picture of student learning? Any such calculation must take into account the astounding amount of money the nation currently spends on standardized tests, approximately half a billion dollars annually (Perlstein, 2007). The idea that standardized testing is inexpensive and cost-effective is misleading, not only given this figure, but because states often do not purchase one general test from testing companies (the cheapest option, since it incurs a one-time fee); more often than not, states buy costly tailor-made tests, and the fees for these, if redirected, could greatly augment current assessment efforts. It is certainly true that authentic assessment incurs a number of financial costs not associated with standardized testing (e.g., increased professional development for teachers, training for both teachers and district-level scorers, and pay for scoring specialists). It is also much more time-intensive: computers scan tests and generate scores instantly, while multiple people must evaluate a portfolio both qualitatively and quantitatively, requiring great time and attention to detail. However, considering how little a computer can tell us about student learning, the money currently spent on inaccurate standardized tests could certainly be put to better use, namely, developing authentic assessment strategies that complement the use of standardized tests.
Moving Forward
Now is the time for the nation to begin to adopt these standards and assessment strategies. As the achievement gap persists and widens, the contribution of standardized tests that advantage white, middle-class students over students of color must be analyzed. The solution to this problem would necessitate using assessment strategies that are true to the student and his or her unique learning style, and authentic assessment is one such strategy. However, its political viability is another question.
Timing in politics and policymaking, as public policy analyst John Kingdon argues, manifests itself in what he refers to as a “policy window” (Gilligan & Burgess, 2005). This window is the time in which policy issues become topics for debate in government and are eventually moved into legislation. He theorizes that this process involves three streams. The first is the stream of problems, by which issues become identifiable problems to which a solution can be found in policymaking. The second is the stream of policies, or the availability of alternatives to deal with the problem. Lastly, the stream of politics is the nature of the political landscape and whether it is ripe for change. Each of the three streams operates independently, and Kingdon theorizes that a policy window exists when the three, or sometimes at least two, streams meet and offer a space for policy action and implementation. How many of the three streams are necessary for moving authentic assessment into national policy, and whether any of these streams have aligned, is debatable. The policies and politics streams are certainly flowing: authentic assessment as an alternative or complement to traditional assessment is in place in a number of states, and President Obama and Education Secretary Arne Duncan have proclaimed the public educational system in crisis and have called for a number of significant changes and reconceptions. However, whether standardized testing is considered a problem is not as clear. The reality is that standardized testing is a profitable market for a few influential and powerful companies. Their ability to lobby for the continued use of these tests is the biggest obstacle to implementing authentic assessment policy.
So what will it take to push authentic assessment policy if a policy window is not in place? Kingdon theorizes that a policy entrepreneur could be instrumental to a cause, that is, someone who “expend[s] personal resources - time, energy, money - in pursuit of particular policy objective” (Gilligan & Burgess, 2005). Finding a policy entrepreneur who will base his or her political campaign on authentic assessment may prove difficult; however, finding a state to represent such a cause may be a preferable route for implementing policy. What the nation needs is a state-by-state push for authentic assessment, so that other states may follow suit in implementing best practices. In an interview with Cindy Parker of the Office of Teaching and Learning at the Kentucky Department of Education, Parker cites Kentucky as the only state to make a long-term commitment to portfolio assessment as part of the state’s conception of accountability (personal communication, November 11, 2009). Beginning in 2011-2012, writing program reviews, including portfolios, will be a required part of state accountability for all schools and districts in Kentucky. This may prove a viable model for the implementation of long-term, sustainable change at the state and, in turn, hopefully, the national level.