re-mediating assessment: Some Things about Assessment that Badge Developers Might Find Helpful

Sunday, March 18, 2012

Erin Knight, Director of Learning at the Mozilla Foundation, was kind enough to introduce me to Greg Wilson, the founder of the non-profit Software Carpentry. Mozilla is supporting their efforts to teach basic computer skills to scientists to help them manage their data and be more productive. Greg and I discussed the challenges and opportunities in assessing the impact of their hybrid mix of face-to-face workshops and online courses. More about that later.

Greg is as passionate about education as he is about programming. We discussed Audrey Watters’ recent tweet regarding “things every techie should know about education.” But the subject of “education” seemed too vast for me right now. Watching the debate unfold around the DML badges competition suggested something more modest and tentative. I have been trying to figure out how the existing research literature on assessment, accountability, and validity is (and is not) relevant to the funded and unfunded badge development proposals. In particular, I want to explore whether distinctions that are widely held in the assessment community can help clarify some of the concerns that people have raised about badges (nicely captured in David Theo Goldberg’s “Threading the Needle…” DML post). Greg’s inspiration resulted in six pages, which I managed to trim (!) back to the following with a focus on badges. (An abbreviated version is posted at the HASTAC blog.)

A. There seem to be three types of primary goals for badge practices.

A review of the funded and unfunded proposals shows quite a range of goals for badges. Based on the information posted at the DML competition website, the primary goal of most of the badging proposals falls into one of three categories:

Use badges to show what somebody has done or might be able to do. This seems like the goal of badges in the Badgework for Vets and 4-H Robotics proposals.
Use badges to motivate more individuals to do or learn more. Badges in 3D Game Lab and BuzzMath proposals seem to accomplish this.
Use badges to transform or even create learning systems. This is what badges have accomplished at Stack Overflow, and it seems like the goal of badges in the MOUSE Wins! and Pathways for Lifelong Learning proposals.

These are not mutually exclusive categories. In many cases the second goal encompasses the first goal and the third goal encompasses the first and second goals.

B. These three types of goals appear to correspond with the three primary assessment functions.

Most of these goals require some form of assessment. Whether we like it or not, assessment is complex. Arguably these three goals correspond with three assessment functions (or what others have labeled as purposes):

Summative functions, which are often called assessment OF learning.
Formative functions for individuals, which are often called assessment FOR learning.
Transformative functions for systems, which a few are calling assessment AS learning.

C. Different assessment functions generally follow from different theories of knowing and learning, but these assumptions are often taken for granted. And assumptions about learning are often in tension with assessment practices.

Summative functions generally follow from conventional associationist views of learning as building organized hierarchies of very specific associations. These views are generally consistent with the “folk psychology” view of learning as “more stuff.”
Formative functions follow from modern constructivist theories of learning as constructing conceptual schema by making sense of the world. These include socio-constructivist theories that emphasize the role of sociotechnical contexts in the way that individuals construct meaning.
Transformative functions follow from newer sociocultural theories of learning as participating in social and technological practices.

The key point here is that these three assessment functions often conflict with each other in complex ways. In particular, summative functions often undermine formative and transformative functions. This is because ratcheting up the stakes associated with summative functions (i.e., the value of the badge) often requires assessments that are “indirect” and “objective” like an achievement test. As John Frederiksen and Allan Collins pointed out back in 1989, such assessments have limited formative and transformative potential, compared to more direct and subjective performance and portfolio assessments. Appreciation of this point requires a foray into the complex realm of validity:

D. Each assessment function raises distinct validity concerns.

Getting serious about assessment means thinking about evidence, and that quickly raises the issue of the trustworthiness of the evidence. Validity has to do with the trustworthiness of the evidence for whatever claims one wishes to make about assessment; validity is not the same as reliability, which is a property of the assessment itself. Each of the three assessment functions raises different concerns about the validity of the corresponding evidence:

Summative functions raise concerns about evidential validity: How convincing is the evidence that this person has done or will be able to do what this badge represents? Many assessment theorists like Jim Popham break this down further into content-related, criterion-related, and construct-related evidence. Measurement theorists like Sam Messick break it down even more, but these distinctions are probably too nuanced for now.
Formative functions raise concerns about consequential validity: How convincing is the evidence that these badges led individuals to do or learn more? Consequential validity is often broken down into intended consequences (always desirable) and unintended consequences (usually undesirable).
Transformative functions raise concerns about systemic validity: How trustworthy is the evidence that this educational system might not exist if we had not used badges? Frederiksen and Collins pointed out that the systemic validity of an assessment practice is linked to its directness and subjectivity.

This is where having multiple goals for a single set of badges, or having different goals for different badges in a single system, can get exceedingly complicated. My point here is that badge developers should consider the various goals for their badges, and the assumptions behind those goals. Failing to do so can create “wicked” tensions that are impossible to resolve. This can be toxic to educational systems because stakeholders ascribe those tensions to other things (politics, laziness, culture, faddism, etc.).

In response to my first draft of this post, Greg summarized my point more succinctly and more generally:

People have different philosophical models of education (whether they realize it or not) and that is why they talk across each other so often.

Greg also inspired me to suggest the following addition to Audrey Watters' top ten list of questions you can ask to find out whether somebody really knows education, for when you want to know whether they know about educational assessment:

Do you understand the difference between summative, formative, and transformative functions of assessment and how they interact?