Inside the assessment machine – The Life and Times of a test item
Bryan Maddox

Actor Network Theory (ANT) and Science and Technology Studies (STS) provide powerful insights into the increasingly global reach of literacy assessment projects and their role in the production of educational policy through numbers. Studies in this vein (see Grek 2007, Hamilton 2001, 2011) have largely adopted ‘outsider' ideological and methodological perspectives to critique large-scale, standardised assessment, its role in governance by numbers and its promotion of a neoliberal imaginary (Ball 2012). But as Hamilton (2001) notes, ethnographies conducted from the outside (based on public artefacts such as rarefied test items and statistical and technical reports) provide only partial insights into the practices of globalised literacy assessment projects. As Latour and Woolgar (1986) observed, final reports and published documents may misrepresent the process of knowledge production. Working from published sources, researchers find it hard to access and understand the ‘everyday' practices of assessment projects and instead have to ‘piece together' post-hoc accounts from limited sources (Hamilton 2011). They have limited access to the technical, practical and political challenges that are central to technically demanding projects (ibid).
Since the efficacy and identities of assessment projects depend on ideologies of scientific and technical rigour and neutrality (Gorur 2011), that is, on their machine-like qualities, assessment projects are inevitably reluctant to open their ‘black boxes' and share intimate technical, methodological and ideological debates and challenges. The risk, therefore, is that ‘outsider' accounts produce stereotypical portrayals of assessment projects, with limited access to their institutional practices and to the agency and character of their human and non-human actors (including test items, software and algorithms). To understand how such projects work, it is necessary to get inside the machine. A study of the ‘laboratory life' (Latour and Woolgar 1986) of assessment means investigating in situ not only the roles of various human actors, but also the place of documents and material ethnographic artefacts: the non-human actors that are integral to the practice of standardised assessment (see Riles 2006).

This, of course, implies a new approach to negotiating access and to inter-disciplinary collaboration. Why, then, would large-scale literacy assessment projects want to invite critically minded ethnographers to study their work from the inside? The answer to this question (that psychometricians might value and support inter-disciplinary collaboration) challenges us to re-evaluate our prior assumptions about, and theoretical framing of, assessment projects. Drawing on ANT and STS (Latour 1987, 1996, 2004, Latour and Woolgar 1979) and the ‘new ethnographies of aid' (Sardan 2004, Mosse 2004, Mosse and Lewis 2006), the paper sketches the terrain for new ethnographies of assessment. To illustrate this, the paper traces the life and times of a test item, following its development and travels. The ethnographic account is based on the UNESCO Literacy Assessment and Monitoring Programme (LAMP). The test item example provides intimate ethnographic insights into the production of statistical knowledge: assessment events, the character of Item Response Theory (IRT), and the challenges of cross-cultural assessment.

Literacy as numbers? Or literacy as alliances?

In a policy world that values numbers (‘policy as numbers' in Rizvi and Lingard 2010), scholars suggest there has been a shift in education policy towards governance by data (Hamilton 2013), which Lehmkuhl (2005) sees as governance by ranking and rating, Martens (2007) as governance by comparison, and Grek (2009) as governance by numbers. Within this picture, I enquire into governments' motivations for joining international literacy assessments (ILAs), which play a central and growing role in the ‘governance by data' trend (Hamilton 2013) by responding to the need to inform evidence-based policy with internationally comparable statistical indicators. Drawing on a case study of LAMP (UNESCO's Literacy Assessment and Monitoring Programme) in Laos and Mongolia, I argue that countries join ILAs for reasons that go beyond the stated educational policy agenda and are not only about the numbers they produce.

From an Actor-Network Theory (ANT) perspective, and informed by the New Literacy Studies in my understanding of literacy, governments engaging in LAMP translate into an adapted identity and temporarily stabilize into a heterogeneous network of aligned allies and interests, built upon black-boxed, temporarily accepted truths and unresolved problems. By furthering the overarching interests of the alliance, allies further their individual agendas: one ally, for example, seeks to measure literacy in a culturally and linguistically sensitive way for ‘better' data, while another statistically eliminates the ‘illiteracy problem' under the same alliance. Applying the theoretical and analytical resources of ANT to data gathered in interviews with key policy makers in Laos and Mongolia, I argue that countries join ILAs as a global alliance mechanism, a way of belonging and of ‘putting themselves on the map' (Grek 2009), even at the risk of ‘bad' numbers. ANT thus offers insight into motivations for participating in ILAs and into the management of problematic statistical outcomes.

Is it feasible to develop a culturally-sensitive large-scale standardised assessment of literacy skills?
César Guadalupe

UNESCO's Literacy Assessment and Monitoring Programme (LAMP) started as an attempt to "replicate" the International Adult Literacy Survey (IALS) experience in so-called developing countries. However, the LAMP approach was gradually transformed in the light of criticisms, operational problems, changes in the relationship between UNESCO and its contractors, the constitution of a LAMP professional team, and the acknowledgment of debates on IALS that had not been factored into the original design of the programme.

This paper discusses LAMP's efforts to develop a cross-national assessment of literacy skills that, while sustaining a comparability goal, pays due respect to the cultural, linguistic and institutional diversity across and within the countries engaged in the programme. It does so by addressing three issues: (i) the translation and adaptation principles used to develop the LAMP instruments; (ii) the debates on standardisation and contextualisation during the item development phase and their consequences for LAMP; and (iii) the rules and procedures for field operations.

Finally, the paper will attempt to situate LAMP in the overall context of current debates on improving the measurement of literacy: how these debates are extensively shaped by the ideology, practice and politics of assessment efforts conducted in the West (mostly originating in the US National Adult Literacy Survey), and how some competing discourses are framed around the educational assessment field. In so doing, the paper will offer some reflections on the politics of literacy measurement and the deep asymmetries of power affecting it.

How learning inequality is made to happen: Curricularizing the IALS and classifying adult literacy learners
Christine Pinsent-Johnson

For nearly two decades, federal policymakers in Canada have overseen a project that involves curricularizing the International Adult Literacy Survey (IALS), creating a virtual literacy that serves policy interests rather than teaching and learning. In addition, recent reformulations of the IALS testing initiative are being used to classify learners and to redirect literacy policy development so as to focus educational efforts on certain groups of adults over others. Based on these reformulations, work is currently underway to develop, and market widely to policymakers, a comprehensive IALS-derived literacy learning system that includes instruction, assessment and program accountability elements.
Leading the IALS curricular and policy projects are consultants who were directly involved in developing and implementing the IALS. They and others have honed their expertise over the years by developing various IALS-derived curricular products, such as assessments and a ‘basic skills' curriculum framework, the Essential Skills. Although there are jurisdictional circumstances particular to Canada that help to account for why the IALS has made its way into adult literacy education, my paper will focus on describing how the IALS texts coordinate curriculum and policy development.

Using institutional ethnography and subsequently completed analyses of competency-based curriculum development and the IALS, I trace the use of boss texts in the construction of the IALS and its companion mechanism, the Essential Skills. Texts from Galton's social classification project and Item Response Theory are put to use in the IALS. Texts from competency-based curriculum are used to construct the Essential Skills. The integration of the IALS into the Essential Skills curriculum framework facilitates its use in education. The boss texts regulate the development of both mechanisms and, in turn, the development of subsequent curricular products and literacy education policy. The policy and curriculum development work is being done to actualize a literacy policy discourse concerned with global competitiveness and the education of workers for a ‘knowledge society'. The impacts of these efforts are playing out in Ontario, where the provincial literacy program is implementing an IALS-derived curricular-managerial system.

The Invention of Counting
David Vincent

The paper will consider the process by which the world's first centralised, quantified, consistent time series of literacy attainments was brought into being. It will examine the project that was defined and launched by the early Registrars General and their staff as an unplanned by-product of the 1836 Registration of Births, Deaths and Marriages Act. The inclusion in the second annual report of a table of the percentages of marks and signatures in the marriage registers came to constitute not only the prototype for the modern quantification of reading and writing, but also one of the first performance indicators of public investment. The work of the founding statisticians will be placed in the context of the ‘avalanche of numbers' that characterised the 1820s and 1830s. The paper will focus on the methodological problems encountered by the early Registrars General (and their Scottish counterparts) and on why it proved impossible to overcome key deficiencies in the compilation and interpretation of the tables. Particular attention will be paid to the difficulties of analysing change over time and to the concept of literacy networks. The failure to resolve these issues deeply influenced both contemporary and historical understandings of the dynamics of nineteenth-century literacy.

Disentangling policy intentions, educational practice and the discourse of quantification: the collection and use of literacy attainment data in the 19th century era of "payment by results" and 21st century PISA
Gemma Moss

Critical discussion of the use of quantitative data in policy domains often focuses on issues in governance: how numerical data render the complexity of social interactions in one setting amenable to intervention and manipulation from afar, in ways that act against the interests of those subject to this discipline. The use of statistical data in the formation of the nation state and the role performance data now play in processes of globalisation (Hacking, 1990; Desrosières, 1998; Rizvi and Lingard, 2010) have become well-attested starting points for much subsequent discussion.

By contrast, this paper focuses more closely on the collection and use of literacy attainment data in two contrasting historical periods, and on the dilemmas and uncertainties that the data represent in educational policy and in educational practice. Here, education is treated as a separate domain with its own institutional logic, through which the use of numerical data threads its way. The two cases to be considered raise questions about how we generalise about quantitative practice and the uses to which numerical data can be put. By considering the representational status of the numbers as they are mobilised and displayed, as distinct from their subsequent interpretation in policy and in practice, the paper proposes a more nuanced account of the role numerical data play in the formation of educational discourse.

The numbers and narratives of adult literacy assessment regimes
JD Carpentieri

Drawing on three recent adult literacy research and evaluation projects conducted for the English and Australian governments, this paper explores the ways that numbers increasingly shape adult literacy policy and practice, particularly with regard to measuring programme impact.

In England, the current prevailing narrative is of a national skills shortage (Keep, 2011). This narrative is justified by statistics purporting to show alarmingly low levels of adult literacy (Leitch, 2006); however, these numbers are far from objective (Sticht, 2004). Because of the "moral panic" (Coben, 2001) induced by these figures, government investment in adult literacy programmes has expanded hugely in the last decade. The same is true in Australia. To justify this funding, programmes must demonstrate quantitative evidence of impact, but how should this impact be measured? Until recently, England's adult literacy system was driven by qualification targets. Policymakers are now considering a focus on a different set of numbers: "distance travelled". In the near future, provider funding may be based on the amount of literacy gained by adult learners. However, as this paper demonstrates, an overly zealous focus on quantitatively measured skills gains, dubbed the "tyranny of effect size" (Carpentieri et al, 2011), is based on a dangerously simplistic narrative of literacy development and a reductionist view of programme impacts. Furthermore, this approach runs the risk of hoisting policymakers by their own petard: those who justify funding through predictions of measurably large skills gains are likely to be disappointed (Sheehan-Holt and Smith, 2000; Reder, 2009).

This paper does not argue against measurement; rather, it offers an evidence-based argument for better, more suitable measurement, so that programme accountability is better aligned with programme effects.

Interpreting International Surveys of Adult Skills: Methodological and Policy Viewpoints
Jeff Evans

In this paper, I locate international surveys of adult skills in their global context, in relation to policy agendas for lifelong learning and to the developing ‘governance by numbers' discourse; the latter draws on Foucauldian perspectives that emphasize the effects of the availability of ideas and information on policy developments and on social life generally (e.g. Ozga, 2009). In seeking to understand these surveys and their results, I will also emphasize how the interpretation of such studies needs to be related carefully to methodological decisions made in the course of carrying them out (cf. Radical Statistics Education Group, 1982). I will illustrate these ideas by discussing the conceptualisation of adult numeracy and the methods used to assess it in PIAAC (Programme for the International Assessment of Adult Competencies), the most recent international survey. I will also consider possible effects of its implementation, especially on ways of understanding adult numeracy (Evans, 2000) and ‘adult skills' more broadly, and on the ways the results may be used to address questions of educational policy and development.

Beyond debunking: Towards a sociology of measurement
Radhika Gorur

Ask a policy maker in any OECD country today how their education system is doing, and they will likely respond in terms of PISA rankings, based on the three ‘literacies' it measures. The rest of ‘education' (the soft material of its ideological, sociological and cultural aspects) appears to have fallen away, while the hard material of literacy rankings, sometimes referred to as the ‘hard evidence', has persisted, investing nations with new urgencies and priorities, singling out particular individuals or groups for policy attention, installing new regimes of administration, and creating new technologies of governance.

The growing takeover of the education policy space by numbers has not gone unnoticed by critics. Sociological critique has argued convincingly that (a) quantification cannot capture the complexity of education and is inherently reductive; (b) numbers are products of particular theoretical and methodological choices, and therefore not innocent, apolitical or objective; (c) numbers are a technology of governmentality and should be resisted; and (d) numbers are being misused in policy and should be viewed with suspicion.

Whilst all of these criticisms are legitimate, in this paper I argue that it is time to move beyond debunking numbers. Taking quantification to be performative rather than representational, I suggest a move towards a sociology of measurement in education which traces the socio-material histories and lives of numbers, focusing both on the processes by which they translate the world and on the ways in which they make their way through the world. Examining how the processes of delimiting, defining and standardizing systematically erase uncertainty and variation to produce apparently credible statistical measurements in education, and drawing on concepts from scholars such as Helen Verran, Isabelle Stengers, Bruno Latour and Sheila Jasanoff, I explore how we might interfere productively in literacy measurements and their participation in policy.

Enumeration in the adult literacy regime
Richard Darville

Over recent decades, literacy workers and scholars in Canada have experienced the remaking of literacy work through a dispersed but coordinated "regime" of discourses and textual technologies, central among them practices of enumerating literacy. This paper sketches several themes from investigations of that remaking and highlights their conceptual grounding in literacy practice theories (PT) and institutional ethnography (IE).

IALS, the transnationally coordinated centrepiece of enumeration, dovetails in its testing procedures and reports with various governance processes. It fills in a broader discourse of "human resources" by quantifying literacy thus construed, using a construct and criteria for literacy that hook into managerial discourses of "flexible work." IALS trans-local reports on literacy levels sometimes organize policy objectives as jurisdictional competitions between provinces and municipalities. The conceptual framing of IALS launches an ever-churning development of quantifiable, individuated, economically construed program accountability measures and even curricula.

Literacy workers experience reporting requirements aligned with enumerative discourses as ruptured from how they otherwise know the diverse, shape-shifting actualities of literacy learning. Experiences of rupture generate reform proposals for program-level accountability that better represents actual learner gains, especially in confidence and social connectedness. However, within an obdurate regime, critiques of enumeration do not stick, and accountability reform proposals are not taken up. Reporting arrangements are held in place by their attachments to human resource quantifications and jurisdictional rate competitions, and by their parallel alignment with "management by outcomes."

A blend of PT and IE has enabled investigation of such processes. The recent turn within PT towards "global(izing)" literacies, and towards literacy artifacts as mobile/stable constituents of social relations, is particularly useful. IE, which orients to texts as always in action and coordinating activity, develops this textual turn. It enables unpacking of how the situated use of program accountability texts recontextualizes and localizes them while at the same time suturing local settings into the regime. It also enables explication of intertextual hierarchies in which texts and activities at one level fill in the "shells" of texts at higher levels: front-line assessment gives specific sense to accountability texts; the terms of accountability are designed to show policy mandates realized; policy objectives address the IALS rendering of literacy; and IALS fits literacy into human resources discourse. This intertextuality suggests, dismally, that if reform is to have any chance, the experience and critique of rupture are not sufficient; the extended relations of the regime must be engaged.

OECD as a site of co-production: international comparative testing and the new politics of ‘policy mobilisation'
Sotiria Grek

Located in the field of the transnational governance of education, this paper examines the case of the OECD as a key expert organisation in the governing of European education. It builds on previous work (Grek 2009), which showed how the OECD became a major Europeanising actor, not only entering the European education policy arena but in fact monopolising attention and policy influence within it. This paper goes one step further: working with the specific case of international adult literacy testing, it examines how the OECD has become a dominant education policy actor as a result of its deliberate and systematic mobilisation by the European Commission, which found in it not only a rich resource of data with which to govern (something it did not previously have) but also a player that would push the Commission's own policy agenda forward, albeit leaving the old subsidiarity rule intact.

In order to exemplify and contextualise this argument, the paper will discuss the main literacy studies which have metamorphosed the OECD into a spectacle of surveillance and control for national education systems and which have had tremendous effects not only on education policymaking in participant countries but on European education policymaking overall. I will then explain and discuss the role of experts in this emergent European policy field and will conclude with an examination of ‘policy mobilisation'. Applying theory from the social studies of science and technology, I will use the concepts of boundary work and ‘boundary organisation' (St Clair 2006; Jasanoff 2004; Guston 2000) to show how the OECD has been transformed into a ‘site of co-production' of both knowledge and social order (St Clair 2006).

Calculating literacy, disciplining people
Tannis Atkinson

This paper uses governmentality analytics to examine the statistical indicators of adult literacy developed by the Organization for Economic Cooperation and Development (OECD) and first employed in the 1994 International Adult Literacy Survey. I argue that using numerical operations to dissect and quantify interactions with texts—and to describe capacities of entire populations—constitutes a new way of knowing, understanding, constituting, and acting upon ‘adult literacy'.  Viewed as calculative practices (Davies & Bansel, 2007; Higgins & Larner, 2010) these statistical indicators produce a normative literacy (Payne, 2006) that constructs ideal subjects and justifies social exclusions (Hernandez-Zamora, 2010). Drawing on empirical data from one jurisdiction in an OECD member nation, I outline how adult literacy policies based on these calculations coerce and punish those who are poor or unemployed. I argue that, by disciplining those who are not ‘productive'—even those who are not working because of injury, discrimination or broad economic conditions—these policies construct lack of ‘literacy' as a threat to the population as a whole. Further, the authoritarian mechanisms employed in these policies oblige (Scott, 2005, p. 25) everyone to comply with the narrow OECD definition of what constitutes an acceptably active and productive citizen.