3.3. Data and algorithms in political campaigning

3.3.1. The uses and effects of data and algorithms in political campaigns

One area in politics where we find pervasive uses of data and algorithms is political campaigns. Campaigners, candidates, and campaign organizations use data to see the electorate, populations of interest, and their volunteers. They use metrics to monitor their progress and measure their success. They sometimes even use algorithms to identify whom to talk to and how best to approach them. While data and quantification play an increasing role in campaigns internationally, their use and contribution are most pronounced in campaigns in the USA, which therefore merit specific attention if we want to understand the principles of their use and effects.

In the USA, electioneering looks back on a long tradition of quantification, beginning with public opinion polling in the 1930s and 1940s. But beyond regular surveys run by commercial companies in the service of campaigns and businesses, the US government also extended the ability of state bureaucracies to identify people, to learn about them, and to provide politicians and their campaign organizations with the information necessary to support their election bids.

This long tradition of collecting data and making it available to campaigns is special to the USA and a decisive factor in enabling an empirically scientific approach to campaigning. Through access to state voter files, campaign organizations as well as academics have a trove of valuable information about voters, allowing them to fine-tune campaigning methods and evaluate their effects on voter turnout. Two data points are crucial for this: the first is whether people turned out to vote in specific elections; the second is whether they registered as Republican, Democrat, or Independent in the voter file (a data point only made available in some states). These two data points form the foundation of empirical or scientific approaches to campaigning.

Only through the objective documentation of someone turning out to vote is it possible to experimentally measure the effects of specific types of voter outreach. The availability of this data point allows US campaigns and academics to precisely calculate the effectiveness of specific interventions on voter turnout, which is not possible in countries where this information is unavailable. This helps explain the prominence of this approach among US academics and campaigners.

Having data on people's political affiliation is also a great benefit for US campaigners. This information allows them to target messages precisely at people likely to be responsive to them (at least in states that provide this information). Again, this is specific to the USA, and even there, the precise targeting of campaign interventions is a promising option only in those states that make this data available to politicians or campaigns. In a study about the 2012 Obama campaign, the political scientist Eitan D. Hersh has shown that even US campaigns struggle to identify people's party affiliation precisely enough for targeted interventions to work if they do not have access to this information in the voter files provided to them by the state.

We can clearly see that the state, through regulation, contributes to making people visible to politicians, campaign organizations, and academics. This allows political campaigns in the USA to learn a lot about potential voters, how to reach them, and how to evaluate and fine-tune their campaigning methods. In other countries, where the state feels less comfortable providing this sort of visibility of the electorate to others, these approaches are accordingly much less developed and, if deployed, do not carry the same punch. Still, the image of data-driven campaigning as a transformative force remains strong. This expectation builds on the perceived uses of data and targeting algorithms by the presidential campaigns of Barack Obama in 2008 and 2012.

Both of Barack Obama's US presidential campaigns, in 2008 and 2012, are famous for their use of data for the coordination of volunteers and the algorithm-enabled identification of likely voters and donors. One reason for this clearly is the campaigns' very active communication with the press about their uses of data and algorithms for targeting. This has given rise to a very active subgenre in campaign press coverage and campaigning literature. While much of this literature likely overplays the actual impact data-driven practices had on Obama's subsequent successes, these narratives have led to much attention being paid to both campaigns, giving us a window into actual - or promised - uses of data and algorithms in campaigning.

Importantly, the Obama campaigns used data documenting volunteers and voters to make both visible and reachable. By having volunteers register their details on the campaign website, the e-mail list, or the mobile app, the campaign collected a lot of information about invested - or at least interested - people who were willing to interact with the campaign. This made these people visible to the campaign and reachable in follow-up communication, such as donation requests or the mobilization push. Even without running any algorithms, this type of data and reach is highly valuable to any campaign.

Going further, the campaign used dedicated software for the coordination of volunteers. Software solutions like NationBuilder hold information on volunteers and potential voters. Each local campaign unit can access a list with detailed contact information for voters the central campaign office has identified as promising and hand it out to volunteers for contact, for example through calls or doorstep visits. This increases the campaign organization's level of control with regard to volunteers. By making promising voters visible and reachable to local campaign volunteers, the campaign empowers local organizations. At the same time, the central campaign can monitor the performance of local organizations and potentially intervene if campaigning efforts fall behind set goals. Again, data about voters and volunteers and dedicated software increase the options available to campaign organizations and the level of visibility and control for central units.

The campaign also ran algorithms based on the output of statistical models to identify promising contacts. This included the identification of voters who were likely to vote for Obama once contacted, but also the identification of e-mail wordings for donation drives that were most likely to maximize donations.
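The campaign's actual models are not public, but the general technique behind such contact identification is straightforward: score each individual with a statistical model and contact those above a threshold. The following minimal sketch illustrates this with a logistic model; all features, coefficients, and voter records are invented for illustration and not taken from any real campaign.

```python
import math

# Hypothetical coefficients for a logistic model predicting the
# probability that a voter supports the candidate once contacted.
# In a real campaign these would be estimated from survey and
# voter-file data; here they are set by hand for illustration.
COEFS = {
    "intercept": -1.2,
    "registered_democrat": 1.6,  # party registration from the voter file
    "voted_2008": 0.8,           # turnout history from the voter file
    "age_under_35": 0.4,
}

def support_score(voter: dict) -> float:
    """Return the modeled probability of support for one voter."""
    z = COEFS["intercept"]
    for feature, weight in COEFS.items():
        if feature != "intercept" and voter.get(feature):
            z += weight
    return 1.0 / (1.0 + math.exp(-z))  # logistic link

# Rank a toy list of voters and keep the most promising contacts.
voters = [
    {"id": 1, "registered_democrat": True,  "voted_2008": True,  "age_under_35": False},
    {"id": 2, "registered_democrat": False, "voted_2008": False, "age_under_35": True},
    {"id": 3, "registered_democrat": True,  "voted_2008": False, "age_under_35": True},
]
ranked = sorted(voters, key=support_score, reverse=True)
contact_list = [v["id"] for v in ranked if support_score(v) > 0.5]
print(contact_list)  # → [1, 3]
```

The same scoring-and-thresholding pattern applies to the donation e-mails: candidate wordings are treated as treatments, their predicted returns compared, and the best-scoring variant sent at scale.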

But while we know that the campaigns used these techniques, we do not know their actual effects, especially compared with other mobilization or targeting techniques. So we know the campaigns believed in these methods enough to use and invest in them. But as outsiders, we have no way to quantify their effectiveness or their contribution to Obama's subsequent success. It pays to keep these limitations in mind when data or algorithms are featured as decisive factors of contemporary campaigning.

On that note, it is also important to acknowledge that the next successful US presidential campaign, by TV personality Donald Trump in 2016, did not follow Obama's playbook. Instead of internalizing tasks of data collection, preparation, and analysis - as done by the Obama campaigns - the Trump campaign relied on the ad teams of technology companies like Facebook, Google, or Twitter to provide it with promising scenarios for how best to spend its money with each vendor. The campaign was therefore happy to outsource tasks connected with data collection, preparation, and analysis and to buy the services of outsiders in identifying the best way to spend its money. This being said, while Trump's approach to data and algorithms might differ from Obama's, both campaign organizations relied on their potential. They just did so following different paths.

This goes to show that the uses, roles, and effects of data and algorithms in political campaigning are steadily evolving and a promising topic for research. The discussion of the following two studies shows different approaches to learning about them through empirical research.

3.3.2. How to keep volunteers engaged?

The Obama campaigns extensively used data on their volunteers to keep in touch, collect donations, keep them engaged in the campaign, and mobilize them into active support. This affordance of data and digital tools has remained somewhat neglected in the discussion about the role of data for campaigns, in favor of the more spectacular supposed powers of targeting appeals at specific audiences based on large data sets and advanced statistical methods. But connecting with volunteers and keeping them involved is crucial not only for election campaigns but for all sorts of civic and non-governmental campaigning. Examining these practices and effects more closely is therefore a promising subject for researchers in the social sciences well beyond those interested in electioneering. But - as always with campaign communication - it can be difficult to disentangle actual uses of practices and their effects from campaign promotion and journalistic enthusiasm for a good story. Still, doing so is crucial for scientists to get a better understanding of this practice. A recent study by Hahrie Han shows how this can be done [Han, 2016].

In her article "The Organizational Roots of Political Activism", Han asks if and how appeals from campaign organizations to volunteers can get them to "sign petitions, recruit others, and attend meetings" [Han, 2016], p. 296. She is especially interested in whether establishing a relational context for volunteers would allow campaign organizations to pursue these goals more successfully. Han examines this question by running three independent experiments.

In her first study, Han cooperated with an existing professional organization of doctors and medical students. She embedded an experiment within an e-mail campaign run by the organization attempting to mobilize its members to sign a petition. Han wanted to find out whether appeals personalized to the people addressed - referring to goals they had previously communicated to the campaign - led to more petition signing than appeals referring to general goals of the organization or appeals that tried to mobilize without referring to goals at all. Accordingly, she prepared three different e-mail treatments that were sent out to 1,250 people each. Corresponding with her expectations, Han found that 11% of people who got the personalized e-mail signed the petition, 8.9% of people who got the e-mail referring to general goals did so, while of those who got the e-mail without reference to goals, only 3.7% signed. The differences between all groups were significant. So, Han's first study shows that campaign organizations can mobilize people by creating a shared context through the emphasis of goals, and can do so most successfully if they are able to refer to the personal goals of the people addressed. But before friends of message targeting get too excited, let's remember that the information used to personalize an appeal in this case was given to the organization by the subjects themselves beforehand, not inferred from Facebook likes, Twitter favs, or data from credit- or scratch-card companies.
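Assuming the reported group size of 1,250 per treatment, the pairwise differences in signing rates can be re-checked with a standard two-proportion z-test. This is a generic two-sided normal approximation, not necessarily the test used in the article, which may, for instance, test one-sided given Han's directional hypotheses:

```python
import math

def two_prop_z(p1: float, p2: float, n1: int, n2: int) -> float:
    """z statistic for the difference between two sample proportions,
    using the pooled standard error (normal approximation)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

n = 1250  # addressees per treatment group, as reported
# Signing rates: personalized goals, general goals, no goals.
rates = {"personal": 0.110, "general": 0.089, "none": 0.037}

print(round(two_prop_z(rates["personal"], rates["general"], n, n), 2))  # → 1.75
print(round(two_prop_z(rates["general"], rates["none"], n, n), 2))      # → 5.35
print(round(two_prop_z(rates["personal"], rates["none"], n, n), 2))     # → 6.99
```

Both contrasts against the no-goals e-mail lie far beyond any conventional threshold. The personal-vs-general contrast (z ≈ 1.75) corresponds to p ≈ 0.08 two-sided, or p ≈ 0.04 one-sided, which is consistent with a directional test of Han's stated expectation.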

In her second study, Han again cooperated with the organization of doctors and medical students she had worked with before. This time, she was interested in whether creating a relational context for group members by referring to their shared membership status would have them more actively reach out and try to recruit new members. She again designed an e-mail experiment to test this assumption. She split a list of 118 names of newly recruited members of the organization into two groups. One group received an e-mail explicitly referring to the addressee's status as a new member - a trait shared with 117 other new members - before asking them to reach out to other doctors or medical students with the goal of recruiting them for the organization. The other half got an e-mail only referring to shared political and ideological goals before proceeding to the recruitment ask. The organization was able to track the subsequent recruitment outreach and found that among the first group - emphasis on a shared past or trait - significantly more addressees tried to recruit others (12.1%) than among those receiving the standard message (3.5%). Again, this finding supports Han's expectation that establishing a shared relational context with volunteers creates stronger involvement and compliance with asks by the campaign. Without going into further detail here, let it suffice to state that the third study reiterated this point.

While not directly about the role of data or algorithms in a campaign, Han [2016] provides an interesting study that shows the power of the active management of volunteers by campaign organizations - even through the rather impersonal medium of e-mail. The study also shows that campaigns can make their outreach more effective if they are able to create a greater sense of relational context between organization and volunteer by using the information available to the campaign about the volunteers addressed. Finally, the study provides a promising template for researchers interested in working with political groups and illustrates the power of experiments in identifying effects as well as the potential mechanisms driving them.

3.3.3. How can we learn about the use of data and algorithms in campaigns?

Finally, let us have a look at an example of how we can learn about whether and in which ways campaign organizations use data or algorithms in their work. For this, we turn to a recent study by Nick Anstead, in which he documents the state of play in the UK during the General Election of 2015 [Anstead, 2017].

Anstead starts with the observation that the use of data-driven practices in campaigning is comparatively well-documented only for the USA, while its importance in other countries remains largely unknown. Trying to address this gap in knowledge, the author presents his analysis focusing on the uses of data-driven practices by parties in the UK during the General Election of 2015. Anstead sets out to

"(...) explore the perceived importance of data in contemporary British campaigns, to understand the data-based campaign techniques being used by UK parties, and to assess how data-driven practices are interacting with the preexisting institutional context of British politics."

[Anstead, 2017], p. 294.

To answer these questions, Anstead draws on "thirty-one in-depth interviews with political practitioners involved in the use of data for six major UK parties and electoral regulators" [Anstead, 2017], p. 294. His study therefore provides an interesting template for finding out how much importance campaign professionals themselves - and those tasked with regulating them - attach to data-driven practices, at least while talking about their campaigns in public, using a qualitative instead of a quantitative approach.

The author started out with the recruitment of interview partners. For this, Anstead tried to achieve broad coverage of UK parties and of people with different campaign functions and experiences, in order to account for potentially varying perspectives on the uses of data-driven practices:

"The sample generated gave us access to several distinctive spaces in UK electoral politics (...). Interviewees spanned the six major UK political parties (...). Between them, these parties range across the political spectrum, have different levels of support and resources, and diverse strategic objectives. In addition, interviewees included individuals who had worked nationally at senior levels on campaigns, as well as those who had worked locally in constituency campaigns."

[Anstead, 2017], p. 298.

The author transcribed the interviews and written notes of a practitioners' meeting at the London School of Economics and Political Science (LSE) and analyzed them using an iterative thematic approach. This allowed him to identify the most important themes through repeated reading of the transcripts.

First, Anstead discusses how practitioners themselves assessed the effectiveness of their campaigns. Crucially, practitioners pointed to metrics they had gathered during the campaign, "such as contact rates, the number of activists working for a party, and levels of support expressed through online donations" [Anstead, 2017], p. 301. This finding points to the issues, discussed above, connected with creating meaningful metrics for phenomena of interest. After the campaign, the author found that successful parties connected their success to efficiently targeted messaging, while practitioners of less successful parties argued that their effective use of data-driven practices was unable to compensate for larger weaknesses of the campaign overall. This is an interesting finding, as it shows the difficulty that even campaign professionals face in objectively assessing the role and effectiveness of data-driven practices. It reiterates the importance of quantitative measurement, ideally through experiments like those by Han [2016] discussed above, in assessing the actual impact of data-driven practices instead of relying exclusively on the consciously or unconsciously self-serving accounts of campaign professionals.

Going further, Anstead finds that UK parties varied greatly in their approach and access to data. Small parties were only able to bring limited resources to the task and accordingly used data only rudimentarily, with off-the-shelf software like Microsoft Excel. More established parties were found using various data sources, like the electoral register, census data, commercial data sets, and dedicated surveys. Here, Anstead found the Conservative party to be the most active and the party most closely aligned in its practices with what is known from US campaigns, trying to target information mailings to people's known preferences.

In discussing his results, Anstead finds that UK parties examined data-driven practices from the US, but that the degree of their use interacted with the particularities of the UK electoral system, with five factors standing out to him as determining their use:

"(...) the data regulatory framework, the electoral system, the psephology of contemporary elections, the geo-institutional form that campaigns now take, and the underlying cultures of political parties."

[Anstead, 2017], p. 305.

These findings show the contingencies of the successful adaptation of data-driven practices from the US. Anstead also manages to foreground that while many campaigners speak of data-driven practices, a closer look reveals that they are actually talking about very different practices. So, for any meaningful analysis, researchers need to dig deep to reveal the actual practices instead of simply recording the shared use of the label "data-driven practices". In the final account, the paper should give pause to anyone expecting data-driven practices to come to dominate political campaigns globally, or even expecting the term to cover a comparable set of practices everywhere. Instead, the context of campaigns and the associated opportunities and challenges campaigners see themselves confronted with also matter with regard to the use of data or algorithms.