2.1. What is computational social science?

2.1.1. The promise of computational social science

Reading accounts of computational social science, one cannot help but feel excitement about its transformational potential for the social sciences.

In one of the foundational articles sketching the outlines and potentials of the emerging subfield of computational social science, David Lazer and colleagues wrote in 2009:

"(...) a computational social science is emerging that leverages the capacity to collect and analyze data with an unprecedented breadth and depth and scale. (...) These vast, emerging data sets on how people interact surely offer qualitatively new perspectives on collective human behavior (...)."

[Lazer, Pentland, Adamic, Aral, Barabási, Brewer, Christakis, Contractor, Fowler, Gutmann, Jebara, King, Macy, Roy, and Van Alstyne, 2009], p. 722.

More recently, Anastasia Buyalskaya and colleagues reiterate this early optimism:

"Social science is entering a golden age, marked by the confluence of explosive growth in new data and analytic methods, interdisciplinary approaches, and a recognition that these ingredients are necessary to solve the more challenging problems facing our world."

[Buyalskaya, Gallo, and Camerer, 2021], p. 1.

Marc Keuschnigg and colleagues expect that:

"(...) CSS has the potential to accomplish for sociology what the introduction of econometrics did for economics in the past half century, i.e., to provide the relevant analytical tools and data needed to rigorously address the core questions of the discipline. (...) The new CSS-related data sources and analytical tools provide an excellent fit with a sociological tradition interested primarily in the explanation of networked social systems and their dynamics."

[Keuschnigg, Lovsjö, and Hedström, 2018], p. 8.

These are just three examples, but many accounts of computational social science voice similar optimism about the promise CSS holds for the study of societies and human behavior. These promises usually come in two forms. The first focuses on the increased coverage of social phenomena and human behavior through digital trace data and digital sensors. The second goes even further and expects a transformation of the nature of the social sciences.

On the most fundamental level, CSS can be understood as a response to the growing availability of data. The ever more intensive and diverse use of digital technology creates a constantly growing reservoir of data that documents individual behavior and social life in ever higher resolution. Lazer and colleagues already described this potential in 2009:

"Each of these transactions leaves digital traces that can be compiled into comprehensive pictures of both individual and group behavior, with the potential to transform our understanding of our lives, organizations, and societies."

[Lazer, Pentland, Adamic, Aral, Barabási, Brewer, Christakis, Contractor, Fowler, Gutmann, Jebara, King, Macy, Roy, and Van Alstyne, 2009], p. 721.

Digital technology provides new types of data, offers new and broader approaches for the measurement of the world and social life through sensors and devices, and through digitization makes available and computable vast amounts of previously collected data that up until now could only be analyzed within the limits of their analogue form. The enthusiasm therefore seems more than justified.

First, any interaction of users with online services creates data traces, hence the term digital trace data. In principle, digital trace data provide a comprehensive account of user behavior with, and mediated by, digital services. This makes these data highly promising for social scientists, since they could document in full those behavioral and social phenomena that happen on digital services or are mediated by them. Examples of such phenomena are interaction patterns in political talk online, public interactions with news on digital media, or digital political activism. Additionally, behavioral and social phenomena that are not primarily associated with digital media but are connected to them also become visible in digital trace data. Examples include the analysis of suspected trends in political polarization or extremism based on political talk online or interaction patterns between users on digital media, the mapping of cultural trends based on content on digital media, or discursive power in digital and traditional media.

But, careful! The operative words in the paragraph above are "in principle". In practice, most digital traces remain out of reach of most researchers. While digital media companies have access to vast troves of digital traces emerging from the uses of their digital services, researchers only have access to the highly limited slices of these data that companies choose to make available to them. This can happen either through dedicated application programming interfaces (APIs) or through exclusive agreements between companies and select researchers for privileged access. This considerably limits the realization of the promise of digital trace data in the social sciences and raises severe practical and ethical concerns in the use of these data. We will come back to this later.

Second, digital technology also extends the number and reach of sensors measuring the world. This could be data emerging as a byproduct of another service, like satellite imagery, or the output of sensors specifically designed by researchers. In principle, this type of data is bound to grow with the availability and wide distribution of Internet of Things devices. Yet, this expected wealth of data reinforces important questions about people's privacy in a world of all-seeing, all-sensing digital devices and about the legitimacy of data access for academics, researchers, and industry.

Finally, digital technology also provides new perspectives and opportunities for work with data available in analogue form. By digitizing existing data sets, researchers can apply new approaches and methods to them. This promises new perspectives on old questions by opening these formerly analogue data sets to computational analysis.

This massive increase in the number and diversity of available data sources extends the reach of social scientists. We can expect to cover more social phenomena and more of human behavior in greater detail and wider breadth. This can offer us a window onto new questions and phenomena and enable us to examine well-known phenomena from a different vantage point. It might also allow social scientists to get a better systems-level view of society and human behavior. This has led some to expect computational social science to contribute to a transformation of the social sciences in general.

For some scholars, the availability of vast data sets documenting human behavior has inspired the hope that the social sciences might transcend their status as a "soft" science and become an "actual" scientific discipline. In other words, a discipline with models allowing for the confident prediction of the future. In this view, more data would not only increase the coverage of social processes or human behavior but would actually allow for a "measurement revolution" [Watts, 2011] in the social sciences. Thus, social science might transcend its current state of after-the-fact explanation and evolve into a science with true predictive power. This hope rests on a view of society as being shaped by underlying context-independent laws that have mostly remained invisible to scientists because the data needed to uncover them could not be acquired until now. As with most ambitious dreams, the realization of this transformation of the social sciences seems far off.

We can find many studies that illustrate the first promise of computational social science. Increases in data documenting social phenomena and human behavior are significantly extending the toolbox available to social scientists. Here, CSS is proving to be a success and is becoming ever more important as access to data and knowledge about computational methods increase and diffuse among social scientists. The second expectation (which you can take either as a promise or as a threat, depending on your faith-based affiliations) of a transformation of social science into a more strictly predictive science remains unfulfilled as of yet. While the faithful might be tempted to treat this as an indicator that we simply need even more data, it might be more plausible that the nature of the social sciences resists this sort of transformation. The subject of social science is the examination of context-dependent phenomena. This makes prediction in the social sciences an instrument of theory-testing and not an instrument of planning and design, as for example in engineering or physics. While CSS might increase the reach and grasp of social scientists, it does not necessarily make us into socio-physicists, nor is it a tragedy if it does not.

But what is computational social science, beyond providing social scientists with new data?

2.1.2. Computational social science: A definition

While it is true that digitally induced data riches were a decisive factor in the establishment of computational social science, CSS is more than the computational analysis of digital data. Sure, early work in CSS might have spent more time and enthusiasm on the counting of digital metrics and the charting of new data sets than strictly necessary. Also, this somewhat limited activity, combined with the hardly contained exuberance of some early proponents of CSS, might have given rise to the caricature of CSS as a somewhat complicated effort at counting social media data. More generally, it is limiting to focus definitions of CSS on specific topical subfields. It is true that much early work in CSS focused on digital communication environments. But this is more an artefact of the early availability and accessibility of data sets documenting user behavior on social media, especially Facebook and Twitter, than a constitutive feature of CSS. Instead, CSS is the scientific examination of society with digital data sets and computational methods. This can extend to the examination of digitally enabled phenomena but does not have to stop there.

For one, far more numerous and more diverse data sets are now available than in the early days of computational social science, ten years ago. As a result, current research in CSS no longer works primarily with social media data, but instead uses a much wider range of data sets. Examples include large text corpora documenting news reporting or literature, historical and current parliamentary speeches, as well as image or video data. At the same time, historical data records are increasingly being made digitally accessible and provide rich opportunities in the social sciences. Also, there is growing awareness among practitioners of computational social science of the need to provide stronger connections between CSS studies and social science theory. This holds for connections to established theories as well as for the development of new theoretical accounts.

Exclusively data- and method-centric definitions of CSS are therefore too one-sided to characterize computational social science and are consequently outdated. In 2021, Yannis Theocharis and I suggested a definition of CSS that takes current developments into account while also foregrounding what differentiates CSS from other approaches in the social sciences:

"We define computational social science as an interdisciplinary scientific field in which contributions develop and test theories or provide systematic descriptions of human, organizational, and institutional behavior through the use of computational methods and practices. On the most basic level, this can mean the use of standardized computational methods on well-structured datasets (e.g., applying an off-the-shelf dictionary to calculate how often specific words are used in hundreds of political speeches), or at more advanced levels the development or extensive modification of specific software solutions dedicated to solving analytically intensive problems (e.g., from developing dedicated software solutions for the automated collection and preparation of large unstructured datasets to writing code for performing simulations)."

[Theocharis and Jungherr, 2021], p. 4.

In this definition, the specific properties of new data sets take a backseat. Instead, the definition foregrounds theory-driven work with computational methods in the social sciences. At the same time, it recognizes the importance of descriptively oriented work. This is important not least because CSS opens up new types of behavior and phenomena that only arise as a result of digitization or which were previously beyond the grasp of social scientists. Accordingly, there must be room in CSS for first systematically recording and describing new behaviors or phenomena without forcing them hastily into the limits of well-known but possibly unsuited theories.
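The most basic level named in the definition, applying an off-the-shelf dictionary to count how often specific words are used in political speeches, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the dictionary terms, the two example speeches, and the helper name count_dictionary_terms are all invented here and do not come from any published dictionary or corpus.

```python
import re
from collections import Counter

# Hypothetical mini-dictionary of terms of interest (invented for illustration).
DICTIONARY = {"economy", "jobs", "taxes"}

# Hypothetical speeches (invented for illustration); in a real project these
# would be hundreds of texts loaded from files or a database.
speeches = {
    "speech_a": "Jobs, jobs, jobs: the economy needs lower taxes.",
    "speech_b": "Our schools matter more than taxes.",
}


def count_dictionary_terms(text, dictionary):
    """Count how often each dictionary term occurs in a text."""
    # Lowercase the text and split it into alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Keep only the tokens that appear in the dictionary.
    return Counter(token for token in tokens if token in dictionary)


# Apply the same standardized procedure to every speech.
results = {
    name: count_dictionary_terms(text, DICTIONARY)
    for name, text in speeches.items()
}

for name, counts in results.items():
    print(name, dict(counts))
```

The sketch only chains existing functions from the Python standard library, which is what places it at the plug-and-play end of the spectrum. Real dictionary analyses would additionally have to deal with tokenization choices, word stems, and the validation of the dictionary for the texts at hand.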

The definition also foregrounds an important point of tension in precisely differentiating CSS from other fields in the social sciences. Nearly all contemporary work in the social sciences relies on computational methods and digital or digitized data. This includes the storage and processing of digital data (such as digital text, image, or audio files), computationally assisted data analysis (such as regression analyses), or data collection through digital sensors (such as eye tracking or Internet of Things-enabled devices). In this work, computation is often a necessary precondition. For example, while it is possible to run multiple regressions with pen and paper, the success of this method in the social sciences depends on the digital representation of the underlying data sets and on the computational resources available to process the data. In the most general reading of the provided definition, the use of any computational method in data handling and analysis would qualify as computational social science. One could thus argue that nearly any form of contemporary social science would constitute computational social science. Obviously, this is not helpful in identifying the constitutive elements of the field and its potentials and challenges.

In talking about CSS specifically, it might be helpful to focus more on studies and research projects in which computational methods and practices are not used as plug-and-play solutions but instead demand varying degrees of customization with regard to data collection, preparation, analysis, or presentation. Again, this is best thought of as a distinction in degree. On one end of the scale, we find projects that require some coding for data management or for the sequential calling of pre-existing or slightly modified functions. On the other end of the scale, we find research projects that demand the development of dedicated software solutions, for example in automated and continuous data collection, the preparation and structuring of large unstructured raw data, or the development of dedicated non-standardized analysis procedures. Projects at different ends of this scale share issues arising from their focus on social behavior, systems, or phenomena, but they vary significantly with regard to their computational demands. Projects that use standardized computational methods might thus be basically indistinguishable from other areas in empirical social science research. Projects at the other end of the scale, in contrast, are likely to face challenges indistinguishable from software development in computer science.

Any conceptualization of computational social science should thus not be tied to a specific set of methods, data sets, or research interests. Instead, the constitutive element of CSS, differentiating it from other approaches in the social sciences, is the degree to which research projects demand the inclusion and development of computational methods and practices over the course of a project. At the same time, CSS is a specific subfield of computational research in that it focuses on social systems and phenomena. Consequently, approaches and methods have to account for the specific conditions of this research area.

Computational social science occupies a bridging position between the social sciences, computer science, and related disciplines. This enables researchers to conduct interdisciplinary research into both new and already known social phenomena by combining social science theories and methods with concepts and methods from computer science. In this bridging function, CSS gives the social sciences access to advanced computational approaches and methods, while opening up subjects of study in the social sciences to computer science and related disciplines. In the dialogue between the disciplines, CSS contributes to the institutionalized transfer of knowledge and practices and helps to overcome historically grown barriers between fields. If successful, computational social science does more than just transfer knowledge or methods. It combines theoretical and methodological approaches from related disciplines into viable concepts and research designs and applies them in order to establish scientific knowledge on social phenomena.