Chapter 2 Quantitative Ethnography

Quantitative ethnography (QE) is a nascent field that aims to establish a unified quantitative-qualitative research methodology. This endeavor entails formulating novel concepts, negotiating terminology, and developing new techniques that serve a wide variety of research initiatives across multiple disciplines. QE research involves, or is conducted on, Discourse data; this may include spoken or written language (interview transcripts, log files, social media data, etc.), annotations on observations (field notes, structured observation, etc.), and visual data (video recordings, photovoice, etc.). Other data may be collected as well, serving as “metadata”, that is, characteristics of the data providers, the data collection, or the data itself (e.g., timestamps, locations, sociodemographic data). Metadata can be used to group data providers and their discourse, enabling a wide variety of analyses. To represent various types of data in a single, unified dataset in a human- and machine-readable format, qualitative data needs to be quantified. This is typically achieved through discourse segmentation and coding. The unified dataset can then be used to perform various analyses and to model complex interactions (Zörgő et al., 2022).
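To make the idea of a unified dataset concrete, the following minimal sketch builds a small qualitative data table in which each row is one segment of discourse, carrying its metadata and a binary value for each code. Everything in it is hypothetical and chosen only for illustration: the two codes, the keyword patterns, the sample utterances, and the choice of the sentence as the unit of segmentation. Keyword matching merely stands in for coding here; it is not a description of how nCoder or any other QE tool operates.

```python
import re

# Hypothetical codebook: each code is detected by a simple keyword pattern.
CODEBOOK = {
    "TRUST": re.compile(r"\b(trust|rely)\w*", re.IGNORECASE),
    "RISK": re.compile(r"\b(risk|danger)\w*", re.IGNORECASE),
}


def segment(text):
    """Split discourse into sentence-level segments (one possible segmentation)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def build_table(sources):
    """Return one row per segment: metadata, the text, and a 0/1 value per code."""
    rows = []
    for source in sources:
        for index, sentence in enumerate(segment(source["text"])):
            row = {
                "provider": source["provider"],  # metadata: who produced the data
                "group": source["group"],        # metadata: used for grouping
                "segment": index,
                "text": sentence,
            }
            for code, pattern in CODEBOOK.items():
                row[code] = int(bool(pattern.search(sentence)))
            rows.append(row)
    return rows


# Hypothetical data providers and utterances.
sources = [
    {"provider": "P1", "group": "patient",
     "text": "I trust my doctor. The treatment felt risky."},
    {"provider": "P2", "group": "clinician",
     "text": "Patients rely on us. We discuss every danger and risk."},
]

table = build_table(sources)
for row in table:
    print(row)
```

Because every row keeps both the original text and its coded values, the same table can be read by a human and processed by a machine, and the metadata columns allow rows to be grouped by data provider or by any other characteristic.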

Several tools and techniques have been developed under the auspices of QE, each aiming to facilitate a part of the process described above or to enable various analyses using the generated dataset. For example, nCoder is a platform that can be used to develop, refine, validate, and implement automated coding schemes. Epistemic Network Analysis (ENA) has been the flagship tool within QE, enabling researchers to explore and model salient patterns in their data with network graphs. ENA depicts the structure of connections among codes in discourse by calculating the co-occurrence of each unique pair of codes within a designated segment of data, and it aggregates this information in a cumulative adjacency matrix. ENA represents this matrix as a vector in high-dimensional space, which is then normalized to quantify relative frequencies of co-occurrence independent of discourse length or the sheer volume of talk. In this process, network connection strengths are transformed to fall between zero and one. ENA then projects the networks as points into a low-dimensional space using a rotation, such as singular value decomposition (SVD), which maximizes the variance explained. This analytical tool hence provides two coordinated representations of the unified dataset: 1) network graphs, where the nodes correspond to the codes in the discourse and the edges represent the relative strength of connection among codes, and 2) ENA scores, or the positions of the plotted points, one for each unit's network, on each dimension of the low-dimensional space. Network models are thus constructed based on researcher decisions made when creating the qualitative data table (e.g., operationalization of segmentation and coding) and decisions regarding network parameterization (Zörgő et al., 2022).
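The sequence of steps described above (co-occurrence counting, accumulation, vectorization, normalization, and projection) can be sketched in a few lines of NumPy. This is a simplified illustration under stated assumptions, not the implementation used by the ENA software: the three units, the three codes, and the function names ena_vector and ena_scores are hypothetical; each row of binary values is treated as one designated segment; and the vectors are centered before the SVD is computed.

```python
import numpy as np


def ena_vector(segments):
    """Turn one unit's binary-coded segments into a normalized connection vector."""
    segs = np.asarray(segments, dtype=float)     # shape: (n_segments, n_codes)
    adjacency = segs.T @ segs                    # cumulative co-occurrence counts
    upper = np.triu_indices(segs.shape[1], k=1)  # each unique pair of codes once
    vector = adjacency[upper]                    # adjacency matrix as a vector
    length = np.linalg.norm(vector)
    # Dividing by the vector's length removes the effect of the amount of talk.
    return vector / length if length > 0 else vector


def ena_scores(vectors, n_dims=2):
    """Project normalized vectors onto the first n_dims SVD dimensions."""
    centered = vectors - vectors.mean(axis=0)    # assumption: center before SVD
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_dims].T              # one point per unit


# Hypothetical units; each inner list is one segment coded for three codes.
units = {
    "u1": [[1, 1, 0], [1, 0, 1]],
    "u2": [[1, 1, 0], [1, 1, 0], [1, 1, 0], [1, 1, 0]],
    "u3": [[0, 1, 1], [0, 1, 1]],
}

vectors = np.array([ena_vector(segs) for segs in units.values()])
scores = ena_scores(vectors)
print(vectors)  # connection strengths (edge weights), all between 0 and 1
print(scores)   # positions of the units in the low-dimensional space
```

In this sketch, unit u2 repeats the same segment four times, yet its normalized vector is identical to the one a single such segment would produce, which illustrates how normalization makes connection strengths independent of discourse length. The rows of vectors correspond to the edge weights of each unit's network graph, and the rows of scores correspond to the plotted points.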

Such unified methods and data modelling tools enable researchers to ask both quantitative and qualitative questions about their data, to represent both aspects of the data in a single dataset, and to develop quantitative models that inform qualitative understanding and vice versa.


Zörgő, S., Peters, G., Porter, C., Moraes, M., Donegan, S., & Eagan, B. (2022). Methodology in the Mirror: A Living, Systematic Review of Works in Quantitative Ethnography. In Advances in Quantitative Ethnography (Vol. 1522, pp. 144–159). Springer Nature.