Introduction
While an already well-established methodological framework, Grounded Theory (GT) faces the persistent challenge that its results are often perceived to have limited trustworthiness (Aldiabat and Le Navenec, 2018; Connelly, 2016; Glaser, 2019; Kolb, 2012; Sikolia et al., 2013; Timonen et al., 2018; Williams and Morrow, 2009). However, a methodology's transparency and implied trustworthiness are of paramount importance (Aguinis and Solarino, 2019; Jacobs et al., 2019; Maher et al., 2018; Miguel et al., 2014; Moravcsik, 2014, 2020; Tuval-Mashiach, 2017). In light of this established weakness of GT, here I propose Relational Coding as a tool for enhancing the methodology's transparency and trustworthiness by making GT coding more accessible to researcher and reader alike. While not a new approach or an amendment, it is a potentially malleable toolkit, rooted in Glaser's approach to GT coding, which can better display the legitimacy of current dominant approaches by providing 'an audit trail... as a means of holding up to scrutiny the methodological and theoretical decisions made throughout the research process' (Bowen, 2009: 305).
In aiming for trustworthiness, my understanding draws on Connelly's (2016) explanation, suggesting that a trustworthy study is one which generates trust in its data, interpretations thereof, and the interpretation process itself. Per Connelly (2016), I believe a useful theory-building methodology: produces research that generates comfortable confidence in its findings (demonstrating how they were arrived at); includes an easy-to-follow audit trail which renders the published results dependable; forces the researcher to document subjective decisions and gives the reader insight into these; and richly documents contexts so that the extent of its generalisability claims is clear. Considering this, Relational Coding provides transparent foundations essential for theory-producing work like GT, where the aim is not necessarily the accuracy of outcomes and their verifiability or replicability based on said accuracy. Rather, GT aspires to demonstrate convincing and explanatory concept-building which becomes increasingly abstract and ends with claims of said abstractions' generalisability (Bryant, 2017; Glaser, 2002b; Maher et al., 2018).
Classical grounded theory is a theory-generating/locating methodology which relies on the disciplined use of constant comparative analysis to uncover concepts from data and locate generalisable theory within the relationships between these data-grounded concepts (Glaser and Holton, 2007). Half a century has passed since Barney Glaser and Anselm Strauss first presented the methodology in The Discovery of Grounded Theory (Glaser and Strauss, 1967).
In the course of my own attempt at learning how to do GT coding, I found multiple inputs rooted in the methodology's various strands from established and emerging voices in the field alike, including: Glaser and Holton (2005, 2007), Holton (2007, 2010), Corbin and Strauss (1990, 2008), Strauss and Corbin (1994), Charmaz (2008, 2014, 2017), Clarke (2003, 2007, 2019), Allan (2003), and Bowen (2009), among others. These various strands are characterised by varying degrees of clarity when it comes to how to code and how to demonstrate that one has done so reliably, leading many to believe that GT researchers suffer from 'methodological incompetence or fragility' (Bryant, 2002: 32; Glaser and Holton, 2005; Strauss and Corbin, 1994).
The question remains: what can Relational Coding add to GT as it is currently practised? As a tool in early development it holds the potential for improvement and modification to widen the range of its applicability. Nonetheless, it makes two small but distinct contributions. First, it lays bare analytic functions which do not automatically occur regardless of whether one is coding manually or digitally. Second, a review of recent impactful GT publications indicates that Relational Coding is largely absent from current reporting and writeup practice, and even where scholars hint that they have taken codes' relationships into consideration, there is no demonstration of how this happened. For the purposes of this review I examined articles on the Web of Science database containing 'grounded theory' in the title, restricted my search to articles published in the years 2018–2022 (1236 articles in total), and reviewed the top 10. I then searched through articles published in the last 5 years specifically demonstrating or discussing the outcomes of coding, with 'grounded theory' and 'cod*' in the topic (2119 articles in total), and analysed the top 10. I read each article with an interest in its approach to coding specifically, rather than its contribution to grounded theory in general. I examined whether researchers are laying out blueprints for coding in a more transparent and trustworthy manner, and found that while there are some significant contributions in this regard, the relationship between codes is not sufficiently addressed, per the results captured in Tables A and B in the Appendix.
Positioned within the knowledge gaps outlined here, this paper begins with a brief explanation of the importance of coding as my chosen focus within GT literature. I move on to explore GT's original point of departure, the ensuing rift between its founders and the impact that had on coding approaches, and the more recent coding frameworks which have followed. Far from an exhaustive review, this section touches on a few of GT's major strands which have influenced my own understanding, insofar as their elements are relevant to coding in particular. I then go on to briefly explain my own understanding of coding, rooted in Glaser's work, which was moulded under pressure to compromise in some areas where my examiners and faculty required it during the course of my PhD and Master's research. I conclude by drawing on examples from my own research explaining how to write up Relational Coding, against the background of the expectations levelled at researchers in academia. I demonstrate diagrammatically how Relational Coding works as a tool for carrying out and writing up the coding process, with an emphasis on uncovering the underlying relationships which contribute to theory. It additionally complements existing coding software by demonstrating that coding was supplemented by analysis, and that this analysis is traceable through records, something software alone cannot yet account for (Deterding and Waters, 2021; Maher et al., 2018). It is a Glaserian-inspired approach to coding which demands a demonstration of the researcher's accountability, rendering the process and writeup of coding clearer to the GT reader and author alike, thereby contributing to the enhancement of GT in general.
Key contributions to GT coding
Importantly, accomplished ‘how-to-do-GT’ guides already exist, and while touching on some elements of the methodology related to coding I will avoid this task which many scholars have already ably accomplished (Birks and Mills, 2010; Birks et al., 2019; Chun Tie et al., 2019; Flick, 2018; Glaser, 1998, 2007; Sbaraini et al., 2011). Here, I explore the work of a handful of those scholars whose contributions have influenced my own understanding. I review these briefly to demonstrate two points. First, I seek to highlight why coding for relationships is relevant. Second, and building from this, I set out to demonstrate how relational coding functions as a ‘how to show you did GT’ guide in a manner that is complementary to the various established approaches which I sample here.
Glaser, Strauss, and the discovery of GT
In contrast to the deduction-heavy approaches typical of academia at the time, grounded theory was first crafted by Glaser and Strauss as a critique of dominant modes of scientific research. Thus, Glaser and Strauss became active proponents of immersive inductivism as a tool for extracting theory from data. Seeking to depart from research as verification, they pivoted towards theorising as their focus instead and paved the way for questioning the assumed exclusive validity of quantitative research (Chun Tie et al., 2019; Glaser and Strauss, 1967; Timonen et al., 2018).
Grounded theory's core process is abstractive conceptualisation, aiming to produce theories based on the relationships between the most important emerging concepts through repeated rounds of data collection and analysis (Glaser, 2016, 2019). However, despite the numerous detailed components which GT involves (summarised in Appendix B), and its creators' desire to contribute towards frameworks which ensured rigour in qualitative research, 'the methodological advice pertaining to this foundational variant of GT is relatively lightly documented' (Timonen et al., 2018: 2). It is perhaps as a result of this lack of clarity that Glaser and Strauss later learned they differed pointedly on numerous aspects of the underpinnings of GT (Howard-Payne, 2016). As such, GT's processes and their implications for codes are discussed separately under the following sections, which explore Glaser and Strauss respectively.
The implication for coding as GT's core process specifically is that while its importance is well-established (Glaser and Holton, 2007; Holton, 2007), its 'how to', in terms of carrying it out and writing it up clearly, appears to be less understood (Allen, 2014; Bowen, 2009; Glaser and Holton, 2005). Consequently, GT studies are often recorded in a manner which obscures the research process, suggesting a lack of transparency to readers, reviewers, and potentially even the researchers themselves (Dunne and Üstündağ, 2020; Glaser, 2012; Yu and Smith, 2021).
Strauss and Corbin
It is Strauss, having successfully partnered with Corbin following his split with Glaser, who provides us with early attempts to make coding more accessible in his seminal Qualitative Analysis for Social Scientists (Strauss, 1987). For Strauss and Corbin (1990), GT coding consists of three key elements: open, axial, and selective. These are not necessarily separate phases, but rather various qualities of coding which may be simultaneous and intertwined, or separate. Importantly, for Strauss, codes are generated through immersion in the literature very early in research (Strauss and Corbin, 1998, 2008).
First, the analysis carried out during open coding is purely generative, including questions such as: What is this concept? How does it manifest? What sets it apart from other similar concepts? Axial coding follows from open coding, drawing categories and subcategories into relationships with one another, and testing this against the data itself for verification. The researcher who codes axially must compare the conditions, context, strategies, and consequences relevant to each category and subcategory, to draw them into relation with one another. Without this close reflexive relationship between the data and analysis, the product of GT cannot be anything other than the suggestion of theory with some major gaps. Instead, if axial coding unfolds rigorously, all researcher bias, deductive thinking, and data-based hypotheses are constantly verified; even negative cases are drawn into the concepts to make them richer, and the outcome is powerful explanation (Corbin and Strauss, 1990). Finally, selective coding is the process whereby all the codes are brought together to form a core category.
Corbin and Strauss (2008) provide detailed guides on how to code using software, but fail to capture the iterative induction component inherent to this process. Thus, while mapping out a useful skeleton for GT studies, their work lacks an explanation of how to demonstrate the complex network of connective tissue and muscle which gives the skeleton its form. In an effort to create transparent approaches to writing up codes and theories which are creative, rather than created, I present Relational Coding as one possible tool. Before moving on to demonstrate its qualities, it is useful to consider alternate approaches to coding which view codes as emergent, in addition to smaller contributions to coding from other scholars.
Glaser and Holton
Glaser’s approach to GT, which according to him is most true to form and thereby labelled classic (Glaser, 2014), is partially an elaboration and specification of the work he began to publish with Strauss, and partially built from passionate responses to other strands of GT which he problematises (Glaser, 2002a, 2019). A number of Glaser’s influential contributions have been made alongside Holton (a prolific contributor to GT in her own right) and so I discuss their work together here.
Coding is so important to GT that Glaser and Holton (2007) advocate for it to be a daily pursuit in early research. For Glaser (2019), the key motivation behind coding is the crucial step of moving beyond accurate description, towards theory with explanatory power. Unlike Strauss, he advocates against early immersion in the literature as a means of facilitating coding. At the same time, he does not demonstrate how one would prove they have followed the coding processes he advocates for.
For Glaser, coding has two key qualities/parts (open and selective) and two key types (substantive and theoretical). Open coding identifies core categories in the data which explain exactly what is happening at a granular level, that is, its substantive empirical qualities. When the researcher is confident that they have located the study's core variable, selective coding can begin, as that is the point at which the researcher will be able to discern which codes relate to one another in a manner which produces theory. Through selective coding, theoretical codes begin to emerge. Their effectiveness and usefulness rest on their conceptual and abstract nature (Glaser and Holton, 2005). Theoretical coding foregrounds the question of relationships, particularly those between the substantive codes discovered earlier. These relationships, fully conceptualised during theoretical coding, are best understood as early hypotheses which come together to become the building blocks of theory (Glaser, 1978). Notably, Glaser and Holton emphasise that theoretical coding is perhaps the most weakly understood, and is most often captured obscurely, in a manner which does not always demonstrate what the codes are.
The entire process rests on constant comparative analysis, through which emerging codes and concepts are repeatedly checked against the data.
In terms of write-up, Glaser and Holton (2007) highlight the need for coding to be captured in a manner which demonstrates the connections between all the component concepts of theory. The emergence and introduction of an idea should be traceable throughout the various levels of theory (Glaser and Holton, 2007). However, a clear step-by-step example of how to achieve this is lacking. In this context the clear demonstration not only of these relationships but also of their grounding in the codes and the data is what Relational Coding focuses on, hoping to contribute to this gap.
Charmaz, Clarke, Allan, and Bowen
Without wanting to reduce the grounded theory academic discourse to the works of Glaser, Strauss, Holton, and Corbin, it is useful to briefly touch on a few additional scholarly contributions to GT coding without deeply exploring the work of each author. Kathy Charmaz, who has also worked with Linda Belgrave, Antony Bryant, and Robert Thornberg (Bryant and Charmaz, 2007, 2019; Charmaz and Belgrave, 2019; Charmaz and Thornberg, 2021), emerged as a leading proponent of constructivist GT. She proposes coding in line with the logic of constructivism, which frames the researcher as an active shaper of the data rather than a removed observer, a stance which has controversially drawn the ire of Glaser (2002a) while being hailed by others as most useful (Birks et al., 2019) (Charmaz, 1996, 2011, 2017).
For Charmaz and Thornberg (2021) early coding is very intense as it involves a line-by-line approach of slow, focused, and deliberate engagement with the data at a granular level, an essential step to confident coding later on. This line-by-line approach can stop when the researcher discerns how codes come together and which ones 'feel' the most important. Sorting the codes is key to deciding on their importance, because codes (per Charmaz and Thornberg, 2021) are not all alike. Those codes with the ability to explain more of the data than other codes can be prioritised as 'important' or 'focused' codes, whose explanatory power must be tested against larger and larger portions of the data. This process is called focused coding, and when followed properly it can speed up analysis by generating possible categories for further exploration early on. This directs future coding to focus on the questions and themes generated during focused coding. Based on Charmaz's approach to coding, Ylona Chun Tie, Melanie Birks, and Karen Francis (2019) have created useful graphical representations of the various phases of coding. However, these adaptations only serve to demonstrate to the researcher the relationship between various steps of coding, rather than showing the researcher how to demonstrate to the reader the relationship between the various codes themselves.
As opposed to the explanation-centric focus of coding per Glaser, Strauss, and those who have worked with them since the publication of The Discovery of Grounded Theory (Glaser and Strauss, 1967), Charmaz's constructivist coding foregrounds the researcher's active, interpretive role in constructing meaning from the data.
Another influential iteration of GT is found in Clarke’s situational analysis as grounded theory, an approach inspired by postmodernism (Clarke, 2003). Not to be viewed as a standalone take on GT, Clarke’s situational analysis is a supplementary tool to Strauss’s existing work (Clarke, 2003). Starting with open coding, Clarke advocates for an approach which begins word by word and then evolves to looking at the data segment by segment. For Clarke, these open codes are temporary labels whose individual properties are slowly developed and explained as the researcher delimits their scope and begins to decide which codes are important and which are not.
GT researchers, per Clarke, look to find simple yet pivotal social processes underlying the area of investigation, capturing these as gerunds which indicate action. In turn, each process is then understood and described in terms of its ‘particular and distinctive conditions, strategies, actions, and practices engaged in by human and nonhuman actors involved with/in the process and their consequences’ (Clarke, 2003: 558). Codes which are deemed important are related to one another by the researcher, and then combined to make categories which provide a theoretical perspective on the area under investigation. Eventually, this builds towards substantive theory (Clarke, 2003).
Working to enhance what she describes as GT's existent postmodern potential, Clarke seeks to go beyond the end goal of following the method to produce grounded theory, rather seeking to engage in grounded theorising throughout the process. In order to analyse the data in a fresh way and open it up, Clarke proposes situational analysis via mapping as a key tool. However, per her own admission, this is a tool for processing traditionally coded or at least partially coded data. In the same breath she notes that GT needs a postmodern orientation so that it can acknowledge the complexity, differences, and necessary relationality of postmodern reality. While Clarke's mapping tools are certainly valuable additions to the GT discourse, they are not in themselves an approach to coding. However, her work sparks important questions about the methodology while leaving a gap where coding which appreciates the inherent relationality of the subject of analysis could make a significant contribution. While making great strides in enabling the researcher to understand the delimitations of coding, Clarke's method does not indicate how to demonstrate to the reader that this delimitation is rooted in cues from the data itself. Here, I present the potentialities of Relational Coding.
Finally, a number of contributions have been made to GT through its application to information systems and computing. One useful attempt to clarify coding is rooted in the work of Allan (2003), who utilises Key Point Coding. For Allan (2003), coding became too cumbersome at the micro level, leading him to focus on coding on a key-point-by-key-point basis. This approach begins with identifying important incidents to code and assigning them an identifier which allows the researcher as well as the reader to track the origin of the code. Each identifier consists of two parts: first, a sequential identifier indicating when it emerged (first point = P1, second point = P2, etc.). Second, longitudinal identifiers are added to indicate the case study/site in which the point emerged (so key point 5 emerging in case study Z would be written as PZ5). Associated codes/explanations would then be written alongside each identifier, so that each code can be traced to its point of origin. Building on this first phase of identifiers and codes, codes are then compared to one another (a process which can be tabulated quite neatly using all of the identifiers) so as to locate commonalities and produce concepts, before building towards theory. Allan's (2003) use of diagrams and tables provides useful insight into coding and Key Point Coding in particular. While it demonstrates a great awareness of constant comparative analysis and the process of drawing codes and concepts into relation with one another, it appears to stop short of indicating how to render these relationships transparent. This approach shows how codes are grounded in data, but not how relationships are grounded. It is those relationships which are a basis for theory, confirming the useful potential contribution of Relational Coding.
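For readers who think in computational terms, Allan's identifier scheme can be sketched in a few lines. The following Python fragment is a minimal, hypothetical illustration only: Allan describes a manual, tabular practice, and the function name, data layout, and example labels here are my own assumptions.

```python
# Hypothetical sketch of Allan's (2003) Key Point Coding identifiers.
key_points = []  # each entry: (identifier, code, note on origin)

def add_key_point(case_id: str, code: str, note: str) -> str:
    """Assign the next sequential identifier within a case study, so that,
    e.g., the fifth key point emerging in case study Z becomes 'PZ5'."""
    prefix = f"P{case_id}"
    seq = sum(1 for ident, _, _ in key_points if ident.startswith(prefix)) + 1
    identifier = f"{prefix}{seq}"
    key_points.append((identifier, code, note))
    return identifier

print(add_key_point("Z", "hypothetical code", "interview 1, paragraph 4"))  # PZ1
print(add_key_point("Z", "another code", "interview 2, paragraph 9"))       # PZ2
```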
Also seeking to demystify coding, Bowen's (2009) proposed audit trail technique explicitly lays out a tool to systematically trace research as it processes data, from early collection to final analyses. Bowen insightfully highlights a key hurdle for GT studies: replicability. His most powerful contribution is in demonstrating a tool which allows for replicability of process, without expecting replication of results (Bowen, 2009). In this manner, the steps of the researcher can be easily traced, as with Allan's (2003) Key Point Coding technique. Going beyond a mere transcript of the research, the idea of an audit trail is rooted in the need for GT to render the analysis process transparent and trackable. For Bowen, this can be demonstrated diagrammatically showing which data each code comes from, which open codes become which axial codes, and which axial codes in turn create a selective code (Bowen, 2009: 314). What this stops short of, however, is elaborating on how the relationships between those codes, the very foundations of emergent theory, are themselves grounded in the data.
What next?
The preceding sections established the importance of research transparency and trustworthiness, the strengths and limitations of current approaches to GT and their implications for coding, the weakness of GT coding, whose writeup and execution are vulnerable to limited transparency and trustworthiness, and a lack of existing GT papers devoted to addressing this. With regard to Strauss and Corbin's work, I find myself agreeing with Charmaz and Thornberg's (2021: 7) assessment of it as 'rigid and prescriptive in sharp contrast to the open-ended, fluid approach' of earlier publications. In reflecting on Glaser and Strauss's (1967) early work, carried forward in some senses by Glaser (1992, 2014) and Glaser and Holton (2004), I find the element of inductivism over-exaggerated, or rather under-specified. Overall, the approaches reviewed here rely on delimitation as a strategic research tool, without explaining how exactly to demonstrate such delimiting decisions on the part of the researcher. Inspired by Bowen's (2009) and Allan's (2003) respective efforts to create an audit trail for GT, I share Relational Coding as a GT demonstration toolkit designed to create such a trail for the researcher and reader alike, contributing to the gap in knowledge and practice where code relationships are concerned.
In presenting this toolkit I acknowledge that Relational Coding is not a one-size-fits-all fix for GT or methodological transparency in general. Per Corbin and Strauss (2008), GT is a flexible methodology whose procedures are tools to be adapted to the circumstances of each study rather than rigid directives.
Relational coding
Why relationships?
The label 'Relational Coding' may seem unnecessary for a theory-generating process given that theory can be described, simply, as the range of possible relationships between concepts (Strauss and Corbin, 1994). The problem, as established, is the questionable transparency and trustworthiness of many studies which set out to follow GT. I posit that this is because existing approaches do not adequately indicate how to demonstrate the use of relationships as foundational for coding and theory, much less explain why those relationships are plausible and explanatory based on the data they supposedly emerged from. While I believe that many of the existing approaches to coding are valuable and capable of rigour, they stop short of showing the reader how the relationships between codes were identified and grounded.
Inspired by Corbin and Strauss's (1990) assertion that 'if a grounded theory researcher provides the pertinent information, they enable readers to assess the adequacy of a complex coding procedure' (p. 17), my approach to Relational Coding is rooted not only in a desire to facilitate rigorous, coherent coding, but also in a desire to enable myself (and all who may choose to utilise this) to write up the often mystical coding process with as much useful detail as possible, without burdening the reader with a bulk of data which obscures the process of analysis. As such, I set out to code in a manner which weaves constant reminders to theorise into the process, and have captured this within Relational Coding.
Per Appendix B’s Figure A, my personal take on GT is closely aligned with that of Glaser, though I also benefit from the contributions of his peers as mentioned previously. I use open, intermediate, and theoretical coding (alongside a late-stage literature review) to locate emergent theory. Rooted in Glaser and Strauss’s (1967) early work I find it most useful to code at the level of incident, rather than word by word or line by line. The following section elaborates on Relational Coding in full, before moving on to share examples from my own previous research which serve to demonstrate what it might look like.
What does Relational Coding look like?
At the outset it is important to specify that Relational Coding is applied across all of the, per Glaser, 'classical' phases of GT coding (Glaser, 2014). In accordance with Glaser's instruction to keep coding open in the beginning, the beginnings of Relational Coding are loose and free, though structured by an internal logic (my personal preference is abductive logic (Reichertz, 2007, 2019)). For open coding this practically means identifying themes as regularly or irregularly throughout the text as makes sense, and recording them all in the form of memos which contain explanations for each theme, allowing you to return to the text where the code originated. When I start coding, I analyse all data with my problem question in mind (one which itself changed throughout the research process). When I choose incidents to code which are relevant to what I am very generally looking to learn about, I record my codes and memos in answer to the question: what about this interests me and why? (This question might change from one author to another, but clearly acknowledging it is key to providing methodological transparency.) I choose to do this by inserting text into PDF versions of the documents I analyse, as comments in the margins, which are easily searchable, or as voice recordings and written notes stored in my cell phone during immersive observational fieldwork.
When I have coded in this manner for a few months (or as long as is needed to reach a point where coding is generating bulk rather than new substance) and coded for a few successive rounds (searching for data based on where the previous data suggested I should look for more answers) I pause the process. Here I review all the codes, following which I gather them into one document. Usually at this point I discover that I have hundreds of proto-codes (early versions of codes, or prototypes). While GT encourages free thinking it also requires avoiding bulk in the data (Barlett and Payne, 2004), and so at this point, keeping in mind the numerous questions that Glaser (1992), Corbin and Strauss (1990), and others suggest posing to the data, I find two to offer a particularly useful foundation for reducing code bulk: first, what is happening here (referring to process), and second, to what end/meaning (referring to produced meaning)? Far from restrictive or exhaustive, this short list of questions, for me, encompasses a much longer list including: What is the main idea being studied? How could I summarise it? What do all of the actions and interactions I studied lead back to? How can I explain outliers? (Corbin and Strauss, 1990). What is this data studying? What is the emergent category? What is happening? What is the main problem/concern facing the people I am studying? And how is this resolved/attempted to be resolved? (Glaser, 1992).
This set of questions lies at the heart of Relational Coding (though it may be modified to include different or additional questions as per any study’s needs). In no way is this intended to limit the scope of discovery and, consequently, theorising. However, as with every research design choice a researcher makes, these questions inevitably have a delimiting effect on any study and thus must be listed clearly and referred to transparently. In the case of my own previous work I have always listed the questions in full and opted to attach appendices summarising which questions dominated particular stages of coding to provide further insight into the subjective and iterative processes behind coding. Ultimately the reduction of the rich range of coding-guiding questions emerging from the works of Glaser and others is a choice which aims to make research and its writeup easier while avoiding limiting its depth. Asking ‘what is happening’ and ‘to what end/meaning’ offers a way to summarise, from my point of view, the full range of questions which have been drawn on. These other questions are not erased. Instead, reflections around those questions are captured in memos and recorded in the writeup whereby the definition of each code is provided. For example, questions such as ‘what is the problem here?’ and ‘how is it being solved/ignored?’ are implied corollaries of what is happening. At the same time, as mentioned in the paper, I envisage Relational Coding as a malleable framework which can respond organically to different priorities, interests, and capacities of different researchers. As such, lists of questions and the manner in which they are recorded will differ from researcher to researcher, but tying the coding to the questions asked is an essential step in ensuring transparency and trustworthiness.
In the case of my own work with Relational Coding, I pose these two questions (what is happening, and to what end/meaning?) to every single one of the hundreds of codes I locate during early coding. The result is often hundreds of lines of two codes each which allow me to explain, at every interesting moment in the data I analysed, what is going on and what impact is generated. Most recently this significantly reduced the number of codes I was dealing with from hundreds to fewer than 40. At this point, I re-code the data so that in all of those (potentially) hundreds of incidents two of my new codes can be used as a label (one answering 'what is happening', the other answering 'to what end'). I then continue coding until I reach data saturation (when the data persistently fails to yield new insights compared to the existing coded material (Glaser, 2012)). Often by this point there are thousands of lines of code, each consisting of two codes (usually stored in an Excel table or whichever coding software you prefer).
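To make the shape of this master table concrete, the following is a minimal Python sketch of the structure just described. The code labels are hypothetical; the layout, one row pairing a process code with a meaning code, simply mirrors the Excel table referred to above, and the tallies are the raw material for a chart such as Figure 1.

```python
# Minimal sketch (hypothetical labels) of the master coding table:
# one row per coded incident, pairing a process code with a meaning code.
from collections import Counter

coded_lines = [
    ("gatekeeping", "exclusion"),   # (process code, meaning code)
    ("gatekeeping", "belonging"),
    ("maintaining", "belonging"),
    # ...thousands of such rows in a full study, e.g. exported from Excel
]

# Tally how often each code emerged, separately for process and meaning codes.
process_counts = Counter(process for process, _ in coded_lines)
meaning_counts = Counter(meaning for _, meaning in coded_lines)

print(process_counts.most_common())
print(meaning_counts.most_common())
```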
Now I begin to consider the thousands of relationships represented in the thousands of lines of codes. To record this, I create tables and bar graphs demonstrating the number of times each code appeared; its emergence relative to other codes (which codes coincide, and how frequently or infrequently?); and how the various codes correlate in connection with their co-occurrence. Figure 1 and Tables 1 and 2, which follow, demonstrate this at a very basic level.

Figure 1. How to display Open Meaning Codes by frequency of occurrence in the data.
Table 1. How to record the origin of codes for comparison purposes.
Table 2. How to display codes which occur together frequently for comparison purposes.
I would begin to shed light on the data, the codes, and the relationships between them with a chart tallying the total emergences of each code: one chart for process codes and one for meaning codes. Figure 1 displays an example of how meaning codes derived from open coding might be presented. In the real writeup, the name of each code would appear rather than the placeholder labels 'code 1' and so on.
From a simple and clear table such as Table 1 I might draft a separate table for each code, isolating that code’s emergences from the data, and identifying which codes it occurred alongside, for both process and meaning codes. These representations of data would then accompany detailed descriptions of each code.
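As an indication of how such per-code tables can be generated mechanically, the following Python sketch tallies, for one focal code, every code it occurred alongside. The labels are again hypothetical, and the two-column layout matches the earlier sketch.

```python
# Sketch of the per-code tables described above: for each code, which
# codes did it emerge alongside, and how often? (Labels hypothetical.)
from collections import Counter, defaultdict

coded_lines = [("gatekeeping", "exclusion"), ("gatekeeping", "belonging"),
               ("maintaining", "belonging")]

co_occurrence = defaultdict(Counter)
for process_code, meaning_code in coded_lines:
    co_occurrence[process_code][meaning_code] += 1
    co_occurrence[meaning_code][process_code] += 1

# One table per code, e.g. everything 'gatekeeping' occurred alongside:
for partner, count in co_occurrence["gatekeeping"].most_common():
    print(partner, count)
```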
Based on representations such as these, the foundation is laid for rich description and definition of each code, rooted in my interpretation of its relationship with other codes, both strong and weak. Having demonstrated these relationships in terms of frequency of emergence, the process of defining the codes then demonstrates my subjective interpretation of the data. Here I allow the definitions to take shape around the full gamut of questions posed to the data. In this manner theory is not limited and can achieve richness in a transparent manner that allows for complexity while reducing confusion. It must be stated that this is not an argument on the role of quantitative data in qualitative research. Representing the data in relation to frequency of occurrence should by no means be framed as attributing particular importance to themes. Instead, it is part of an exercise in transparency that I have come to carry out, prompted by regular requests from reviewers.
In representing the data thusly, two core concepts must be emphasised. First, coding is largely subjective, and recording any part of the coding process is a record of biased decisions that enable any reader or reviewer to follow the author's logic, rather than to test whether it is accurate or not. Second, no false importance should be associated with frequency: a code occurring infrequently, for example, might be even more significant than one occurring very frequently. Additionally, the frequency of occurrence is what allows me to move on to explain co-occurrence, that is, a relationship as I perceive it. This is what demonstrates the subjective choices made in trying to relate codes to one another in a manner that explains the larger questions at hand. Only through representing codes by frequency of emergence, in addition to writing up detailed definitions, can a fuller range of the subjective delimiting decisions made by the researcher be laid bare for the reader.
Having teased out these relationships, what remains is to begin integrating the potentially thousands of lines of coding and numerous codes into fewer and fewer key concepts. This point represents the end of open coding for me, and the beginning of intermediate coding. Here, I begin by summarising the open codes under a reduced number of labels. For example, recently I combined 14 open codes under the umbrella of three interrelated intermediate proto-codes (prototypes of what the final intermediate codes will be, as I am not finished with the process of narrowing down at this point). The process of beginning to unite codes in this manner relies on simple diagrams once again. First, the process is tabulated per Table 3:
Table 3. How to display the beginning of intermediate coding.
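The grouping Table 3 records can likewise be sketched computationally. The snippet below is illustrative only: 'envisioning' and 'institutionalisation' are borrowed from the worked example later in this paper, while the open codes grouped under them are invented for demonstration.

```python
# Sketch of the start of intermediate coding: open codes gathered under
# a reduced number of intermediate proto-codes (component labels invented).
intermediate_proto_codes = {
    "envisioning": ["planning", "idealising", "promising"],
    "institutionalisation": ["regulating", "formalising"],
}

# Invert the mapping so every open code in the master table can be
# re-labelled with the intermediate proto-code it now falls under.
open_to_intermediate = {
    open_code: proto
    for proto, members in intermediate_proto_codes.items()
    for open_code in members
}

print(open_to_intermediate["regulating"])  # -> 'institutionalisation'
```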
These codes, however, are not merely being grouped for the sake of reducing their number. Rather, they are being drawn together and united under new conceptual explanations, capable of generating explanations in and of themselves. For this reason, it is essential to follow up the simplistic summary portrayed in the tables with a more powerful conceptual explanation of each intermediate code, showing how the open codes which led to its creation interrelate. Multiple data visualisations would be useful here, though I tend towards colour-coded doughnut charts for their simplicity. Figure 2, which follows, demonstrates how I display the intermediate proto-codes which I might choose to group together in perceived interrelationships.

Figure 2. How to display the component elements of early intermediate codes.
Diagrams such as these doughnut charts allow the reader to go one step beyond understanding which open codes are combined to create early intermediate codes. This demonstrates not only the components of each early intermediate code, but also how the relationships between those components were perceived, as Figure 3 illustrates.

Figure 3. How to display the component elements of early relationships between intermediate codes.
Here I use arrows to indicate which meaning codes the process codes are related to most strongly, and vice versa, to justify the explanations that I build from those relationships, and the manner in which I summarise closely related codes into more powerful concepts. Using arrows to demonstrate the relationships between codes which I unite under early concept labels, I seek to lay the foundation for further theory-building that can only happen by continuously narrowing down various concepts into finalised intermediate codes, prior to theoretical coding. In this manner I have previously reduced six early intermediate codes (three intermediate meaning codes and three intermediate process codes) down to three overall intermediate codes. In doing so, their richness and explanatory power increase.
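The arrow-drawing step can also be approximated computationally. The following sketch pairs each intermediate process code with the intermediate meaning code it co-occurs with most strongly; 'envisioning' and 'institutionalisation' come from the worked example below, while the other labels and all counts are invented for illustration.

```python
# Hypothetical sketch: pair intermediate codes by their strongest
# relationships (here measured by invented co-occurrence counts).
pair_strength = {
    ("envisioning", "institutionalisation"): 210,
    ("envisioning", "marginalising"): 40,
    ("maintaining", "marginalising"): 180,
}

def strongest_partner(process_code: str) -> str:
    """Return the meaning code most strongly related to a process code."""
    candidates = {m: n for (p, m), n in pair_strength.items() if p == process_code}
    return max(candidates, key=candidates.get)

print(strongest_partner("envisioning"))  # -> 'institutionalisation'
```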
Adding detailed explanations to each set of interrelated codes, I assign a final intermediate code to each combination of intermediate meaning and process proto-codes. This label is derived from the in-depth and transparent explanations I have already provided for the proto-codes, the intermediate codes from which they emerge, and the relationships captured within them. For me, this marks the point at which I am ready to begin coding theoretically.
Finally, in theoretical coding, I narrow down the intermediate codes. Having demonstrated in great detail the emergence of the codes, the relationships between them, and the theoretical building blocks emerging therefrom, I use theoretical coding to map out the narrative generated by these powerful theoretical emergences. Here I seek to locate one overall code with explanatory power and theoretical possibilities, and to take the reader along with me, hand-in-hand, so as to demonstrate my own subjective decision-making process as I pursue one overall theoretical proposition. In this paper I do not explain the narrative work of theoretical coding in great detail as I am not concerned with creating a how-to-code guide, but rather how to code transparently. For me, theoretical coding elevates all of the ideas from previous coding rounds to increasingly generalisable and applicable ideas with enhanced explanatory prowess, most usefully written up as a narrative which refers back to the data from which it emerged. If open and intermediate coding have been carried out as transparently as possible, with every delimiting decision and every subjective choice on code labels, relationships, and the meanings derived therefrom explained in a detailed fashion, then theoretical coding and its results become all the more trustworthy. The entire process is summed up diagrammatically in Appendix B’s Figure B.
A practical demonstration
To better introduce Relational Coding I find it helpful to draw from previous studies where I adopted this technique. Without diluting this paper's focus by explaining another entirely, the research in question brought together grey literature analysis and the writeup of immersive observational fieldwork in public parks, exploring inclusion and cohesion in Johannesburg, South Africa. The question which guided the study was: how does the South African government's normative policy vision for post-apartheid public space interrelate with public space as it is lived and perceived in Johannesburg? I was specifically interested in four of Johannesburg's public parks representing different neighbourhoods in terms of linguistic, racial, ethnic, and income demographics. The parks themselves differed in size and condition, ranging from recently and entirely rebuilt to various states of decay.
Here I began with 2192 lines of code, each containing two codes (one representing process, and the other meaning), made up of 39 individual open codes overall, with each code's frequency of occurrence mapped out per Figures 4–6. Figure 4 specifically provides a screenshot of manual coding and memoing, clarifying how codes are recorded and demonstrating to the reader that a dataset can be found wherein the subjective decisions made by the researcher are accessible. Figures 5 and 6 respectively demonstrate how one might display sets of codes which emerged in association with posing particular questions to the data.

Figure 4. A screenshot taken from the master coding document on Microsoft Excel showing codes (sources blurred for anonymity) (anonymised).

Figure 5. Example of process codes displayed by frequency of occurrence in the data (anonymised).

Figure 6. Example of meaning codes displayed by frequency of occurrence in the data (anonymised).
The full details of these codes and their occurrence alongside one another were also made available in tabular form, with an example in Table 4, which follows (adding a level of complexity to the simple frequency of occurrence of each code on its own, as demonstrated already in Figures 5 and 6).
Table 4. Example of mapping out an open process code with the codes occurring alongside it (anonymised).
By way of analysis, I identified each code in rich detail, drawing on where and how it emerged from the data, as well as its relationship to other codes which occurred frequently alongside it (per Table 4 above). In this way I was able to narrow down the 39 codes (14 process codes and 25 meaning codes) to six intermediate codes (three process codes and three meaning codes). This was done simply through asking: what are the key processes occurring across the 14 open process codes? The result was three intermediate codes which arose from defining and outlining every single open code (with further explanation provided in the figures which follow from Table 5):
Table 5. Example of open codes and the intermediate codes which emerged from them (anonymised).
I then focused on each individual intermediate code, showing the open codes which combined to reveal it and reflecting on the influence each open code had in relation to the others which together made this singular intermediate code. This process necessarily included textual descriptions alongside diagrams in order to foreground the relationships between the codes in the diagram. For example, Figure 7, which follows, demonstrates how an intermediate code, here 'envisioning', comprises six open codes, which all emerged from the data in various frequencies and are all significant in different ways. This would follow, of course, from the sections defining each code and explaining how it answers the questions posed to the data.

Figure 7. Example of an intermediate code containing the open codes from whose combination (i.e. the relationship between codes) it emerged (anonymised).
A figure such as this would be accompanied by explanatory text, as per the following box:
Finally, in preparation for theoretical coding, relationships between intermediate codes emerged (based on the relationships between the open codes which made up the intermediate codes) which allowed me to combine the six early intermediate codes into three early theoretical codes, resulting in one final overall code. Figure 8, which follows, is an indication thereof. As demonstrated in the figure, colour-coding provides an easy manner in which to highlight the perceived relationships between sets of codes, further demystifying the subjectivity of grounded theory work. Of course, this is only possible if rich explanations have accompanied the mapping of each intermediate code, as demonstrated with Figure 7 and in the box following Figure 8:

Figure 8. Example of final intermediate codes emerging from the pairing of intermediate process and meaning codes (anonymised).
In a standard writeup a diagram such as Figure 8 might be accompanied by 3–5 pages of analysis and explanation; however, given the scope of this section within a larger paper with a different focus, a short excerpt follows in the box below:
The intermediate codes 'envisioning' and 'institutionalisation' were, in this case, part of six early intermediate codes (or codes that emerged early in the intermediate coding process) that were narrowed down to three final intermediate codes. Per Figure 8, each final intermediate code emerged from the analytic pairing of a process intermediate code with a meaning intermediate code. In the case of 'envisioning' and 'institutionalisation', for example, the emergent intermediate code which I found best captured what I had mapped out, including both relationships and the meanings behind them, was the idea of 'a space > public dichotomy'. In other words, the idea that the institutionalised vision generated is one which prioritises considerations of space above considerations of the public in public space.
Building on the relationships visually mapped out in this way, the path is paved for engaging in rich descriptive and explanatory code discussions as one coherent theory, which cohesively brings all of the codes together, emerges. In this particular study I was able to discover that public space policy lacks considerations of the public, as their agency is hidden and assumed to be superseded by the 'space' that is public space (anonymised). Using this approach, the combinations of relationships between codes can be laid clear, so that the emergence of theory is not shrouded in mystery, and the choices of the researcher underpinning each step of analysis are laid bare. This high-level summary builds on the instructive mapping of what Relational Coding entails presented in the preceding section.
Conclusion
Aiming to enhance the transparency and trustworthiness of GT studies, I put forward Relational Coding. Acknowledging GT's various iterations and the numerous interpretations of each approach, Relational Coding constitutes a potentially modifiable toolkit, clarifying how and when to record key coding decisions. Naturally, the decisions themselves might be easily modifiable depending on the questions the researcher poses to the data. This tool constitutes a novel approach to recording the process of discovering theory through GT, not only building on numerous existing approaches ranging from manual to digital, rooted in various underlying epistemologies and ontologies, but also addressing a gap in the literature. While GT is constantly evolving as scholars continue to shed light on its inner workings by publishing their approaches, demonstrating how the relationships between codes can be captured and made visible beyond the mind of the researcher has yet to be established. Here, Relational Coding provides a useful and inherently modifiable tool for doing exactly that.
Beyond a simple how-to-code guide for GT, Relational Coding aims to demonstrate how to code so that the theory produced by this methodology is trustworthy due to the replicability of the manner in which the results are arrived at, rather than the replicability of the results themselves. While it does not aim to function as a one-size-fits-all fix for GT's weaknesses, if applied correctly it will assist not only the reader but also the researcher with regard to a clear and coherent idea of theoretical concept building. In this manner open, intermediate, and theoretical coding can be recorded in a manner reflective not only of their outcomes, but of their process too. Subsequently, outcomes can be regarded as increasingly reliable and trustworthy due to the transparency of the process behind them. This should allow researchers and their audience alike to trace the emergence of theory from data, to the codes grounded in that data, to the relationships between those codes as they progress from open codes, narrowed down to intermediate codes based on relationships between open codes, and finally narrowed down further to conceptually rich theory based on relationships between intermediate codes and their component open codes. However, three key points for improvement remain. First, demonstrating how to modify Relational Coding for methodologically sound application to other strands of GT remains necessary. Second, as noted, differing strands of GT adopt diverging stances on literature reviews. Future iterations of the Relational Coding tool must incorporate clear instruction and demonstration in order to facilitate clarity around the influence of literature on coding. Third, while the Glaserian-inspired classical brand of GT which I adhere to calls for late-stage literature reviews, it does not call for an 'empty head' (Richardson and Kramer, 2006). No approach to, and subsequent account of, coding can hope to be complete and comprehensible without demonstrating the positionality of the researcher, least of all Relational Coding. Per Subramani (2019), whose work on reflexivity in theory-building research could usefully be brought to bear on Relational Coding, the researcher's paradigmatic underpinnings are of great importance. Building transparency around positionality and personal subjectivities into this tool for transparency in more generalised coding remains a key future step for enhancing Relational Coding's usefulness and applicability.
