The structure of an academic paper
Title; Authors; Abstract; Intro; Theory; Data and Methods; Results; Discussion; Conclusion.
Now that we are all set up on Overleaf, we can really think about the structure of an academic paper. I think of it as a nested, parallel structure. Broadly speaking, there are nine parts to the academic paper: Title; Authors; Abstract; Intro; Theory; Data and Methods; Results; Discussion; Conclusion.
First, the title. The title should convey what the paper is about. Many authors try to be clever with their titles: they use fancy words, quotes, or something else to make others think the author is clever. But in the Internet age, the title’s sole purpose is to distinctively and succinctly communicate to the search engine gods what the paper is about so that people can find it and, once they see the title, want to read further. You must give people clear information about what your paper will help them do; also, just like a headline in an online newspaper, the title should pique someone’s interest in the paper. That doesn’t mean the title has to be clickbait; it should instead encourage the reader to look deeper and see what the paper might be about. This requires knowing the audience for your paper. What keywords do your audience members care about? Let me give you an example of a title gone awry.
I have a paper in the American Sociological Review. It’s one of the best papers I’ve written. The paper tries to understand the impact of peers on student learning and their long-term academic outcomes. The context is India. The data leverages the randomization of roommates at an Indian university to get exogenous variation in peer quality (à la Sacerdote, 2001). The coolest part of the paper is that I collected data on the social networks of these students for four years and could distinguish between the effects of friends, study partners, and just plain old peers. I find evidence for causal peer effects, but these are driven mainly by the people you actively study with, not your friends. I also show that if you get unlucky with bad peers in your first year, no problem: you drop them in the future, and the low-performance effect driven by your peers vanishes over time. You return to your expected performance. However, if you get lucky and make good friends, you keep studying with them in the future, and your high performance persists. Simple and straightforward. But here is the title:
“The Mechanics of Social Capital and Academic Performance at an Indian College”
What the heck is “the mechanics of social capital?” Are the results only relevant to people who care about “Indian colleges”? The title doesn’t explain (a) why you should care about the paper or (b) what you will learn. I might write a different title for this paper if I were writing it again today. Perhaps, “The long-term effects of peers, friends, and study partners on academic performance.” The answer isn’t easy, and it is worth spending time thinking about how to write a title effectively.
You can be very data-driven about this as well. Here is a histogram of keywords that appear in titles from the Strategic Management Journal, along with the average citations of articles that use those keywords in their titles. While this graph confounds the information value of the title with the demand for this type of research, it may be a valuable exercise to think through which words convey what you want them to about your paper.
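If you want to try this exercise on your own field, here is a minimal sketch of the keyword tabulation. The titles and citation counts below are made up for illustration (they are not Strategic Management Journal data), and the stopword list and the `keyword_stats` helper are my own assumptions, not the procedure behind the graph.

```python
from collections import defaultdict

# Hypothetical (title, citation count) pairs; NOT real journal data.
papers = [
    ("Peer effects and academic performance", 120),
    ("Experimentation and startup performance", 80),
    ("Social capital and peer networks", 40),
]

# A tiny, assumed stopword list; extend it for real use.
STOPWORDS = {"and", "the", "of", "on", "in", "a", "an", "at", "for", "to"}


def keyword_stats(papers):
    """Return {keyword: (number of titles, mean citations)}."""
    totals = defaultdict(lambda: [0, 0])  # keyword -> [title count, citation sum]
    for title, cites in papers:
        # Use a set so each keyword counts once per title.
        words = {w.strip(".,:;?!\"'()").lower() for w in title.split()}
        for w in words - STOPWORDS:
            if w:  # skip tokens that were pure punctuation
                totals[w][0] += 1
                totals[w][1] += cites
    return {w: (n, total / n) for w, (n, total) in totals.items()}


stats = keyword_stats(papers)
# Print keywords from highest to lowest mean citations.
for word, (n, mean_cites) in sorted(stats.items(), key=lambda kv: -kv[1][1]):
    print(f"{word:15s} titles={n} mean_citations={mean_cites:.1f}")
```

As with the graph, a tabulation like this mixes together how informative a word is and how popular the topic is, so treat it as a prompt for thinking about your title rather than as evidence about what causes citations.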
Motivation: Next is your abstract. The abstract is a microcosm of your entire paper. This is a good point to discuss how to think about the structure of your paper. I like to think of a paper as an hourglass. You begin the paper quite broadly. This is the motivation behind your paper. What’s the big question? Why should anyone care?
Problem: The next part, the problem statement, is narrower. Tell us what problem your paper solves. We’ll talk more about what different types of puzzles exist; I’m ambivalent about the word puzzle because it can devolve into intellectual games rather than fundamental research questions. (This is a nuance that I’ve only come to appreciate after years of research and potentially reading thousands of research papers.) What problem is your research solving within the broader motivation you pose?
Your solution: Once you’ve described the problem, tell us your solution: What are you doing in this paper? What new insights, tools, data, or combination of ideas do you bring to the table that will help address this challenge?
Your findings: Tell us what you find. How did your new idea, your solution, work out?
Implications: Tell us what this means for our understanding of the problem and how we should now think about the world differently because of what we know through your research.
That’s it. The abstract should have, on average, five sentences. Each of the sentences conveys a tight point. Some points are more complex, and others are less, but you should aim for brevity and conciseness.
Okay, this is your abstract.
Once you have the abstract down, your next step is to write down the structure of the paper. What is nice is that there is some parallel structure to how one would write a paper.
Let’s talk about the overall structure of the paper.
First is the introduction. The introduction does what the abstract does but in more detail. Consider the introduction as having five paragraphs. These paragraphs correspond broadly to the five sentences that make up the abstract. In the first paragraph, you lay out the motivation. The hook. Why should anyone care about your paper?
The second paragraph lays out the problem. Why do most startups fail? Why do many low-income entrepreneurs choose underperforming businesses? Describe the problem in some detail here.
The third paragraph describes your solution. You build a theory about X and Y, propose the Z method to resolve this uncertainty, and bring this approach to bear on the problem. This is your solution: what you bring to the table with your paper.
The fourth paragraph tells the reader what you find. Describe concrete findings, such as an X percent effect on Z. You should also provide scope conditions; for example, the effects are stronger in one subpopulation than in another. Maybe describe some long-term effects.
Finally, the fifth paragraph explains the implications of your work, both for the problem you are solving and the literature at large.
As you can see, this is an hourglass. You start off broad: a general motivation that interests people in your paper. You then narrow to the concrete problem or puzzle you are solving, and then down to the specifics of your solution and your findings. And then you go broad again. What do we learn from your paper? Who learns?
Now that your introduction is complete, note that the other parts of the paper also follow this structure.
The sections of your paper are the introduction, theoretical framework/literature review, methods and data, results, and conclusion and discussion.
The introduction starts out broadly. Indeed it’s the broadest part of your paper, basically summarizing the whole paper — if you have someone read the introduction, the reader should broadly know what your paper is about and what you find. Sometimes this person will want to get into details and jump into your theory and results sections.
After the introduction, you have the theoretical framework. This is the most challenging part of the paper to write for strategy and organization theory audiences. There’s just so much variation in taste here. What one person calls a theory, others may not. I’ve had papers rejected purely based on the theory section. Sometimes people believed the findings but thought the theory section was not “theoretical” enough. I could go on an entire rant about how the field has gone awry. The theory section is not tremendously valuable for most papers. In fact, this is where you see the most significant variation across disciplines and authors. Some authors use excessively flowery language and try to make simple ideas complex. Other authors write models using mathematics and make predictions with those models. Yet others have dozens of hypotheses and sub-hypotheses that they then say they will test with the data. Sometimes people just have a literature review. I’ll tell you what works for me. It may not work for you, but I have found it to be the best way for me to think about the theory section.
What is the theory section trying to do? It is trying to help you, the author, lay out the logic that drives your empirical analysis. Why do you run the empirical test, and what expectations do you have for what you find in the data? This is what the theory section attempts to do.
Every paper is trying to convince you of some sort of fact. This fact may be something like: getting advice from peers improves startup performance, or A/B testing improves startup performance. The theory section tries to convince you that you should expect these facts based on the logic provided by the author. In this way, the theory section lays out a set of arguments that will lead you to that expectation, while also considering potential counterarguments that may make you expect something else. That is what the theory section does. It is an expectation-setting section. A literature review does this. A theoretical model written in mathematics does this. So does formal logic. What are the facts you are trying to convince people of? What logic do you use to convince them, and what conceptual supporting evidence are you using to make this argument?
Classical rhetorical structure:
Exordium – The introduction, opening, or hook.
Narratio – The context or background of the topic.
Propositio and Partitio – The claim/stance and the argument.
Confirmatio and/or Refutatio – positive proofs and negative proofs of support.
Peroratio – The conclusion and call to action.
An excellent way to think about the theory section is in the format of the classical rhetorical structure. As you will notice, the structure above consists of five parts. These are the parts that also make up the abstract, the introduction, and the theory section.
Like in the introduction, you begin the theory section with setup: a null model of the world that most people should believe. Start by stating the importance of the problem and providing a deeper introduction to the topic by citing relevant literature and what it says. Next, provide details about your specific topic, the thing you are studying, and where the gap lies in the literature.
Next, structure your theory section as a set of claims that you are trying to get people to believe. These claims are nonobvious in a sense. A nonobvious claim is not necessarily outlandish but might not be something people have thought about deeply. Suppose one were to think about this claim solely from basic logical facts or truths we already know. In that case, one could come to a different conclusion. For instance, in the peer effects literature, it is not evident that peers will increase students’ academic performance.
There are, however, other reasons that this is likely to be true.
After making a claim, your job in the theory section is to provide a set of arguments that convince people your claim is valid. The best way to do this is to get them to believe smaller claims that have either been confirmed empirically or are self-evident, and then to link those smaller claims to the truth of your claim. For instance, the claim that A/B testing leads to better performance was a contentious claim made by us, and the reviewers needed convincing.
We had to break down the problem: why did we believe that AB testing led to improved performance for startups, and what were the arguments for this?
If you look at the paper, you will see that we made several arguments:
First, we began by setting up the problem. A key challenge faced by startups is uncertainty about the consequences of their decisions: what products to launch, what markets to enter, and how these products should look and feel.
We then set up the idea that experimentation, the big-picture concept, is what we believe leads to better performance for startups.
We then went into the fact that A/B testing is a form of experimentation enabled by digitization.
Why should A/B testing lead to better performance?
Everyone would agree that entrepreneurs need to test ideas. However, testing ideas is usually costly, so most people fall back on gut reactions instead. A/B testing reduces the cost of testing ideas. With cheaper tests, people would test more ideas and, as a result, make better decisions.
Furthermore, it would also help you quantify the impact of bad ideas so that you do not implement them.
Finally, we connected to an idea that almost everybody agrees on: if you test many ideas and learn from them, you are engaging in organizational learning. And research has long shown that organizations that are better at learning perform better.
These are a few reasons to believe that A/B testing leads to better performance.
Consider setting up compelling counterarguments to your claim. A compelling set of counterarguments is essential for any claim in the paper because it increases the stakes. Suppose it is plausible that the opposite of your claim is true or might be true. In that case, your empirical test will likely matter much more in providing evidence that may shift people’s beliefs toward your claim versus the counterclaim. If, for instance, everybody already believes your claim, then your paper adds little in changing people’s minds. You are here to change people’s beliefs.
As a result, you should spend some time on the counterarguments.
Here in the AB testing paper, we make a few arguments that may lead us to believe that testing might not increase performance. That is, we are setting up a counterfactual.
If firms are not experimenting, what else might they be doing? They may be relying on their gut instincts. Are gut instincts helpful? Entrepreneurs have experience and may have much tacit knowledge, which can lead to better decisions. But we know from prior work that entrepreneurs are overconfident and may think more highly of their own ideas than they should.
Entrepreneurs may not be running randomized controlled trials but are running uncontrolled experiments. For instance, there is extensive literature on tweaking and learning by doing. Maybe these are sufficient approaches, and running formal experiments may not add much. This is a compelling argument against our claim that testing improves performance and may mute its impact.
Now that you have made this claim, support it with arguments for and against its validity. You must do both, as you want to have a compelling null.
The primary claim may be sufficient for your paper, but you should consider two other kinds of claims that would be useful to add to your paper. First, it is usually the case that the primary claim is not unconditionally valid. That is, A/B testing only sometimes leads to improved performance. What are the conditions under which you believe your claim is true? Is it under situations where the firm also has an organizational structure capable of absorbing the information from these many randomized controlled trials and using them to drive decision-making? Is it smaller or larger firms that have different organizational structures?
A second set of claims extends the original claim: does AB testing increase webpage views? Could it also affect the number of new products startups release? Might some startups shut down when they learn that they cannot increase their performance with the idea they are pursuing?
A tool I use to write the theory section is a simple outline: It follows the structure I will describe in much more detail in the next section.
Now that you have written your theory section, it is time to discuss your methods and data.
The methods and data section is fundamentally descriptive. Your job here is to be as transparent as possible about how you will go and test your claims with the most convincing data you have.
This section may vary depending on what type of data you use (observational or experimental) but usually consists of the following three parts.
The first part is a description of your data or experiment. Here provide as much detail as possible about the underlying data you will use to estimate the models that will provide evidence for or against your claim. How did you build this data set? What kind of information does this data include? What is unique about this data that allows you to construct a convincing empirical test?
The second part is a description of your variables. Here, follow standard procedures by describing how you construct your dependent and independent variables. I usually write each variable as a separate paragraph, with the variable name as my first sentence followed by a colon. Sometimes, it is helpful to provide descriptive statistics about your variables in the write-up. This gives the readers a sense of what your setting looks like empirically. This is also when you want to provide a data summary table.
Another type of variable is the control variable, which may help you reduce the likelihood that the estimated effect of your independent variable is biased.
Finally, you should have a section on your empirical strategy. This section describes the model you are estimating, how you fit this model, and what you are looking for in terms of the significance, sign, and magnitude of the coefficients on the right-hand side of that model. Also, discuss how this specification allows you to deal with inference problems, that is, whether or not you can make a causal claim about the relationship between an independent variable and the dependent variable.
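To make this concrete, here is what a written-out specification might look like. This is only an illustrative sketch of a generic panel regression with controls and fixed effects; the symbols are my own placeholders, and it is not the specification from the A/B testing paper.

```latex
\[
  Y_{it} = \beta \, \mathrm{Treatment}_{it} + \gamma' X_{it} + \alpha_i + \delta_t + \varepsilon_{it}
\]
% Assumed notation (hypothetical, for illustration only):
%   Y_{it}          dependent variable for unit i in period t
%   Treatment_{it}  independent variable of interest
%   X_{it}          vector of control variables
%   \alpha_i        unit fixed effects
%   \delta_t        time fixed effects
%   \varepsilon_{it} error term
```

With a specification like this on the page, you can tell the reader exactly which coefficient carries your claim (here, the sign, magnitude, and significance of \beta) and which assumptions are needed to read it causally.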
Now you have laid out your data, the variables you create with this data, and the models you will run to test your claims. The following section is your results section. The section is often relatively short. Much of the work in setting up the paper is already done. Now: just show me your results.
A good results section should follow a format that is already familiar to the reader from everything you have led them through so far. They should be able to read your results section and map it to the claims you make in your theoretical framework section.
In the AB testing paper, the first claim we make is that AB testing increases performance. Hence, the first part of the results section provides evidence using the data in the manner specified in the methods and data section, testing that claim. This test usually corresponds to the first table in the results section. In addition to the primary test, many authors (including myself) include additional robustness tests (e.g., adding fixed effects, more controls, etc.) to provide further evidence for the claim.
The other subsections in the results section test the remaining claims. They are usually structured similarly. A sub-claim is stated, paired with a table, and a narrative describing the evidence for or against the claim based on the data analysis.
A final section many papers may include in the results section is the “Robustness tests” section. This section may come either from pre-empting criticism of your methods (so you try alternative specifications) or from responding to criticisms of your methods that arise in the review process. The structure of this section is similar to the above. Still, the empirical tests are not tests of claims but rather of counterclaims. For instance, one counterclaim is that A/B testing leads only to incremental changes; in response, we show that the effects are driven at the tails, and we see this in the code changes.
The final section of your paper is your Discussion and Conclusion section.
In this section, authors usually have five paragraphs. First, you restate the motivation, research question, and “what you did.” Second, you describe what you found. Third, provide scope conditions — where is your effect the strongest/weakest. Fourth, what literature do you contribute to and how? Fifth, your limitations. What still needs to be added to your analysis? Finally, you should end on a high note. What could future studies reveal? Convey excitement about the future.