Data analysis for strategy research

Now that we have our research system broadly set up, we can take a step back and focus on understanding our goals with data analysis for strategy research. In the simplest case, we are trying to estimate a quantity or set of quantities that reveal something new about the world.

The most common quantity we try to estimate is the magnitude of an “effect” that we believe is important. For instance, someone might be interested in estimating the effect of a specific resource, such as good advice from a peer, on firm performance, or the impact of an organizational feature on a firm's ability to absorb external knowledge and subsequently improve its performance. Other effects researchers might be interested in include the effect of social network ties or of the background characteristics of a CEO.

Another quantity we might want to estimate as researchers is the likelihood of some event or phenomenon occurring in the world. For example, in a research paper I did with my colleagues Rembrandt Koning and Ines Black, we focused on understanding the prevalence of a specific type of hiring mode: outbound recruiting, where firms actively recruit workers rather than waiting for them to apply. This type of data analysis is less common, and to make such a paper interesting, we often supplement the estimation task with more traditional “effect analysis.”
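
As a rough illustration of this kind of estimation task, the sketch below computes the share of hires that came through a given channel, together with an approximate 95% confidence interval. The data are simulated, the 20% rate is invented for the example and is not a finding from any paper, and the function name estimate_prevalence is hypothetical. The normal-approximation interval is one simple choice among several.

```python
import math
import random


def estimate_prevalence(events, z=1.96):
    """Return (share, lower, upper) for a list of 0/1 indicators.

    Uses the normal-approximation (Wald) interval, a rough choice that
    works best with large samples and shares not too close to 0 or 1.
    """
    n = len(events)
    share = sum(events) / n
    se = math.sqrt(share * (1 - share) / n)
    return share, max(0.0, share - z * se), min(1.0, share + z * se)


# Hypothetical data: 1 = hire came through outbound recruiting, 0 = otherwise.
# The 0.2 rate is made up purely for illustration.
rng = random.Random(0)
hires = [1 if rng.random() < 0.2 else 0 for _ in range(5000)]

share, lower, upper = estimate_prevalence(hires)
print(f"estimated share: {share:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

The interval matters as much as the point estimate: a prevalence claim is only as convincing as the precision with which the share is measured.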

It is useful to pause here and discuss what “effect analysis” is and what it is not. We have all heard the refrain “correlation is not causation,” which is absolutely correct. However, one of the main tasks of social science (perhaps even science in general) is to understand cause-and-effect relationships in the world. Most research focuses on understanding these relationships at a conceptual level. The claims we make are fundamentally causal; otherwise, they would not be interesting (unless we are describing a pattern, but even then, readers demand an explanation for why the pattern exists, which leads us back to telling causal stories).

The other side of this is providing evidence and support for your causal claim. Here, there is no “silver bullet” that allows you to make broad-based causal claims about the existence of an effect. Empirical analyses, whether based on a randomized controlled trial or on observational data, are tools that help you make more or less convincing claims of causality. They do so either through the design of the data (e.g., when you have a randomized controlled trial) or through a series of empirical exercises that account for confounding factors, that is, other phenomena or processes that may be masquerading as your “effect.”
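
To make the idea of confounding concrete, here is a minimal simulation sketch. It assumes a made-up world in which peer advice raises performance by exactly 1.0, and in which unobserved firm quality raises performance and, in the observational setting, also makes a firm more likely to receive advice. All numbers and names (TRUE_EFFECT, simulate) are assumptions chosen for illustration.

```python
import random
import statistics

TRUE_EFFECT = 1.0  # assumed effect of advice on performance (made up)


def simulate(n, randomized, seed):
    """Return the raw treated-minus-control difference in mean performance."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        quality = rng.gauss(0, 1)  # unobserved firm quality (the confounder)
        if randomized:
            p_advice = 0.5  # a coin flip decides who gets advice
        else:
            # Better firms are more likely to receive advice.
            p_advice = 0.8 if quality > 0 else 0.2
        advice = rng.random() < p_advice
        performance = TRUE_EFFECT * advice + 2.0 * quality + rng.gauss(0, 1)
        (treated if advice else control).append(performance)
    return statistics.mean(treated) - statistics.mean(control)


observational = simulate(20000, randomized=False, seed=1)
experimental = simulate(20000, randomized=True, seed=1)
print(f"observational difference: {observational:.2f}")
print(f"experimental difference:  {experimental:.2f}")
```

In the observational setting the raw difference in means overstates the assumed effect, because firm quality is masquerading as the effect of advice. Under randomization, quality is balanced across the two groups by design, so the same comparison lands close to the assumed value.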

That said, an effect is most interesting if there is a strong “null” theory suggesting that we should not expect this effect. This null model is the conventional wisdom you are trying to overturn with your new claims in your paper. Your empirical evidence is most convincing if you can make strong causal claims (more on this later) and your result pushes against a strong null.

In addition to understanding this “effect,” we are also interested in determining the conditions under which it is strongest or weakest.

The tables you present in your manuscript aim to do three things:

A) Convince people that your claim about some “causal effect” in the world is true.
B) Explore the various implications of this claim.
C) Identify the scope conditions under which your claim holds, i.e., where and when your effect is the largest.

These three goals can guide you in creating the three primary tables of your manuscript. The first table aims to convince people that your primary claim (often corresponding to your first hypothesis) is true and interesting (i.e., it pushes against a strong null). The second table elaborates on the implications of your claim. For example, if you find that receiving advice from a peer mentor improves performance, one implication is that individuals should be more likely to seek advice from that person again, and it might also imply that internal management practices should be altered accordingly. The third and final table should focus on scope conditions. Nothing in the world is true all the time, especially not the things we discover today, so it is essential to determine the conditions under which your claim holds. If it is indeed knowledge that drives your effect, for example, the effect should be weaker for people who already possess that knowledge, because your intervention adds nothing new for them. You can address this by estimating interaction effects, which show whether your effect is stronger in some conditions than in others.
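
The scope-condition logic can be sketched with a small simulation. It assumes, purely for illustration, that randomly assigned advice raises performance by 1.0 for people who lack the relevant knowledge and by only 0.2 for people who already have it. With a binary treatment and a binary moderator, the difference between the two subgroup effects equals the coefficient on the interaction term in a linear regression that includes the treatment, the moderator, and their product. The function name simulate_cells and all numbers are hypothetical.

```python
import random
import statistics


def simulate_cells(n, seed):
    """Return mean performance for each (advice, knows) combination."""
    rng = random.Random(seed)
    cells = {(a, k): [] for a in (0, 1) for k in (0, 1)}
    for _ in range(n):
        advice = rng.randint(0, 1)  # randomly assigned treatment
        knows = rng.randint(0, 1)   # 1 = already has the knowledge
        effect = 0.2 if knows else 1.0  # assumed effects (made up)
        performance = 0.5 * knows + effect * advice + rng.gauss(0, 1)
        cells[(advice, knows)].append(performance)
    return {cell: statistics.mean(values) for cell, values in cells.items()}


means = simulate_cells(40000, seed=2)

# Effect of advice within each subgroup.
effect_without = means[(1, 0)] - means[(0, 0)]
effect_with = means[(1, 1)] - means[(0, 1)]

# Difference between subgroup effects, i.e. the interaction.
interaction = effect_with - effect_without

print(f"effect without prior knowledge: {effect_without:.2f}")
print(f"effect with prior knowledge:    {effect_with:.2f}")
print(f"interaction:                    {interaction:.2f}")
```

A negative interaction here is the pattern you would expect if knowledge is the mechanism: the advice helps less where it adds less.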

In the next section, we will discuss these ideas in the context of a standard regression framework, such as linear regression.