Research Computing

Now that we have set up the writing process for our paper, it is time to focus on the empirical analysis. Broadly speaking, your task is to transform raw data, acquired from either primary or secondary sources, into a set of tables and figures that support the points you want to make in your paper. In other words, you use your data to produce results that back up claims about how the world works.

In this section, we will set up your research computing system. Fortunately, you probably don’t need a high-end computer for this. I recommend buying a good-quality Macintosh, such as a MacBook Air in the $600 to $1,200 range, which can serve as your research computer for many years. A low-cost PC or a Linux machine is also an option, but in my opinion a Mac is an excellent choice for the typical strategy researcher.

I understand that this advice may conflict with my previous recommendations on dictation. Currently, I use two computers: my dictation computer, as described in the previous section, and my research computer. Below, I provide the specifications for what I use, but I have used more affordable computers in the past that worked relatively well for most tasks.

Now that you have your computer, the next step is setting up the software.

I use two software packages for my analyses. For most of my research projects, I use STATA. Although STATA is relatively expensive, it is the de facto standard among economics and strategy researchers.

You can purchase a low-cost version of STATA here:

The second program I use, for more specialized analyses, is R. I find R incredibly useful for applying cutting-edge methods from computer science and statistics; text analysis, for example, is far better supported in R than in STATA. Moreover, R can handle most of the tasks you would otherwise do in STATA, and because R is free and open source, young researchers who cannot afford a STATA license can still complete most of their work in R.

Once you have installed both programs, we can begin building your research system.

Let’s return to our Dropbox folder system. Recall that my standard project setup includes the following folders in Dropbox: canonical, code, derived, figures, tables, and notes.
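If you start new projects often, it can help to script the creation of this folder skeleton so that every project looks the same. The following is a minimal sketch written in Python purely for illustration (the same few lines are easy to reproduce with `dir.create()` in R or the `mkdir` command in STATA). The function name `create_project_skeleton` is hypothetical; only the six folder names come from the text.

```python
from pathlib import Path

# Folder names from the text; everything else here is illustrative.
PROJECT_FOLDERS = ["canonical", "code", "derived", "figures", "tables", "notes"]


def create_project_skeleton(project_root):
    """Hypothetical helper: create the standard project folders under project_root."""
    root = Path(project_root)
    for name in PROJECT_FOLDERS:
        # parents=True also creates project_root itself if it does not exist;
        # exist_ok=True leaves existing folders (and their contents) untouched.
        (root / name).mkdir(parents=True, exist_ok=True)
    # Return the folder names actually present, sorted, so the result is easy to check.
    return sorted(p.name for p in root.iterdir() if p.is_dir())
```

For example, `create_project_skeleton(Path.home() / "Dropbox" / "my-project")` would create the six folders inside a hypothetical `my-project` folder in your Dropbox. Because existing folders are left as they are, running it a second time is harmless.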

Your research system reads inputs from some of these folders and writes outputs to others. Specifically, the code stored in the code directory takes the raw data in the canonical directory, saves intermediate datasets in the derived directory, and turns those into the figures and tables stored in the figures and tables directories.

Regardless of whether you are using R or STATA, your main task is to write code that transforms raw data into tables and figures. It is as simple as that. The tables and figures must collectively convince your reader of the claims you are making in your paper.

You can think of the code as consisting of three main parts:

  1. createData.R/do: Transforms your raw data into the derived datasets used in your analyses.
  2. createTables.R/do: Takes your derived data as input and runs the analyses that produce your final tables, for both the main manuscript and the appendix.
  3. createFigures.R/do: Takes your derived data as input and runs the analyses that produce your final figures, for both the main manuscript and the appendix.
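To make this division of labor concrete, here is a minimal sketch of the first two stages, again in Python for illustration only. It assumes a hypothetical raw file `canonical/raw.csv` with columns `firm`, `year`, and `revenue`; the file names, columns, and the summary statistic are invented for the example. What matters is the contract: `create_data` reads only from canonical and writes only to derived, while `create_tables` reads only from derived and writes only to tables.

```python
import csv
from pathlib import Path
from statistics import mean


def create_data(root):
    """Stage 1 (createData): read canonical/, write derived/.

    Assumes a hypothetical canonical/raw.csv with columns firm, year, revenue.
    Rows with a blank or missing revenue value are dropped.
    """
    root = Path(root)
    with open(root / "canonical" / "raw.csv", newline="") as f:
        rows = [r for r in csv.DictReader(f) if (r.get("revenue") or "").strip()]
    out = root / "derived" / "analysis.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    with open(out, "w", newline="") as f:
        # extrasaction="ignore" drops any extra raw columns not needed downstream.
        writer = csv.DictWriter(f, fieldnames=["firm", "year", "revenue"],
                                extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
    return out


def create_tables(root):
    """Stage 2 (createTables): read derived/, write tables/.

    Computes mean revenue by year, an illustrative statistic only.
    """
    root = Path(root)
    by_year = {}
    with open(root / "derived" / "analysis.csv", newline="") as f:
        for r in csv.DictReader(f):
            by_year.setdefault(r["year"], []).append(float(r["revenue"]))
    table = {year: mean(values) for year, values in sorted(by_year.items())}
    out = root / "tables" / "revenue_by_year.csv"
    out.parent.mkdir(parents=True, exist_ok=True)
    with open(out, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["year", "mean_revenue"])
        for year, value in table.items():
            writer.writerow([year, f"{value:.2f}"])
    return table
```

A `create_figures` stage would follow the same pattern, reading from derived and writing image files to figures, typically with a plotting library; it is left out here to keep the sketch free of third-party dependencies.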

Naturally, each file can become quite complex, and in some cases you may need to split these three files into additional code files. For most medium-sized projects, however, three files should suffice. Write your code as modularly as possible, so that you don’t have to rerun everything each time you modify an analysis. At the same time, anyone starting from your raw data and code should be able to reproduce your analysis all the way to your final tables. This is increasingly important, as many journals now require authors to submit their code with the manuscript or make code submission a condition of publication.
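One way to reconcile modularity with full reproducibility is to rerun a stage only when its outputs are missing or older than its inputs, while keeping an option to rebuild everything from the raw data. The sketch below shows that idea in Python; `run_pipeline` and `is_stale` are hypothetical helpers, not part of R or STATA, and build tools such as GNU Make implement the same timestamp logic more robustly.

```python
from pathlib import Path


def is_stale(inputs, outputs):
    """Return True if any output is missing or older than the newest input."""
    outputs = [Path(p) for p in outputs]
    if not outputs or not all(p.exists() for p in outputs):
        return True  # never built (or no declared outputs), so it must run
    newest_input = max((Path(p).stat().st_mtime for p in inputs), default=0.0)
    oldest_output = min(p.stat().st_mtime for p in outputs)
    return oldest_output < newest_input


def run_pipeline(stages, force=False):
    """Run stages in order, skipping up-to-date ones unless force=True.

    Each stage is a tuple (name, function, input_paths, output_paths).
    Because stages run in order, rebuilding an early stage makes its outputs
    newer, which in turn marks every downstream stage as stale.
    Returns the names of the stages that actually ran.
    """
    ran = []
    for name, func, inputs, outputs in stages:
        if force or is_stale(inputs, outputs):
            func()
            ran.append(name)
    return ran
```

In practice, each stage would wrap one of your three scripts. Calling `run_pipeline(stages, force=True)` rebuilds the full chain from the raw data, which is what a referee or journal would do when trying to replicate your results.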

In the following sections, we will explore how to write code for both STATA and R, and I will provide templates for each of these files to help you get started quickly.