Setting up your project directory

Now that you have all the software required for your research system, let us begin the work. The core of every research paper is referred to as a “project.” What exactly is a project? A project is the work that you must perform to answer a research question. At the end of the project, your aim is to produce two summary documents that show what you have learned from your research: the research paper and the research presentation. If your work is empirical, you will need to conduct data analysis and summarize that analysis in tables and figures. These tables and figures are generated by transforming raw data, sometimes referred to as canonical data, into derived data sets that can be analyzed with statistical methods, typically coded in R or Stata.

In recent years, I have followed a standard pattern to organize my research projects. I use Dropbox (sometimes Box, when the data sensitivity requires me to keep it in University-protected cloud space) to store my project files and collaborate with co-authors by using the sharing capabilities of these two cloud drives.

If you do not yet have a Dropbox account, I recommend creating one. Dropbox still offers a free tier, but its storage and device limits are too tight for most research projects. I subscribe to Dropbox Plus, which provides 2 TB of storage and unlimited device linking.

Once you have an account, I recommend creating a main folder that holds all of your project folders in one place. The main folder can be named anything; a common choice is “papers in progress.”

Next, open this folder, which will be empty, and add your first project folder. Let’s name it “strategy research course.” Note the lowercase: keeping every folder name in lowercase makes paths easier to reference in code, because you never have to remember the capitalization.

I recommend creating five subdirectories inside the “strategy research course” folder, which serves here as an example of a project folder. Each subdirectory has a specific purpose in the research process.

canonical — The folder called “canonical” stores all of the raw data used for research, in its rawest form. This could be the original data dump provided by a vendor, a CSV file with raw survey responses, or XML files scraped from an online platform. Essentially, this folder contains the data set without any modifications by the researcher. The folder is called “canonical” due to idiosyncratic reasons shared by one of my collaborators in the past. It could be named anything, but “canonical” has a more sophisticated ring.

code — This folder stores all the code files that transform the canonical data into intermediate data sets that will ultimately be saved in the “derived” folder. I highly recommend having multiple modular code files that perform different functions. We will go into this more deeply in the data and analysis chapter of this guide. However, one code file should focus exclusively on converting the raw data in canonical into the derived data sets that you will use to run your statistical analyses and produce your tables and figures.

derived — This folder stores all the derived data sets that will be used for statistical analyses, forming the basis of your results section in your research paper. If you are using R, you can store the .Rdata files and sometimes CSV files built with the raw data sets in this directory. When using Stata, you can store the .DTA files in the derived directory, unless the .DTA files were the original files downloaded from a source such as ICPSR or NBER.

tables – This folder stores all the tables that you will include in your Overleaf document for the research paper. Both R and Stata have high-quality functions that can export publication-quality tables in .tex format, which can then be included in your LaTeX file in Overleaf. In Stata, you can store estimates with eststo and export them with esttab (both part of the estout package), while in R, you can use stargazer.

figures — This folder stores all the final figures that will be included in your Overleaf document for the research paper. Both R and Stata have the ability to directly export to PDF or PostScript format, which can be easily included in your LaTeX document. I personally prefer exporting to PDF as it offers a higher print quality and allows readers to zoom in to see details if they are reading your manuscript in PDF form.
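
The layout above can be created by hand, but a script makes it repeatable across projects. What follows is a minimal sketch in Python, used only for illustration; the same steps could be written in R or Stata. The function names create_project_skeleton and build_derived, the file names, and the "drop empty rows" cleaning rule are all hypothetical placeholders, not part of any package. The second function shows the one-way flow from canonical to derived described above: the raw file is read but never modified.

```python
import csv
from pathlib import Path

# The five subdirectories recommended in this section.
SUBDIRECTORIES = ("canonical", "code", "derived", "tables", "figures")


def create_project_skeleton(main_folder: Path, project_name: str) -> Path:
    """Create the project folder with its five subdirectories and return its path."""
    project = main_folder / project_name
    for name in SUBDIRECTORIES:
        # parents=True also creates the main folder if it is missing;
        # exist_ok=True makes it safe to run the script more than once.
        (project / name).mkdir(parents=True, exist_ok=True)
    return project


def build_derived(project: Path, raw_name: str, derived_name: str) -> Path:
    """Copy a raw CSV from canonical/ into derived/, skipping empty rows.

    The cleaning rule is a placeholder assumption. The point is the direction
    of the flow: read from canonical/, write only to derived/.
    """
    source = project / "canonical" / raw_name
    target = project / "derived" / derived_name
    with source.open(newline="") as src, target.open("w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # Keep the row only if at least one cell has non-blank content.
            if any(cell.strip() for cell in row):
                writer.writerow(row)
    return target
```

For example, assuming your Dropbox folder lives in your home directory (an assumption; adjust the path to your machine), calling create_project_skeleton(Path.home() / "Dropbox" / "papers in progress", "strategy research course") would create the project folder and its five subdirectories.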

Once you have created all these directories, you can work on setting up a similar structure for your Overleaf document.