Setting up your project directory
Now that you have all the software needed for your research system, let’s start with the work. The heart of every research paper is what you might call a “project.” What is a project? A project is the work you must do to answer a research question. At the end of the project, your goal is to produce two summary documents about what you have learned from your research: the research paper and the research presentation. If your work is empirical, you will need to do data analysis and then summarize that analysis in tables and figures. These tables and figures are produced by transforming raw data (sometimes called canonical data) into derived data sets that can then be analyzed using statistical methods, often implemented in R or Stata.
In recent years, I have followed a standard pattern for organizing each of my research projects. I use Dropbox (or Box, when the sensitivity of the data requires that I keep it in university-protected cloud space) to store my project files, and I collaborate with co-authors using the sharing capabilities of these two cloud drives.
If you haven’t already created a Dropbox account, I recommend you do. While I recall that Dropbox was free at some point, the plan I use is not. I subscribe to a Dropbox Plus account, which gives me 2 TB of storage and unlimited device linking.
Once you have created an account, create a main folder that keeps all your project folders in one place. You can call this main folder anything you want; my main folder is called “papers in progress.”
Next, open this folder. It will be empty. Begin by creating your first project folder inside it. Let’s call it “strategy research course.”
I recommend creating five subdirectories inside the “strategy research course” folder, which serves here as an example of a project folder. Each subdirectory serves a specific purpose in the research process.
canonical: This folder stores all the raw data you will use for your research. This is the rawest form of data available to the researcher. For instance, this may be the data dump provided by the vendor you bought your data set from, the CSV from Qualtrics with the raw survey responses from your nationally representative survey, or the XML files you scraped from a large online platform. In other words, this is the data set without any adjustments or modifications by you as a researcher. The name “canonical” is idiosyncratic; I picked it up from one of my collaborators many years ago. The folder could be called anything, even “original data,” but “canonical” sounds more sophisticated.
code: This folder stores all the code files that transform the canonical data into intermediate data sets (which will ultimately be saved in the “derived” folder). I highly recommend having multiple modular code files that perform different functions. We will go into this much more deeply in this guide’s data and analysis chapter. At a minimum, however, one code file should focus exclusively on converting the raw data in canonical into the derived data sets on which you will run your statistical analyses and from which you will produce your tables and figures.
derived: This folder stores all the derived data sets that will be used for the statistical analyses, ultimately forming the basis of the results section in your research paper. When I use R, I often store the .RData files in this directory, and sometimes CSV files built from the raw data sets. When I use Stata, I store the .dta files here, unless the .dta files were the original files that I downloaded from a source like ICPSR or NBER, in which case they belong in canonical.
tables: This folder stores all the tables that will go into your Overleaf document for the research paper. Both Stata and R have high-quality tools for this: eststo and esttab (from the estout package) in Stata, and stargazer in R. These export publication-quality tables in .tex format that can be included in your LaTeX file on Overleaf.
figures: This folder stores all the final figures that will go into your Overleaf document for the research paper. Both R and Stata can export directly to PDF or PostScript format, either of which can be included in your LaTeX document. I tend to export to PDF because the print quality is higher and because someone reading your manuscript in PDF form can zoom in to see details.
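If you would rather script this setup than click through folders by hand, the layout above can be sketched in a few lines. This is only a minimal illustration in Python, not part of the R or Stata workflow this guide uses. The function names create_project and build_derived are hypothetical, and the "cleaning" step (dropping rows with an empty field) is a stand-in for whatever transformations your project needs. The point it tries to show is the rule behind the folders: files in canonical are only ever read, and everything you produce is written to derived.

```python
import csv
from pathlib import Path

# The five subdirectories described above.
SUBDIRS = ("canonical", "code", "derived", "tables", "figures")


def create_project(main_folder, name):
    """Create a project folder with the five subdirectories and return its path."""
    project = Path(main_folder) / name
    for sub in SUBDIRS:
        # parents=True also creates the main folder if it is missing;
        # exist_ok=True makes the call safe to repeat on an existing project.
        (project / sub).mkdir(parents=True, exist_ok=True)
    return project


def build_derived(project, raw_name, derived_name):
    """Read a raw CSV from canonical/ and write a cleaned copy to derived/.

    The raw file is opened for reading only and is never modified.
    Dropping rows with an empty field is a placeholder (an assumption for
    this sketch), not a recommended cleaning rule.
    """
    src = Path(project) / "canonical" / raw_name
    dst = Path(project) / "derived" / derived_name
    with src.open(newline="") as fin, dst.open("w", newline="") as fout:
        reader = csv.DictReader(fin)
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Keep only rows in which every column has a value.
            if all(value not in (None, "") for value in row.values()):
                writer.writerow(row)
    return dst
```

For example, create_project(Path.home() / "Dropbox" / "papers in progress", "strategy research course") would create the example project from this chapter, assuming your Dropbox folder lives in your home directory. Because the folders are created with exist_ok=True, running it again on an existing project leaves your files untouched.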
Once you have all these directories, we can work on setting up a similar structure in your Overleaf project.