Final Project, Literature Review

Step 1: Exploratory Data Analysis, Literature Review

In the final project, you will:

  1. Find strong data relationships in the class dataset.
  2. Research and write about what other people have written about how those variables relate.
  3. Write up your findings in a final paper.

Exploratory data analysis, in this case, is specifically looking for data relationships that are fairly strong in a dataset that was already collected. (Other times, people will test hypotheses that were not originally intended, but they will present their results as exploratory data analysis whether or not the results are significant.) You will look at some of the variables you are interested in, run tests about their relationships, and keep going until you find one (or more) analyses that have a p-value below 0.15 (15%).

The data come from you and your classmates and previous students filling out surveys related to their social networks, Facebook, and LinkedIn. They’re interesting and real, but not perfect. Some questions were asked in one year but not in another. Sometimes people left things blank or it was not clear which surveys match with each other. Sometimes, if a person had 4 or fewer LinkedIn contacts, the calculations for their other network attributes were too extreme to include in the overall dataset.

The analyses you do are going to be like the ones we did in Excel assignments 1 through 4. You can use Excel, or you can use other software you know (e.g. R) to do these analyses, as long as you clearly describe the outcomes and results in your presentation and paper, along with appropriate graphs.

  1. Data cleaning and preparation
    1. Keep track of which copies of the data you’ve messed around with. If in doubt, you can download the data from D2L again to know you have a clean copy.
    2. Creating scale variables
      1. In the previous assignments, you created a scale variable for bridging social capital. This final dataset has the data to create five scale variables: FB Intensity, Bonding SC, Bridging SC, Social Loneliness, and Emotional Loneliness.
      2. You can see which variables are associated with which scales on the Descriptions tab.
      3. Scale variables are valuable because they even out some of the randomness of particular questions and capture the underlying construct.
        1. E.g., one person might have lots of Bonding Social Capital but not know anyone with much money, so they would have a lower answer for that one question. Yet if we look at the average across all the Bonding Social Capital questions, instead of just the loan question, we could understand them better.
      4. Watch out for trick (reverse-coded) questions! For instance, on the Bonding SC scale, the question notimportant indicates higher Bonding SC when someone answers 1 than when they answer 5.
        1. To deal with reverse-coded questions, create a new column, called something like rev_notimportant.
        2. Create a formula so that, within each row, rev_notimportant = 6 - notimportant
          1. E.g., if notimportant = 1, then rev_notimportant = 5
        3. Then, when you take the average to create a value for Bonding SC, it should include rev_notimportant instead of notimportant.
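If you are working outside Excel, the reverse-coding and averaging steps above might look something like the following Python sketch. It assumes a 1-to-5 answer scale (which is what the formula 6 - notimportant implies). The column names bond_q1 and bond_q2 are made up for illustration; only notimportant and rev_notimportant come from this handout, so check the Descriptions tab for the real item names.

```python
from statistics import mean

SCALE_MAX = 5  # assumption: answers run from 1 to 5


def reverse_code(answer, scale_max=SCALE_MAX):
    """Flip a reverse-coded answer so that 1 becomes 5, 2 becomes 4, and so on."""
    return (scale_max + 1) - answer


# One hypothetical respondent. bond_q1 and bond_q2 are placeholder names.
respondent = {"bond_q1": 4, "bond_q2": 5, "notimportant": 1}

# New column holding the flipped answer.
respondent["rev_notimportant"] = reverse_code(respondent["notimportant"])

# Average the scale items, using the reversed column instead of the original.
bonding_items = ["bond_q1", "bond_q2", "rev_notimportant"]
bonding_sc = mean(respondent[item] for item in bonding_items)

print(respondent["rev_notimportant"], round(bonding_sc, 2))
```

The same idea works row by row for the whole dataset: reverse first, then average.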
    3. For most of the analyses and graphs, you will need a dataset that doesn’t have missing data for the variables you’re interested in.
      1. Create a copy of the big dataset, and make sure you keep the identifier in column A!
      2. Sort the whole thing by one of the variables you need. All of the people for whom there is no answer for that variable will be grouped together, probably at the bottom.
      3. Because this is a copy of the data, you can go ahead and delete those rows that are missing the data you need.
      4. Keep doing steps 2 and 3 until you have a dataset with no missing values for the variables you care about.
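The sort-and-delete routine above amounts to keeping only the rows that have an answer for every variable you need. Here is a minimal Python sketch of that idea, assuming blanks show up as None or as an empty string. The variable names and values are invented for illustration.

```python
def drop_incomplete_rows(rows, needed):
    """Keep only the rows that have a non-blank value for every needed variable."""
    return [
        row
        for row in rows
        if all(row.get(var) not in (None, "") for var in needed)
    ]


# Hypothetical survey rows; the id plays the role of the identifier in column A.
rows = [
    {"id": 101, "fb_minutes": 30, "bridging_sc": 3.8},
    {"id": 102, "fb_minutes": None, "bridging_sc": 4.1},  # blank answer
    {"id": 103, "fb_minutes": 75, "bridging_sc": ""},     # blank answer
    {"id": 104, "fb_minutes": 0, "bridging_sc": 2.5},     # 0 is a real answer
]

clean = drop_incomplete_rows(rows, ["fb_minutes", "bridging_sc"])
kept_ids = [row["id"] for row in clean]
print(kept_ids)
```

Note that a real answer of 0 is kept. Only blanks are dropped, and the original list is left untouched, just like working on a copy of the spreadsheet.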
  2. Finding likely data relationships
    1. Like in the first Excel assignment, you can create a correlation matrix to see which variables tend to relate to which other variables. This might help you narrow down what you want to look at.
    2. Don’t compare variables that are part of the same scale to each other, or to the overall scale variable they’re part of, because those are closely related by design!
    3. Start to come up with hypotheses about why different variables might be related. This will help you come up with multiple related analyses.
    4. You could also look at a scatterplot of two variables you care about. Do the dots mostly make a straight line (great candidates for a regression analysis), or do you see a different pattern to investigate?
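For anyone not using Excel, a correlation matrix like the one described above can be sketched in a few lines of Python. This version computes Pearson correlations by hand so it needs no extra packages. The variable names and numbers are invented for illustration, and the data must already be free of missing values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical cleaned columns, one list per variable.
data = {
    "fb_minutes": [10, 20, 30, 40, 50],
    "bridging_sc": [2.0, 2.5, 3.5, 3.0, 4.5],
    "emotional_loneliness": [4.0, 3.5, 3.0, 3.5, 2.0],
}

names = list(data)
matrix = {a: {b: pearson_r(data[a], data[b]) for b in names} for a in names}

# Print one row per variable, with correlations rounded to two decimals.
for a in names:
    print(a.ljust(22), "  ".join(f"{matrix[a][b]:+.2f}" for b in names))
```

Values near +1 or -1 are the pairs worth a closer look with a scatterplot. Remember to skip pairs that belong to the same scale.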
  3. Running analyses and making graphs
    1. You can present your hypotheses as being causal, even though the data do not allow us to show cause and effect.
    2. A really good paper will have more than one analysis.
      1. You could explore multiple related hypotheses.
        1. What is the effect of X on Y? What is the effect of X on Z?
        2. What is the effect of X on Y? What is the effect of Z on Y?
      2. You could add control variables to tell a more causal story.
        1. What is the effect of X on Y? What happens if you use X and Z together in the regression?
      3. You could look for an interesting moderator of the relationship.
        1. What is the effect of X on Y? If we include a classifier (e.g. gender or being a senior) and an interaction term, do we see interesting results?
    3. For each analysis, you’re looking for a testable relationship, using a t-test or a regression, that has a p-value of 0.15 (15%) or lower.
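The moderator idea above needs an interaction term, which is just the predictor multiplied by a 0/1 classifier. Here is a small Python sketch of building that column before running the regression in Excel or another tool. The names fb_minutes, is_senior, and fb_x_senior are hypothetical, and the sketch assumes the classifier has already been coded as 0 or 1.

```python
def add_interaction(rows, predictor, classifier, new_name):
    """Return copies of the rows with predictor * classifier added as a new column."""
    return [{**row, new_name: row[predictor] * row[classifier]} for row in rows]


# Hypothetical cleaned rows; is_senior is 1 for seniors and 0 for everyone else.
rows = [
    {"id": 1, "fb_minutes": 20, "is_senior": 0, "bridging_sc": 3.0},
    {"id": 2, "fb_minutes": 45, "is_senior": 1, "bridging_sc": 4.2},
    {"id": 3, "fb_minutes": 60, "is_senior": 1, "bridging_sc": 3.9},
]

with_interaction = add_interaction(rows, "fb_minutes", "is_senior", "fb_x_senior")
interaction_values = [row["fb_x_senior"] for row in with_interaction]
print(interaction_values)
```

In the regression, X, the classifier, and the interaction column all go in together as predictors of Y. The coefficient on the interaction column is the one that tells you whether the relationship differs between the two groups.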

Step 2: Literature Review

A literature review summarizes and synthesizes what other researchers have already said about a particular topic. In the final paper, it comes after you introduce the topic and define the most important terms, and it leads into your hypothesis. That means that you should have a good idea of which two or three variables are going to be in your final analysis. (See steps below.)

For this class, your literature review will be 3-4 pages, double-spaced. You should find at least 5 scholarly articles that relate to the topic you write about. Two or three of them can be articles we read together in class. (Check the syllabus to see which papers we read for which topics.) For the rest, search for other articles that deal with related topics. (See steps below.)

When you turn in the draft of this literature review, identify at the top (or in the title) the key terms that you will define in the introduction. These should be the two or three variables your analysis is about, and any scales they refer to. For instance, you could use the title:

Literature review on bridging social capital, time spent on Facebook, and Facebook community (part of the Facebook intensity scale)

Or you could just identify at the top:

Terms to define in introduction: bridging social capital, time spent on Facebook, Facebook community (part of Facebook intensity scale)


Here are the key steps for writing a literature review:

  1. Know which variables you’re studying: Go back to the Exploratory Data Analysis step and make sure you have found several variables that are strongly connected to each other.
    1. For an A paper, you need at least two analyses, which requires at least three variables. Otherwise, you just need one analysis, linking two variables.
      1. For instance, you might first show in a regression that X predicts Y (p < 0.15), and then show in another regression that when you also put Z into the regression, so that X and Z together are predicting Y, the coefficient on Z is significant (p < 0.15).
        1. If the coefficient on X is still significant, then you have found that X is related to Y even controlling for Z.
        2. If the coefficient on X is no longer significant, then Z might be the true mechanism behind why X is related to Y.
      2. Once you THINK you have your two or three variables, create a clean dataset that is based on just those variables and check that the statistical relationships are still strong (p < 0.15).
        1. Start from the whole raw dataset and keep ONLY the ids and those variables. (If some of them are scale variables, keep the variables that feed into those scales.)
        2. Delete any row of data that has a blank for any of those variables.
        3. Run the analyses again to make sure they hold up.
        4. If they still work, you’ve got your variables! Otherwise, try another data relationship till you’ve got variables that work.
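The reasoning about X, Z, and Y above can be written out as a small decision rule. The Python sketch below is only a reading aid: the function name and the wording of the messages are my own, the p-values are ones you would copy from your regression output, and it treats "significant" as p < 0.15, as in the examples above.

```python
ALPHA = 0.15  # the significance cutoff used in this project


def interpret_control(p_x_alone, p_x_with_z, p_z_with_x, alpha=ALPHA):
    """Describe what two regressions suggest about X, Z, and Y.

    p_x_alone:  p-value on X when X alone predicts Y
    p_x_with_z: p-value on X when X and Z together predict Y
    p_z_with_x: p-value on Z when X and Z together predict Y
    """
    if p_x_alone >= alpha:
        return "X does not predict Y on its own; try another relationship."
    if p_z_with_x >= alpha:
        return "Z is not significant alongside X; try a different Z."
    if p_x_with_z < alpha:
        return "X is related to Y even controlling for Z."
    return "Z might be the mechanism behind why X is related to Y."


# Made-up p-values, purely to show the two outcomes described above.
print(interpret_control(0.04, 0.09, 0.02))
print(interpret_control(0.04, 0.40, 0.02))
```

Either of the last two outcomes gives you a second analysis to write about.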
  2. Find articles related to the relationships between these variables.
    1. Identify which articles from class relate to these variables.
      1. If there’s an article that goes into depth on one variable, but not any of your other variables, that might still be helpful to use.
      2. If there’s an article from class that compares the same variables you do, you should almost certainly refer to it!
    2. Find articles related to the articles from class.
      1. Zoom in on the literature review sections of the class readings. Which articles do they mention that relate closely to your variables? Plan to look those up.
      2. The bibliography and endnotes in Networked are bursting with interesting papers. There’s a lot there, but it could point you to another interesting source.
      3. Use Google Scholar to look up which articles cited an article from class.
  3. Looking up articles you know about and finding ones you don’t: Google Scholar.
    1. There are other literature databases you could use, on the Library Tools tab of the course D2L page. (There’s also a tutorial on using Google Scholar.)
    2. But start with Google Scholar.
    3. If you already know a paper’s title, type it in quotation marks, or you can type the authors’ names or key (fairly unique) terms.

Figure 1. Identifying the “Cited by” link and article link on a Google Scholar entry.

    4. You can retrieve the article by clicking the link to the right.
      1. If there are paywalls, there may be a link labeled “Full-Text @ UofA Library” that you can use on campus or with a UA VPN.
      2. If you click “All X versions,” there might be a link to a pre-print or to one of the authors’ websites that hosts it with open access.
    5. You can find later articles that cite this same article by clicking on “Cited by X.”
      1. You can narrow those results by how recent they are, or by searching for another one of your key terms among that set of articles.

Figure 2. Search for a key term within the articles that cite an article you like.

    6. You can also start a new search, looking for which articles use two (or three) of your variables. Perhaps you’d use “Facebook” or “LinkedIn” or “SNS” as one of your search terms, too.

Figure 3. Search on Google Scholar for two key terms, each in quotation marks, connected by Boolean operator “and.”

  4. What do you do when you find good articles?
    1. Download or bookmark the article and write down the citation at the end of your draft.
      1. If you work off-campus a lot, you might want to download instead of bookmarking.
    2. Read the abstracts of the articles that looked good, and categorize which ones seem most related to the project.
    3. Take notes, starting with the articles that are most related.
      1. For each one, summarize the findings. You might want to use the Literature Summary template from the beginning of the semester. How would you explain what the authors were asking (research question/hypothesis) and what they did to find out (method of analysis) to someone who hadn’t read the article?
      2. Also write down what’s most relevant to your own analysis. Do they use a variable you’re interested in as a control variable, or in a way that’s not part of their main finding? Do they suggest that further research is needed about your variables?
      3. In all these notes, use quotation marks and a page number if you’re writing down direct quotations.
      4. In your literature review itself, you should only use direct quotations if they’re fairly short phrases that explain a point really powerfully. Be sure to put them in quotation marks, followed by a citation (author year, p. XYZ).
    4. Do you have notes on five articles yet? If not, search some more or see whether the slightly less related articles will work.
  5. Writing it up.
    1. You have five or more articles that you have notes on. Figure out how they relate to each other.
      1. Do the articles directly talk about each other? Figure out and tell the story of the development of the ideas, or maybe the feud between the authors, probably starting with the oldest first.
      2. Do you see other ways to compare and contrast the articles?
        1. Theory (e.g. broad articles like Coleman (1988)) vs. practice (specific experiments in a narrowly defined context)
        2. Older and newer articles (see whether the effect changes over time)
        3. Offline vs. online, or one context or platform vs. another
        4. Country where the study was conducted (there might be very different social contexts and expectations)
      3. Once you have your structure, introduce it and then summarize the articles in that order.
        1. For example, part of your opening paragraph of the literature review might be: “Snoopy (1952) puts forward the idea that pet dogs can have lives just as interesting as their owners. Santa’s Little Helper (1989) and Brian Griffin (2004) explore how this idea works when the dogs are far dumber or far smarter, respectively, than their owners.”
        2. As you go through, keep your own hypothesis in mind. You’re trying to tell the story of how these ideas developed over time and how they bring you to ask your question.
        3. The articles might disagree with each other. Make it clear if that’s the case. E.g. “Odie (1979) came to radically different conclusions than Snoopy (1952), demonstrating how dogs could have goldfish-like memories and live in an eternal present. This contrasts with the elaborate planning and cognitive ability that Snoopy presents. Instead, Odie investigates whether…”
    2. At the end of your literature review, close with a paragraph that neatly summarizes the articles you read and leads into the hypothesis that your analysis addresses. For instance, you could conclude: “Studies of friendships appear to reach different conclusions about the importance of kittens to friendship networks, depending on whether the study is about in-person friendships or about “friendships” or “connections” on social networking services like Facebook or LinkedIn. Garfield (1979) finds that kittens make people grumpy and hurt friendships, just as Felix (1955) finds that cats cause mischief and strain in-person relationships. Yet GrumpyCat (2014) and Maru (2010) convincingly argue that online communities are strengthened when there are more kittens present. It seems like the particular medium changes how cats affect friendship. Therefore, I ask the question: do cats increase a person’s network centrality, and is this effect moderated by whether it is on Facebook or LinkedIn?”
