Summer Institute in Computational Social Science in Helsinki, Finland

This summer, I attended the Summer Institute in Computational Social Science in Helsinki, Finland. Here’s a look back at two unforgettable weeks. Scroll down to see the resulting app!

Ethics

  • Just because some data are public does not mean you can use them in your research. Weigh the risks and benefits of using public data. Consult your ethics board.

Twitter, Facebook, & Co.

  • That viral tweet might not be so viral. Instead of mimicking the spread of a virus (right side of the figure below), tweets tend to be broadcast (left). Figure courtesy of Goel, Anderson, Hofman, & Watts (2016).

    image

  • People use Facebook and Twitter differently. Depending on which data you use, you will arrive at different conclusions (Alhabash & Ma, 2017). Oh, and people also call, text, and meet up.
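The broadcast-versus-viral distinction can be put into a number. Goel et al. (2016) define structural virality as the average distance between all pairs of nodes in a diffusion tree. Below is a minimal Python sketch of that idea; the function name and the two toy cascades are my own illustrations, not data or code from the paper.

```python
from collections import deque


def structural_virality(edges):
    """Average shortest-path distance over all pairs of nodes in a tree.

    `edges` is a list of (parent, child) pairs describing who passed the
    content on to whom. Assumes a connected tree with at least two nodes.
    """
    # Build an undirected adjacency list from the parent-child pairs.
    adjacency = {}
    for parent, child in edges:
        adjacency.setdefault(parent, set()).add(child)
        adjacency.setdefault(child, set()).add(parent)

    def distances_from(source):
        # Breadth-first search gives shortest paths in an unweighted tree.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        return dist

    n = len(adjacency)
    # Sum distances over all ordered pairs, then average.
    total = sum(sum(distances_from(node).values()) for node in adjacency)
    return total / (n * (n - 1))


# Hypothetical cascades with five accounts each.
broadcast = [(0, 1), (0, 2), (0, 3), (0, 4)]  # one account reaches everyone
chain = [(0, 1), (1, 2), (2, 3), (3, 4)]      # passed on person to person

print(structural_virality(broadcast))
print(structural_virality(chain))
```

The person-to-person chain scores higher than the broadcast, even though both cascades reach the same number of accounts.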

Semantic Analysis

  • I like visualizations. A lot. Courtesy of Shirakawa (2015).

    image

  • Struggling to decide on the number of topics in your topic model? Structural topic modeling tools can help: fit models over a range of topic numbers and compare them on fit measures such as the harmonic mean of the log-likelihood.

The Power of Combining Data Sets

Survey Sampling

  • Wiki surveys allow hurried people to leave their contribution quickly and engaged users to contribute a lot. Figure courtesy of Salganik & Levy (2015). Check out allourideas.org for a tool that implements this principle.

    image

  • We can (to some extent) adjust for imbalanced non-probability samples with post-stratification. E.g., in a linear model in R, simply include weights = weights in your call to lm, where weights is a column in your data that specifies the weight of each row.
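To make the post-stratification idea concrete, here is a small sketch in Python rather than R. It assumes that each respondent belongs to one stratum and that the population share of each stratum is known; the data, the stratum labels, and the function names are all made up for illustration. Each weight is the population share of a stratum divided by its share in the sample.

```python
from collections import Counter


def poststratification_weights(sample_strata, population_shares):
    """One weight per respondent: population share / sample share of the stratum."""
    counts = Counter(sample_strata)
    n = len(sample_strata)
    return [population_shares[s] / (counts[s] / n) for s in sample_strata]


def weighted_mean(values, weights):
    """Weighted average of `values`."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: young respondents are over-represented (8 of 10),
# although they make up only half of the (assumed) population.
strata = ["young"] * 8 + ["old"] * 2
outcome = [1.0] * 8 + [0.0] * 2
population_shares = {"young": 0.5, "old": 0.5}

weights = poststratification_weights(strata, population_shares)
raw_estimate = sum(outcome) / len(outcome)
adjusted_estimate = weighted_mean(outcome, weights)

print(raw_estimate, adjusted_estimate)
```

The weighted mean is what an intercept-only weighted regression would estimate, so the same weights column could be passed on to a weighted linear model.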

On Causality

  • Argument: causality is not required for policy decisions. We do not need to understand why the weather forecast predicts rain to decide whether to take an umbrella. Counter-argument: we should be hesitant to recommend policy decisions based on predictions that we do not understand. Google’s image tagging algorithm appeared to work fine until it made headlines for racist tags (Kasperkevic, 2015).

On Algorithmic Bias

  • Algorithms that are blind to ethnicity do not exist. Leaving ethnicity out does not help, because other variables act as proxies for it (predictions will often be comparable whether ethnicity is included or not). Rather, control for ethnicity and then see what explains the remaining variance.
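A toy sketch in Python shows why leaving ethnicity out does not make a model blind to it. Everything here is hypothetical: the records are invented, "proxy" stands for any variable that correlates with group membership (say, a neighbourhood indicator), and the "model" is simply the mean outcome per proxy value.

```python
from collections import defaultdict

# Hypothetical records: (group, proxy, outcome).
# The proxy matches the group for 9 out of 10 people in each group.
records = (
    [(1, 1, 1.0)] * 9 + [(1, 0, 1.0)] * 1
    + [(0, 0, 0.0)] * 9 + [(0, 1, 0.0)] * 1
)


def fit_blind_model(rows):
    """Predict the mean outcome per proxy value; the group is never used."""
    totals, counts = defaultdict(float), defaultdict(int)
    for _group, proxy, outcome in rows:
        totals[proxy] += outcome
        counts[proxy] += 1
    return {proxy: totals[proxy] / counts[proxy] for proxy in totals}


def mean_prediction_by_group(rows, model):
    """Average the blind model's predictions separately for each group."""
    totals, counts = defaultdict(float), defaultdict(int)
    for group, proxy, _outcome in rows:
        totals[group] += model[proxy]
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


model = fit_blind_model(records)
by_group = mean_prediction_by_group(records, model)
gap = by_group[1] - by_group[0]

print(by_group, gap)
```

Although the model never sees the group variable, its average predictions still differ clearly between the two groups, because the proxy carries most of the group information.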

On Mölkky, Sauna, and Kayaking

  • The Finnish Way.

On Amazing People

A big thank you to the organizers in Finland, Matti Nelimarkka, Pihla Toivanen, and Juho Pääkkönen, as well as the coordinators of the summer institute, Matthew Salganik and Chris Bail. And to wonderful friends.

To Be Continued

We (Hannes Rosenbusch, Ilse Pit, and I) continued working on our project after the summer school. You can check out a preview here. Funnily enough, Hannes and I had met a year earlier at another summer school (Decisions, Laws, and the Probability of Big Data, in Haifa, Israel), but only got to know each other now.

image

References

  1. Goel, S., Anderson, A., Hofman, J., & Watts, D. J. (2016). The structural virality of online diffusion. Management Science, 62(1), 180–196. https://doi.org/10.1287/mnsc.2015.2158
  2. Alhabash, S., & Ma, M. (2017). A tale of four platforms: motivations and uses of Facebook, Twitter, Instagram, and Snapchat among college students. Social Media + Society, 3(1), 1–13. https://doi.org/10.1177/2056305117691544
  3. Shirakawa, M. (2015). N-gram IDF: A global term weighting scheme based on information distance. 24th International World Wide Web Conference. Retrieved from https://www.slideshare.net/MasumiShirakawa/www-48698138
  4. Salganik, M. J., & Levy, K. E. C. (2015). Wiki surveys: Open and quantifiable social data collection. PLoS ONE, 10(5), 1–17. https://doi.org/10.1371/journal.pone.0123483
  5. Kasperkevic, J. (2015). Google says sorry for racist auto-tag in photo app. The Guardian. Retrieved from https://www.theguardian.com/technology/2015/jul/01/google-sorry-racist-auto-tag-photo-app