A mixed-methods analysis of emergent major themes in data privacy and data protection in the United States.
Purpose: Use RStudio, web scraping, and quantitative text analysis tools to illuminate major themes in U.S. data privacy, as reflected in both policy making and media coverage, and to compare them with a major precedent from the EU, the General Data Protection Regulation (GDPR). To do this, I examined five enacted state laws and one piece of federal draft legislation in the U.S., as well as three existing and proposed federal laws specifically governing children’s data privacy. I then analyzed these texts alongside GDPR and a selection of BBC News coverage.
Research Question: What can legislation and media coverage tell us about the state of data privacy and protection in the U.S.? Specifically, what major themes emerge from the selected source materials: the full texts of state and federal data privacy legislation and recent media articles covering data privacy? Do any themes overlap or diverge, and are any of them surprising in light of my previous literature review findings?
Results: A comparative study of the topic models produced for each legislative text, together with a sentiment analysis of the BBC media coverage. I interpreted my initial findings through the policy-focused lens gained from a literature review of recent scholarship on data privacy legislation and policy.
The state laws showed cohesive themes and language despite their differences. When I looked into the words that the topic models highlighted, all of the state laws appeared to focus on the processing of data by businesses and on the transparency around, or availability of, data to individuals. I was also able to pull out specific fear-based emotions from the BBC coverage, highlighting an interesting dynamic that was backed up by my literature review. Each set of analyses is explained in detail in the full report below.
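As a rough illustration of how emotions such as fear can be pulled out of article text, here is a minimal lexicon-based sketch. It is written in Python as a stand-in and is not the R code used for this project. The lexicon entries, the `emotion_counts` helper, and the sample sentence are all hypothetical. A real analysis would load a published emotion lexicon instead.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon for illustration only; a real study would load a
# published emotion lexicon instead of these hand-picked entries.
EMOTION_LEXICON = {
    "breach": {"fear"},
    "threat": {"fear", "anger"},
    "risk": {"fear"},
    "stolen": {"fear", "sadness"},
    "outrage": {"anger"},
    "protect": {"trust"},
    "secure": {"trust"},
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def emotion_counts(text, lexicon=EMOTION_LEXICON):
    """Count how many tokens in the text are tagged with each emotion."""
    counts = Counter()
    for token in tokenize(text):
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return counts


# Made-up sentence standing in for a scraped article.
sample = "Regulators warn of a data breach risk as stolen records pose a threat."
counts = emotion_counts(sample)
print(counts.most_common())
```

Counting tagged words per emotion is the simplest form of this technique. It ignores negation and context, which is one reason the results are best treated as a thematic guide rather than a definitive measurement.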
Notes: The results of the LDA topic modeling and sentiment analysis were helpful as a thematic guide, as a validation tool for my previous qualitative research, and as a directional tool for future research. However, the topic models did not serve as a definitive mechanism for answering the research questions. I determined that this type of analysis is best used to surface interesting themes, some of which then require deeper investigation by the researcher to validate their significance to the broader research question. That is how I approached these results.
Methods: Data collection and web scraping; Corpus creation (multiple texts); Quantitative text analysis methods: Latent Dirichlet Allocation (LDA) topic modeling, sentiment analysis; Qualitative interpretation
Skills: Data cleaning, Data management and analysis
Tools Used: RStudio, Excel, GitHub
Learning Outcomes: Research, Technology, Critical Perspectives
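To make the LDA topic modeling listed under Methods more concrete, the following is a minimal sketch of LDA fitted by collapsed Gibbs sampling. It is written in Python using only the standard library, as a stand-in for the R workflow used in this project, and it is not the author's implementation. The function names (`lda_gibbs`, `top_terms`), the hyperparameter values, and the toy token lists are assumptions made for illustration.

```python
import random
from collections import Counter


def lda_gibbs(docs, n_topics=2, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling on pre-tokenized documents.

    Returns (topic_word, doc_topic): per-topic word counts and
    per-document topic counts after the final sampling sweep.
    """
    rng = random.Random(seed)
    vocab_size = len({w for doc in docs for w in doc})
    topic_word = [Counter() for _ in range(n_topics)]
    topic_total = [0] * n_topics
    doc_topic = [[0] * n_topics for _ in docs]
    assignments = []

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(n_topics) for _ in doc]
        assignments.append(z)
        for w, k in zip(doc, z):
            topic_word[k][w] += 1
            topic_total[k] += 1
            doc_topic[d][k] += 1

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                doc_topic[d][k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                # Record the newly sampled assignment.
                assignments[d][i] = k
                topic_word[k][w] += 1
                topic_total[k] += 1
                doc_topic[d][k] += 1
    return topic_word, doc_topic


def top_terms(topic_word, n=3):
    """Return up to n highest-count words for each topic."""
    return [[w for w, c in counts.most_common(n) if c > 0] for counts in topic_word]


# Made-up token lists standing in for cleaned legislative texts.
docs = [
    ["business", "processing", "data", "controller", "business", "data"],
    ["consumer", "access", "request", "data", "consumer", "right"],
    ["processing", "controller", "business", "data", "processing"],
    ["right", "access", "consumer", "request", "delete", "right"],
]
topic_word, doc_topic = lda_gibbs(docs)
topics = top_terms(topic_word)
print(topics)
```

The top terms per topic are what a researcher would then read and interpret. The sampler only groups words that tend to co-occur, so deciding whether a topic reflects a meaningful theme remains a qualitative step.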
Table of Contents:
Page 1: Introduction
Pages 2-5: Background Research
Pages 5-7: Methodology
Pages 7-9: Results (State laws)
Pages 10-11: Results (Federal laws)
Pages 12-13: Results (Child-focused laws)
Pages 13-15: Results (BBC media)
Pages 16-17: Conclusion & Further Study
Page 18: Appendix