What impact do the copyright industry’s robot lawyers have on freedom of expression? And in a complex ecosystem where algorithm operators try to avoid scrutiny, how can we study their effects?

Automated law enforcement systems have become pervasive in just a few years. While many content moderation algorithms monitor and remove content, copyright enforcement bots are unusual for carrying the full weight of the law. In 2018, bots controlled by private enforcement firms sent over 700 million take-down notices under the Digital Millennium Copyright Act (DMCA), accusations that could result in tens of thousands of dollars in damages if challenged.

This has policymakers and legal scholars worried. Laws like the DMCA are designed to enforce copyright law at scale and deter copyright infringement. But influential laws can also have side effects. Law professor Frederick Schauer described this idea in 1978: “a chilling effect occurs when individuals seeking to engage in activity protected by the first amendment are deterred from doing so by governmental regulations not specifically directed at that protected activity.” 

even if law enforcement algorithms made perfect decisions, they might still have a chilling effect on fundamental rights

We live in a world where software monitors billions of people’s conversations to enforce laws in real-time. Even if law enforcement algorithms made perfect decisions (and they don’t), they might still have a chilling effect on fundamental rights. Given the massive scale of automated copyright enforcement, even small deterrent effects could cumulatively prevent millions of Americans from participating fully in civic discourse, causing profound damage to the public sphere.

Do copyright take-downs chill freedom of expression? To find out, we collected public tweets by 9,818 Twitter accounts that received a DMCA take-down notice in January and February 2020. We then conducted an interrupted time-series analysis to study changes in people’s posting behavior after their content was taken down by Twitter. In this post, we report preliminary results and describe how we’re working toward an even more reliable answer (pre-print here).

Shining Light on the Outcomes of DMCA Takedowns

To study behavioral outcomes for people who experience automated law enforcement, we collected data from Harvard’s Lumen database, an archive that records every DMCA takedown request on Twitter, Google, Bing, and others. Our software regularly queried Lumen for the most recently posted DMCA take-down notices, then checked Twitter for more information about the accounts mentioned and their last 3,200 tweets. Then, for the next 23 days, we recorded new public tweets by those accounts and whether they were suspended. After we removed accounts that did not tweet in U.S. English, accounts that were suspended, accounts that received more than one DMCA notice, and accounts that were less than 23 days old, our final dataset included 5,171,111 Tweets from 9,818 Twitter accounts.
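The exclusion step above can be expressed as a simple predicate over per-account records. The sketch below is illustrative only: the field names (`language`, `notice_count`, and so on) are assumptions for this example, not the fields of our actual pipeline.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class Account:
    account_id: str
    language: str        # detected account language (assumed field name)
    suspended: bool
    notice_count: int    # DMCA notices observed for this account
    created_at: datetime

def eligible(acct: Account, observed_at: datetime, min_age_days: int = 23) -> bool:
    """Apply the exclusion criteria described above (a sketch, not the real pipeline)."""
    return (
        acct.language == "en"
        and not acct.suspended
        and acct.notice_count == 1
        and observed_at - acct.created_at >= timedelta(days=min_age_days)
    )

observed = datetime(2020, 2, 1)
accounts = [
    Account("a", "en", False, 1, datetime(2019, 1, 1)),   # kept
    Account("b", "es", False, 1, datetime(2019, 1, 1)),   # dropped: non-English
    Account("c", "en", True, 1, datetime(2019, 1, 1)),    # dropped: suspended
    Account("d", "en", False, 2, datetime(2019, 1, 1)),   # dropped: multiple notices
    Account("e", "en", False, 1, datetime(2020, 1, 20)),  # dropped: account too new
]
kept = [a.account_id for a in accounts if eligible(a, observed)]
print(kept)  # → ['a']
```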

How can researchers make claims about the outcomes of receiving a copyright take-down notice without running an experiment? One option is to look at behavior before and after for every account—what statisticians call an “interrupted time-series” study—and estimate the change on average. To do this, we developed a random intercepts model to estimate the log-transformed number of tweets that a person sent on a given day before or after they received the DMCA notice (you can read the details in our pre-print). This multi-level model allows us to analyze the behavior of frequent posters together with accounts that tweet much less.

Chart: Do copyright take-downs chill freedom of expression? After legal notice, people tweet less and posting rates decline on average on US Twitter

On average, accounts that received a DMCA take-down notice reduced their number of tweets by 3.2 percent, a statistically significant change (p<0.0001). Accounts also tend to increase their daily tweet count in the period before receiving a notice; after the notice arrives, that upward trend shifts downward, and people tweet less and less over time (p<0.0001).

So, Do Copyright Enforcement Systems Chill Freedom of Expression?

Automated copyright enforcement under the DMCA’s “notice and takedown” scheme has long been criticized for alleged chilling effects on people’s rights and freedoms online. How does our study add to that debate?

Our preliminary study can’t answer that question because it doesn’t single out changes in protected speech versus copyright-infringing tweets (something we’re working on). But our study does show that people post less to Twitter after they receive take-down notices. This finding is consistent with other research that has shown that privacy concerns and personally targeted legal threats have chilling effects on a range of online activities, including people’s free expression, search, social media engagement, and content sharing.

policymakers need to evaluate the side-effects of automated law enforcement on constitutional and human rights

More widely, our study demonstrates the need for policymakers to evaluate the side-effects of automated law enforcement on constitutional and human rights. While scientists are rightly debating bias, noise, and errors by algorithms, we urgently need more attention (and caution) toward the social and behavioral impact of deploying these systems into society at scale. Especially where policymakers design legal systems to have deterrent effects, they have an obligation to test the behavioral outcomes of those policies. Software like ours can help third-party evaluators independently test the behavioral impacts of automated law enforcement.

Improving our Data and Analysis

While our findings are consistent with the hypothesis that DMCA enforcement deters future activity on Twitter, the analysis in our working paper does not fully demonstrate a chilling effect on protected speech. We recently started an improved follow-up study, and we invite your ideas and feedback.

Because our outcome variable is the number of tweets sent by Twitter accounts, we do not differentiate between the kinds of speech that people post on Twitter. The reductions we observe could be entirely reductions in illegal content sharing, or they could largely be reductions in speech protected by the U.S. Constitution. In the next study, we are developing measures of “civically important” activity on Twitter related to public health, news sharing, and political discourse. If you have ideas for how to do this well, let us know.
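One naive starting point for such a measure, shown only to make the idea concrete, is keyword tagging of tweet text by civic topic. This is not the study’s method, and the keyword lists below are invented examples; a real measure would need validation against human labels.

```python
import re

# Toy keyword lists for three civically important topics (invented examples).
CIVIC_KEYWORDS = {
    "public_health": ["vaccine", "flu", "cdc", "covid"],
    "news_sharing": ["breaking", "report", "headline"],
    "politics": ["vote", "election", "congress", "senate"],
}

def civic_topics(text: str) -> set:
    """Return the set of civic topics whose keywords appear in a tweet."""
    lowered = text.lower()
    return {
        topic
        for topic, words in CIVIC_KEYWORDS.items()
        if any(re.search(r"\b" + re.escape(w) + r"\b", lowered) for w in words)
    }

print(civic_topics("Go vote! Breaking: CDC updates flu guidance"))
```

With per-tweet labels like these, the outcome variable could become a count of civically relevant tweets per day, rather than all tweets.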

Interrupted time-series studies have important limitations in their ability to support causal claims. Because we only observed people who received DMCA notices, without comparing them to people who didn’t, we can’t completely rule out alternative explanations for the reduction. To support stronger causal claims, we are running a follow-up study that includes a comparison group; we plan to use this follow-up dataset for a matched-pair quasi-experiment.
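The matching idea can be sketched as pairing each noticed account with the un-noticed account that most resembles it before the notice. The example below does greedy 1:1 nearest-neighbor matching on a single covariate (pre-period tweet rate); the follow-up study’s actual matching procedure may use different covariates and algorithms.

```python
import numpy as np

def matched_pairs(treated_rates, control_rates):
    """Greedy 1:1 nearest-neighbor matching without replacement on
    pre-period tweet rates. Illustrative sketch only; requires at least
    as many control accounts as treated accounts."""
    treated = np.asarray(treated_rates, dtype=float)
    control = np.asarray(control_rates, dtype=float)
    available = list(range(len(control)))
    pairs = []
    for i in np.argsort(treated):  # match in a stable, sorted order
        # Pick the closest still-unmatched control account.
        j = min(available, key=lambda k: abs(control[k] - treated[i]))
        pairs.append((int(i), int(j)))
        available.remove(j)
    return sorted(pairs)

# Two treated accounts matched against three candidate controls.
print(matched_pairs([1.0, 5.0], [4.9, 1.1, 9.0]))  # → [(0, 1), (1, 0)]
```

Comparing post-notice trajectories within pairs then controls for shared shocks (seasonality, platform-wide changes) that a single-group time series cannot rule out.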

We are publishing this post and our pre-print for feedback and ideas for this follow-up study. Please send suggestions via Twitter or email to J. Nathan Matias (@natematias)(nathan.matias@cornell.edu) and Jon Penney (@jon_penney)(jon@citizenlab.ca).

How to Cite This Pre-Print

Matias, J. N., Mou, M. E., Penney, J., & Klein, M. (2020). Do Automated Legal Threats Reduce Freedom of Expression Online? Preliminary Results from a Natural Experiment. https://osf.io/nc7e2/


This project was initially imagined by Merry Ember Mou, who also implemented an early prototype of the data collection software in 2017. The data collection system was completed and maintained by Max Klein, who also conducted data preparation for this study. This research was funded by the AI Ethics & Governance Initiative. We are grateful to the Lumen Database for granting us access to this data.

Suggested Reading