What new research, ideas, and technologies do Trust and Safety Professionals think could make a big difference for the work of digital governance?

Here at TrustCon, the first global gathering of Trust and Safety professionals, I'm liveblogging the Lightning Talk session, which presents work from the policy teams, lawyers, and engineers at the forefront of the daily work of digital governance. Here are the ideas they presented:

What does TikTok do to create a safe/creative space for people to have deep conversations about mental health? Speaking is Ryn Linthicum, Product Policy Manager at ByteDance, focusing on mental well-being and suicide prevention. Ryn summarizes TikTok's approach to supporting conversations while also protecting communities from harmful content. She describes the range of resources and guidelines, supported by the company's algorithms and safety teams. Ryn tells us about TikTok's campaign on World Mental Health Day to let people know about mental health resources. She also describes well-being guides that the company provides to its users, and the work the company did, in collaboration with clinical psychologists, to listen to people about their needs and translate the guides into multiple languages. Building on this work, Ryn describes TikTok's plans to host livestreams about mental health and workshops with creators.

How can engineers support Trust and Safety work by scaling and unifying moderation systems? Paul Sanders is Senior Engineering Manager for Trust and Safety at Sony Interactive Entertainment, working on the PlayStation system. When he first came to Sony in 2010, every game title had its own moderation tool, and some titles had multiple moderation software systems. All of these systems looked similar to moderators, but they had subtle differences that depended on the jurisdiction. Paul tells us that maintaining these systems was a major headache, leading to a situation where the company faced pressure to update some systems but not others. To address this issue, Paul contributed to a project to generalize the content moderation process, creating a single workflow that could be adapted to different languages and jurisdictions, especially around CSAM. Based on that single process, moderators could go to one tool to handle and escalate cases, making it easier to train and support moderators. With unified data processing, Paul's team was able to make automated recommendations to moderators about what intervention (such as a temporary suspension) to apply in a given case, as well as to revoke interventions. By automating the CSAM reporting form to NCMEC, Paul's team reduced the submission time for regulatory compliance by a factor of ten (from 30 minutes to 3 minutes).

Now Sony uses this system with over thirty titles across the PlayStation ecosystem, and the cost to moderate each case has decreased. It is now easy to change a policy so that it applies across titles, and easier for new game titles to incorporate content moderation processes.
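To make the idea concrete, here is a minimal sketch of what a unified case model and shared rules layer might look like. All names, categories, and thresholds here are hypothetical illustrations of the pattern Paul describes, not Sony's actual system.

```python
from dataclasses import dataclass
from enum import Enum, auto

class Intervention(Enum):
    NO_ACTION = auto()
    WARNING = auto()
    TEMPORARY_SUSPENSION = auto()
    PERMANENT_BAN = auto()

@dataclass
class ModerationCase:
    case_id: str
    title: str               # game title the report came from
    category: str            # e.g. "harassment", "csam" (illustrative labels)
    prior_violations: int = 0

def recommend(case: ModerationCase) -> Intervention:
    """A single rules layer shared by every title: because all cases arrive
    in the same shape, one policy change applies everywhere at once."""
    if case.category == "csam":
        # CSAM cases are escalated and reported, never auto-closed.
        return Intervention.PERMANENT_BAN
    if case.prior_violations >= 3:
        return Intervention.TEMPORARY_SUSPENSION
    if case.prior_violations >= 1:
        return Intervention.WARNING
    return Intervention.NO_ACTION
```

The point of the unified data model is that `recommend` never needs to know which title a case came from, which is what makes a cross-title policy change a one-line edit.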

How can platforms interrupt extremist activity on encrypted platforms? Rafi Lazerson is a researcher at Berkeley who examines security and human rights issues related to emerging technologies. Rafi presents findings from interviews with experts about why extremists use particular platforms and what counts as an effective intervention. Rafi starts by noting that extremists need to recruit new members while maintaining the security needed to evade law enforcement. For this reason, extremist groups tend to use multiple platforms to meet these different needs. Facebook, for example, offers a high ability to recruit but a low ability to operate secretly. Signal has a high ability to operate secretly but a lower ability to recruit.

Rafi talks to us about what happens when a platform changes its features, capabilities, or policies. If a new feature or a change in the userbase makes it easier for extremist groups to recruit new members, they might start using that platform. He argues that a single platform can have different pros and cons for extremist groups. For example, if a platform is good at detecting hate speech in English, recruiting in English becomes harder, but the platform's suitability for extremism in other languages is unaffected. For these reasons, Rafi suggests that researchers should take a cross-platform lens when studying online extremism, supported by longer-term grants.

Is Meta actually changing due to what the Oversight Board is doing? Naomi Shiffman is Head of Data & Implementation at the Meta Oversight Board. She is also a Fellow at the Integrity Institute, and serves as an advisor to Connect Humanity, a fund for digital equity.

Naomi opens by telling us about the Oversight Board and its role in reviewing specific decisions made by Meta and making recommendations to the company. She talks about the "Reclaiming Arabic Words" decision, after which the company updated its word lists for content moderation in Arabic within 60 days. The Data & Implementation Team has three purposes: (a) inform recommendations with information about the company's roadmaps and priorities, (b) assess how the company implemented recommendations and whether the changes influenced information ecosystems, and (c) lay the groundwork for assessing the impact of the Oversight Board.

When Meta makes a decision in response to the Oversight Board, the team rates how comprehensive the decision was. They also rate what Meta says about whether it implemented the recommendations and, if so, what stage the response is in (did they decline to act after a feasibility assessment? Are they making progress? etc.). Thanks to this work, the Board is becoming much more specific in the recommendations it makes to Meta. Finally, based on information the Board hears about Meta's priorities, it is able to target its recommendations to account for those priorities.
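As a rough sketch of the bookkeeping this implies, each recommendation gets two ratings. The labels below are my paraphrase of the examples Naomi gives, not the Board's actual taxonomy or data model.

```python
from enum import Enum

# Hypothetical paraphrase of the two ratings described in the talk;
# the Oversight Board's real categories may differ.

class Comprehensiveness(Enum):
    FULL = "response fully addresses the recommendation"
    PARTIAL = "response partially addresses the recommendation"
    NONE = "response does not address the recommendation"

class Stage(Enum):
    DECLINED = "declined to act after a feasibility assessment"
    IN_PROGRESS = "making progress on implementation"
    COMPLETE = "implementation complete"

# One tracked recommendation: an identifier mapped to its two ratings.
tracker = {
    "recommendation-001": (Comprehensiveness.PARTIAL, Stage.IN_PROGRESS),
}
```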

How has this played out in practice? Naomi tells us about increased specificity in user notifications, increased disclosure in Meta's Transparency Center, and the creation of a new Misinformation Community Standard.

What challenges does this work face? In the past, the team has relied on CrowdTangle, which is now becoming less available. It is also hard to attribute changes and assess whether Meta complies with recommendations, or to evaluate whether those recommendations are having the effect the Board hopes for. Finally, it has been challenging to harmonize the company's timelines with the Oversight Board's.

Upcoming work will be supported by Meta assigning a data scientist to the team, implementation feasibility ride-alongs for Oversight Board members, and upcoming conversations about further data sharing.

How does Meta build working groups to inform product policy development at Meta?

Yvonne Lee is Stakeholder Engagement Manager at Meta and a fellow at the SNF Agora Institute at Johns Hopkins, where she works with academics and civil society organizations on issues surrounding misinformation and algorithmic ranking. Her job at Meta is to convene expert working groups to inform how Meta creates its policies. Yvonne's team convenes 60+ experts from 20+ countries across six expert groups, with membership made up of a global set of academics, civil society organizations, and more. These groups work on issues including misinformation and online safety. Meta's groups are parallel to similar groups organized by other companies.

How can platforms create groups like this of their own? The first thing, says Yvonne, is to be clear about your goals. Once you have goals, Yvonne says, you need to be clear about membership, management, and meetings. She encourages Trust and Safety teams to choose a common thread for each group (a regional focus, a subject matter, or a particular area of expertise) while ensuring a diversity of perspectives. Groups should be well managed, she says, to create safe and productive environments. She discusses compensation, publicity norms, trust-building, and virtual versus in-person meetings. Finally, Yvonne reflects on facilitation techniques for these meetings.

How do South Korea’s creators, civil society, and government collaborate to regulate online content? 

Inyoung Cheong is a Ph.D. candidate at the University of Washington School of Law and works for the Ministry of Culture, Sports, and Tourism of South Korea. Inyoung talks to us about industry self-regulation in the media, which has been less common in the US since the 1980s but more common in countries such as Korea. Self-regulation in Korea, she tells us, is less constrained by the First Amendment and by the antitrust laws that have prevented certain forms of industry cooperation in the US. The Constitutional Court of Korea has fielded more and more free speech challenges in recent years. In this environment, media industries have worked with the government to self-regulate in order to head off government regulation. She describes a triangle of government, creators, and platforms, in which industries such as news, comics, film, and music collaborate on funding, rule-setting, and enforcement carried out by non-government actors. She tells us about the industry organization WebToon, which works with the KCSC (review)

How can predictive models help moderators prioritize which “bad tweets” to intervene on sooner? 

Maggie Engler is a Senior Machine Learning Engineer at Twitter who develops tools and models for the detection and mitigation of spam, abuse, harassment, and misinformation on the platform. She tells us about a project to identify "pre-viral" bad tweets so that moderators can intervene early. She summarizes a set of classifiers the team tested in this project. The most important features were whether the user was verified, the number of followers, and the number of impressions the user typically accumulates. Maggie reported that the system was especially accurate within the first hour after the tweet was posted, so the team created two models: one scored at the time of the tweet, and another an hour after the tweet was published.

The final models achieved 30% precision and 80% recall. It remains an open question how effective interventions on flagged tweets are at preventing the spread of viral, harmful content, but that's work for further research.
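To unpack those numbers: at 30% precision and 80% recall, roughly seven in ten flags are false alarms, while four in five genuinely harmful pre-viral tweets are caught. Here is a minimal sketch of how such metrics are computed, plus a toy score over the three features Maggie names as most important. Everything here is illustrative, not Twitter's actual model or thresholds.

```python
def precision_recall(predictions, labels):
    """Compute precision and recall from boolean predictions and labels."""
    tp = sum(p and y for p, y in zip(predictions, labels))          # flagged and truly bad
    fp = sum(p and not y for p, y in zip(predictions, labels))      # flagged but fine
    fn = sum((not p) and y for p, y in zip(predictions, labels))    # missed bad tweets
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def score_at_post_time(verified: bool, followers: int, avg_impressions: int) -> float:
    """Toy linear score (0..1) over the features cited in the talk;
    the weights and caps are made up for illustration."""
    s = 1.0 if verified else 0.0
    s += min(followers / 1_000_000, 1.0)
    s += min(avg_impressions / 100_000, 1.0)
    return s / 3.0
```

In a two-model setup like the one described, a score like this would run once when the tweet is posted, and a second model with engagement features would rescore it an hour later.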