Five Takeaways from PEPR 2023
by Josh Schwartz and Abby Krishnan
Last week our team attended the Privacy Engineering Practice and Respect (PEPR) conference, the leading US conference on privacy engineering, bridging the gap between academia and industry. There were so many amazing speakers that it’s hard to do all of the talks justice in a summary, but we wanted to highlight a few themes that stood out to us.
Data maps are just the beginning
Once a company has a map of all of their data, the next step is figuring out how to actually manage it. A talk from Viraj Thakur (Cruise) went deep into access management for BigQuery, showcasing Cruise’s system for flagging users who potentially shouldn’t have access to certain pieces of data. Akshatha Gangadharaiah (Snap) talked through how they operationalize getting teams to hook up their systems to their DSAR pipeline.
Our overall takeaway was that as organizations scale in data and personnel, a detailed, high-quality data map is a requirement for any system that manages personal data. Teams also need to remember that a data map is just the beginning of a data governance program.
Cookies aren’t a solved problem
Several talks on cookie and third-party code management underscored that there’s still a lot to be done. Katriel Cohn-Gordon (Meta) presented Meta’s work on defining a code-level schema for all cookies set on a site, which has improved both visibility into data collected via cookies and the developer experience for cookies at Meta.
Both Katriel and Devin Lundberg (Pinterest) talked through the idea of enforcing that only approved third-party code is even loaded onto a site by setting the site’s Content Security Policy (CSP) to a specific allow list. While CSPs were developed with XSS protection in mind, many privacy violations have occurred through sites loading third-party scripts without properly reflecting those third parties in their privacy policies, so a technical measure that prevents unapproved third-party scripts from loading should be quite enticing. Doing this requires careful management to ensure that the site’s legitimate third-party dependencies don’t run into issues.
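To make the allow-list idea concrete, here is a minimal sketch of how a site might assemble a CSP header value from a list of approved script sources. This is not the approach presented by Meta or Pinterest; the hostnames and the `build_csp` helper are hypothetical and exist only for illustration.

```python
# Hypothetical allow list: these hostnames are illustrative assumptions,
# not sources named in any of the talks.
APPROVED_SCRIPT_SOURCES = [
    "https://cdn.example-analytics.com",
    "https://js.example-payments.com",
]


def build_csp(script_sources):
    """Build a Content-Security-Policy header value that allows scripts
    only from the site itself and from the approved sources."""
    script_src = " ".join(["'self'", *script_sources])
    directives = [
        "default-src 'self'",        # fall back to same-origin for everything else
        f"script-src {script_src}",  # scripts: same-origin plus the allow list
    ]
    return "; ".join(directives)


header_value = build_csp(APPROVED_SCRIPT_SOURCES)
print("Content-Security-Policy:", header_value)
```

With a policy like this, a browser refuses to load a script from any host that isn't on the list. One common way to manage the rollout risk is to first send the same value in the `Content-Security-Policy-Report-Only` header, which reports violations without blocking anything.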
Generative AI: a major gulf between privacy experts and what’s happening in practice
As expected, the session on Generative AI was one of the liveliest and best attended sessions of the conference. What stood out most from the discussion was the advice from the panel (Jay Averitt, Apoorvaa Deshpande, Sameera Ghayyur, Christian Lau, Hunter Luthi, and Eric Wallace) to avoid training LLMs on customer data, as currently available systems aren’t able to guarantee user privacy. This stands in contrast to what many generative AI companies are actually doing, and it portends conflict between privacy advocates and the rapid pace of development in generative AI.
Data anonymization is misunderstood
Katharina Koerner (Tech Diplomacy Network) gave a great talk on what in my experience is one of the most misunderstood topics in privacy: anonymization. All too often, teams take steps like hashing to “anonymize” data when these steps may in fact only pseudonymize it, and pseudonymous data that can be reidentified is generally required to have the same data protections as non-anonymized data! In addition to highlighting this difference, Katharina pointed out that even the law itself can be confused, with some laws using “de-identified” while others use “anonymized” or “aggregated,” and each term carrying its own ambiguity. Our takeaway was that teams relying on anonymization as part of their data protection efforts need to work carefully with legal teams to understand what’s actually legally required of their work.
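As a rough illustration of why hashing on its own tends to be pseudonymization rather than anonymization, the sketch below hashes an email address and then reidentifies it by hashing a list of candidate addresses and comparing. The addresses, the `hash_email` helper, and the choice of unsalted SHA-256 are assumptions made for this example, not details from the talk.

```python
import hashlib


def hash_email(email: str) -> str:
    """Return the unsalted SHA-256 hex digest of a normalized email address."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


# A supposedly "anonymized" dataset: the email is replaced by its hash.
records = [{"user": hash_email("alice@example.com"), "plan": "pro"}]

# Anyone holding a list of candidate emails (hypothetical here) can hash
# each one and look for matches, which reverses the "anonymization".
candidates = ["bob@example.com", "alice@example.com", "carol@example.com"]
lookup = {hash_email(c): c for c in candidates}

reidentified = [lookup.get(r["user"]) for r in records]
print(reidentified)  # ['alice@example.com']
```

Because the same input always produces the same hash, the hashed value still works as a stable identifier for a person, which is the core of the distinction between pseudonymous and anonymous data.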
Metrics
Finally, a great talk from Lea Kissner (Lacework) highlighted all of the ways in which metrics can drive results that differ from what was intended. As someone who spent 10+ years working on analytics products, I found that the importance of understanding the incentives created by different metrics rang deeply true. As privacy teams push to justify their initiatives and budgets, the call for metrics to quantify the impact of teams’ work will only rise, and Lea’s talk gave some meaningful suggestions. The point that resonated most for me was the value of balancing one metric with another (for example, balancing a “volume” metric with a “quality” metric) to avoid over-optimizing for one dimension.