Ready to Enhance your Data Practices? Explore Synthetic Data Generation for Improved Data Sharing, Data Quality, Privacy, and Developer Productivity

Presentation 📣

English 🇬🇧

Thursday, September 07, 11:40 AM – 12:40 PM

Length: 60 minutes

Room: Room 3

Abstract

Generating synthetic cancer data is a critical innovation focus area at the Cancer Registry of Norway because of the benefits synthetic data offers: easier data sharing to promote cancer research, improved data quality for building better prediction models, better privacy protection for patients, and better developer productivity through easier software testing. In this talk, I first focus on the different synthetic cancer data generation techniques explored at the Cancer Registry for various use cases, and I share our view on why no single technique is best for all of them. I then talk about evaluating the quality of synthetic data and the challenges in answering the following questions: a) How good is the generated synthetic data compared to the real data? b) How well does the generated synthetic data preserve patient privacy? c) Does the synthetic data satisfy the purpose for which it is generated? d) How much bias is introduced by synthetic data? I also talk about how federated learning adds further benefits to generating synthetic data, and about the new challenges it introduces regarding security, privacy, accountability and auditability. Finally, I will talk about our efforts towards addressing these challenges.

Intended audience

By attending this talk, participants will gain a deeper understanding of the benefits and challenges associated with synthetic tabular data generation. They will learn about the different techniques used for synthetic data generation and how to evaluate the quality of generated data. The talk will also explore how federated learning benefits synthetic data generation, highlight the challenges associated with it, and present ways to address them. Data scientists and software developers with a basic understanding of data science concepts will benefit the most from this talk. Familiarity with data privacy regulations and experience working with cancer data are an added advantage but not necessary. The talk is designed to be informative and accessible to a wide range of participants, including researchers, data scientists, software developers and healthcare professionals interested in leveraging synthetic data for their own use cases.

  • Narasimha Raghavan Veeraragavan

    Narasimha Raghavan Veeraragavan is currently a Special Adviser with the Cancer Registry of Norway. He is a key player in delivering technical architecture and innovative solutions to continuously strengthen the security and privacy of cancer patients’ datasets in Norway. As part of his role, he is also involved in several national and international research projects and collaborates with many reputed national and international partners. Previously, he led several technical initiatives in global companies. He has four patents and a few peer-reviewed research papers in reputed conferences and journals. His initiatives resulted in large-scale software products launched globally with millions of users worldwide. From 2017 to 2022, Narasimha was invited to teach courses at the Department of Informatics at the University of Oslo. The audience for his course was professional software developers working in the public and private sectors who wanted to become architects. His course received the best rating from the participants.