Slides from the webinar for Devoxx on 12/10/2022
Using ‘real’ data may be tempting, but under the GDPR it’s not a good idea when personal information is involved. Unfortunately, testing or debugging software can be harder without full access to the underlying data. A synthetic dataset can be a good solution: fictitious replacement data that mimics the structure and distribution of the original data. Joachim Ganseman from Smals Research talks about how synthetic data can be generated, and especially about the practical concerns and limitations. How do we deal with rarely occurring values, correlations or dependencies? How do we balance maximum privacy protection against retaining enough functional usability? Can we do reliable analytics on a synthetic dataset? He will share some practical examples using open source software in Python.
Video recording published on YouTube