Privacy Preserving Visualizations using Vega-Lite
Description
In today's data-driven world, privacy is a significant concern, and sensitive information must remain protected when data is visualized. This thesis develops new techniques and software tools that support Vega-Lite visualizations while maintaining privacy. Vega-Lite is a visualization grammar based on Wilkinson's grammar of graphics. The project extends Vega-Lite to incorporate privacy algorithms such as k-anonymity, l-diversity, t-closeness, and differential privacy. It does so through a multi-input loop module that generates combinations of attributes, which serves as a new anonymization method. Differential privacy is implemented by adding controlled noise (Laplace or Exponential) to the sensitive columns in the dataset. The user defines custom rules in a JSON schema, specifying the privacy methods and the sensitive column. The schema is validated using the Ajv (Another JSON Schema Validator) library, and the rules determine which anonymization techniques are applied to the dataset before it is sent back to the Vega-Lite visualization server. Multiple datasets satisfying the privacy requirements are generated, each with utility scores, so that the user can trade off privacy against utility according to their requirements. The interface is user-friendly and intuitive, guides users through its use, and provides feedback on the generated privacy-preserving visualizations through various utility metrics. The application is intended for technical or domain experts in fields where privacy is a major concern, such as medical institutions, traffic and urban planning, financial institutions, educational records, and employer-employee relations. The project is novel in that it provides a one-stop solution for privacy-preserving visualization. It builds on Vega-Lite, open-source software that several organizations and users rely on for business and educational purposes.
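To make the differential privacy step and the privacy/utility trade-off more concrete, the following is a minimal Python sketch, not the thesis implementation. It adds Laplace noise with scale sensitivity/epsilon to one sensitive column and reports a simple utility score for several privacy levels. The function names, the `sensitivity` and `epsilon` parameters, the mean-absolute-error score, and the sample records are all assumptions introduced for illustration; the abstract does not specify them.

```python
import random


def add_laplace_noise(rows, column, sensitivity, epsilon, seed=None):
    """Return a copy of rows with Laplace(0, sensitivity/epsilon) noise added to one column.

    `sensitivity` and `epsilon` are assumed parameters; the input rows are not modified.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    rng = random.Random(seed)
    rate = epsilon / sensitivity  # rate = 1 / scale of the Laplace distribution
    noisy_rows = []
    for row in rows:
        # The difference of two i.i.d. exponential draws is Laplace-distributed.
        noise = rng.expovariate(rate) - rng.expovariate(rate)
        noisy_rows.append({**row, column: row[column] + noise})
    return noisy_rows


def mean_absolute_error(original, noisy, column):
    """Hypothetical utility score: lower means the noisy column stays closer to the original."""
    total = sum(abs(o[column] - n[column]) for o, n in zip(original, noisy))
    return total / len(original)


if __name__ == "__main__":
    # Made-up records, used only to exercise the sketch.
    records = [
        {"age_group": "20-29", "salary": 41000.0},
        {"age_group": "30-39", "salary": 52000.0},
        {"age_group": "40-49", "salary": 63500.0},
    ]
    # Smaller epsilon means stronger privacy and, on average, larger distortion.
    for eps in (0.1, 1.0, 10.0):
        noisy = add_laplace_noise(records, "salary", sensitivity=1000.0, epsilon=eps, seed=7)
        print(f"epsilon={eps}: MAE={mean_absolute_error(records, noisy, 'salary'):.2f}")
```

Producing one noisy dataset per epsilon value mirrors the idea of offering several candidate datasets with utility scores, so a user can pick the point on the privacy/utility curve that suits the visualization.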