One of the exciting recent developments in Machine Learning and AI is the success of Deep Learning algorithms in using big data to solve some of the most pressing and important problems we face today.
This success, though, comes at a price. Machine-learned models often exploit information generated and owned by individuals, which may contain sensitive, private, or copyrighted material.
A pressing question, then, is how to harness these learning machines in a fair setting that allows the data to be exploited without breaching individuals' rights.
In this proposal we focus on the question of generating synthetic data. We consider a setting where a learner has access to publicly available data and may exploit it, under fair use, to generate similar data, but is not allowed to copy the data or otherwise breach intellectual property (IP) rights.
The first challenge we face is to correctly model IP rights and fair use. In particular, the project aims to propose different notions of and criteria for fair use, and to study their statistical and computational complexity.