Trust and Fake Data

It seems like a logical step to use real data to conduct tests and develop new products and services. But production data is often hard to come by, and GDPR has just made that even harder.

Large data-driven tech companies such as Facebook and Google start from a better place than most. In fact, they have a distinct competitive advantage, as they already hold large quantities of user-submitted data.

As data volumes grow relentlessly, many organisations are tasked with developing Big Data solutions. These new solutions all need to be tested and their results validated. But how do you do this without breaching privacy boundaries?

In the same way that scientists produce synthetic material to run low risk experiments, we can produce computer-generated Synthetic Data which mimics real data.

From an organisational perspective, this synthetic or ‘fake’ data can be used to overcome gaps in knowledge while avoiding any breaches of consumer trust. It can provide insights based on trend analysis and the observation of marginal changes. This supports an organisation’s econometric function as well as the development of new products and services.

"Synthetic Data is information that's artificially manufactured rather than generated by real-world events. Synthetic Data is created algorithmically, and it is used as a stand-in for test datasets of production or operational data, to validate mathematical models and, increasingly, to train machine learning models."
Source: TechTarget, SearchCIO, "Synthetic Data"
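
To make "created algorithmically" concrete, here is a minimal sketch in Python of one very simple approach: count how often each value appears in each column of a real dataset, then draw new records from those frequencies. It is an illustration only and is not the SynthLab technique. The records, column names and function names are hypothetical. Sampling each column independently loses any relationship between columns and offers no formal privacy guarantee, so treat it as a starting point rather than a recipe.

```python
import random
from collections import Counter

def fit_marginals(records):
    """Count how often each value occurs in each column of the real records."""
    columns = records[0].keys()
    return {col: Counter(r[col] for r in records) for col in columns}

def sample_synthetic(marginals, n, seed=0):
    """Draw n synthetic records, sampling every column independently
    in proportion to the value frequencies seen in the real data."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n):
        row = {}
        for col, counts in marginals.items():
            values = list(counts.keys())
            weights = list(counts.values())
            row[col] = rng.choices(values, weights=weights, k=1)[0]
        synthetic.append(row)
    return synthetic

# Hypothetical stand-in for a sensitive dataset (assumed for illustration).
real_records = [
    {"age_band": "18-39", "region": "North"},
    {"age_band": "18-39", "region": "South"},
    {"age_band": "40-64", "region": "North"},
    {"age_band": "65+", "region": "South"},
]

marginals = fit_marginals(real_records)
synthetic_records = sample_synthetic(marginals, n=1000)
```

The synthetic records have the same columns and roughly the same value frequencies as the originals, which is enough for testing a pipeline or a report layout, but no row is copied from a real individual's record as a whole.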

SynthLab

So how might this work in practice? Wallscope and Edinburgh University Business School have recently developed a sampling technique that can be used for spatial and temporal health and care data. Because patient information is sensitive, Synthetic Data allows realistic datasets to be released without the risk of identifying individuals. The technique can also be applied in other areas with sensitive or regulated data, such as financial services.

SynthLab was developed as an open source project. If you are interested in this project or would like to find out more about Synthetic Data, please contact david.eccles@wallscope.co.uk.
