The challenge

Accessing user data safely

More and more, organisations are collecting data about their users and customers. This data is then fed into sophisticated analytics, including machine learning algorithms, to unlock insightful information leading to higher value services and products.

The question is how organisations can then provide safe access to this data internally, or even share it externally for societal or commercial benefit. The same question arises when different organisations want to share data safely with each other, which they have a strong incentive to do.

Most data custodians recognise the privacy and confidentiality risks in using and sharing their data both within and outside their organisations. However, there is no consistent and repeatable methodology or related tool for data custodians to confidently measure and understand the level of such risks in their data for the purpose of sharing or releasing it.

Our response

Re-identifier Risk Ready Reckoner (R4)

Dashboard results from running ISP’s Re-identification Risk Ready Reckoner (R4) on a publicly available census data set (i.e. the “adult” dataset from the UCI Machine Learning Repository at https://archive.ics.uci.edu/ml/datasets/adult)

We have designed quantitative and qualitative privacy and confidentiality risk methodologies, with appropriate assessment metrics and frameworks, to understand the risks of sharing or releasing data, or even just providing access to a wider internal audience. These tools leverage scientific knowledge from information theory and stochastic models to estimate the residual risks associated with sharing sensitive data.

For example, one of our metrics measures the re-identification risk for an individual event or transaction, based on factors such as uniqueness, uniformity and/or linkability. Another of our metrics quantifies the risk of deducing a non-reported value in an aggregated data report.
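To make the uniqueness factor concrete, here is a minimal sketch in Python. It is not the R4 metric, whose definition is not given here. It assumes a simple, commonly used proxy: a record's risk is one divided by the number of records sharing its combination of quasi-identifier values. The function names, the choice of quasi-identifiers and the sample records are all hypothetical.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """For each record, count how many records share its quasi-identifier values."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return [counts[k] for k in keys]


def uniqueness_risk(records, quasi_identifiers):
    """Return (per-record risk, fraction of records that are unique).

    Assumption for illustration only: risk = 1 / equivalence class size.
    """
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    per_record = [1.0 / s for s in sizes]
    unique_fraction = sum(1 for s in sizes if s == 1) / len(sizes)
    return per_record, unique_fraction


# Hypothetical sample data, not taken from any real dataset.
records = [
    {"age": 34, "postcode": "2000", "sex": "F"},
    {"age": 34, "postcode": "2000", "sex": "F"},
    {"age": 51, "postcode": "2010", "sex": "M"},
    {"age": 29, "postcode": "2000", "sex": "F"},
]

per_record, unique_fraction = uniqueness_risk(records, ["age", "postcode", "sex"])
print(per_record)       # [0.5, 0.5, 1.0, 1.0]
print(unique_fraction)  # 0.5
```

In this toy example, the two records that share the same age, postcode and sex each get a risk of 0.5, while the two records with a unique combination get a risk of 1.0. Uniformity and linkability would need additional information, such as attribute distributions or an external dataset to link against, and are not modelled here.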

We have also developed software, such as our Re-identifier Risk Ready Reckoner (R4), to implement these metrics and methodologies. R4 generates quantifiable risk assessments that display on a working dashboard. It also provides data treatment options, such as binning and perturbation, to help data custodians mitigate these risks, and then re-assesses the risk in the treated data.
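The assess, treat and re-assess loop described above can be sketched as follows. This is an illustrative example only, not R4's implementation: it assumes the fraction of unique records as the risk measure and fixed-width age bands as the binning treatment. The helper names `unique_fraction` and `bin_age`, the band width and the sample records are hypothetical.

```python
from collections import Counter


def unique_fraction(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination appears exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(keys)


def bin_age(records, width=10):
    """Replace each exact age with a band such as '30-39' (a binning treatment)."""
    treated = []
    for r in records:
        low = (r["age"] // width) * width
        treated.append({**r, "age": f"{low}-{low + width - 1}"})
    return treated


# Hypothetical sample data, not taken from any real dataset.
records = [
    {"age": 31, "postcode": "2000"},
    {"age": 34, "postcode": "2000"},
    {"age": 38, "postcode": "2000"},
    {"age": 52, "postcode": "2010"},
]

quasi_identifiers = ["age", "postcode"]
before = unique_fraction(records, quasi_identifiers)
after = unique_fraction(bin_age(records), quasi_identifiers)
print(before)  # 1.0
print(after)   # 0.25
```

Binning lowers the share of unique records from 1.0 to 0.25 in this toy example, at the cost of less precise ages. That trade-off between residual risk and utility is what a data custodian would weigh when choosing a treatment. Perturbation, which adds noise to values, is not shown here.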

The results

Improving awareness of privacy and confidentiality risk

Our work is improving awareness of privacy and confidentiality risk in data and helping in the management of that risk across the data ecosystem.

Our privacy and confidentiality risk frameworks and R4 software have been used extensively in several commercial engagements, identifying and measuring re-identification risks in so-called de-identified data pending release (or in some cases already released), as well as inference risks of not-reported data in confidential financial reports.

These engagements demonstrate the impact of our work. We have observed cases where data custodians adjusted their approach to making data available because they better appreciated the risk it carries. In other cases, guided by our framework, data custodians applied targeted transformations to the data to reduce the residual risks, while still maintaining an acceptable level of utility, before releasing it.

Find out more: Information Security and Privacy
