SAGE Ocean announces Text Wash as 2019 Concept Grant winner

Over $30,000 awarded to develop smart anonymization tool that enables social scientists to access untapped textual datasets

SAGE Ocean announced today that it has awarded a Concept Grant to Text Wash, a new software tool that anonymizes personally identifiable text data, making it accessible to social scientists without compromising its usability for research.

When it comes to doing research with text data, many datasets are protected through ethics boards’ restrictions (e.g. interviews, crowdsourced texts) and wider data protection frameworks such as GDPR (e.g. police reports, patient files). As a result, such unique datasets are rarely shared, and research using text data often focuses on readily available data at the expense of data that could help answer more pressing research questions.

Where they are shared, current approaches to anonymizing these data render the texts unusable for follow-up research. Text Wash solves this problem by enabling the anonymization of text data without compromising its quality. It does this by using natural language processing and machine learning to identify and replace sensitive information while preserving the semantic and grammatical structures of the text. Importantly, what counts as personally identifiable information is determined in close collaboration with data protection officers from the government and the police.
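To illustrate the general idea of replacing sensitive spans while keeping a sentence readable, here is a minimal sketch in Python. It is not Text Wash's method: the tool is described as using natural language processing and machine learning, whereas this stand-in relies on a few hand-written patterns and a caller-supplied list of names. The function name `anonymize`, the `known_names` parameter, the patterns and the placeholder format are all assumptions made for illustration.

```python
import re

# Hypothetical, hand-written patterns. A real tool would detect sensitive
# spans with trained NLP models rather than fixed rules like these.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("PHONE", re.compile(r"\b\d{3}[-\s]\d{3}[-\s]\d{4}\b")),
    ("DATE", re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")),
]


def anonymize(text, known_names=()):
    """Replace sensitive spans with numbered category placeholders.

    The same value always maps to the same placeholder, so references
    to one person or address stay linked after anonymization.
    """
    counters = {}  # category -> number of distinct values seen so far
    mapping = {}   # (category, lowercased value) -> placeholder

    def placeholder(category, value):
        key = (category, value.lower())
        if key not in mapping:
            counters[category] = counters.get(category, 0) + 1
            mapping[key] = f"[{category}_{counters[category]}]"
        return mapping[key]

    # Pattern-based categories first, so that addresses containing a
    # name are replaced as a whole.
    for category, pattern in PATTERNS:
        text = pattern.sub(
            lambda m, c=category: placeholder(c, m.group(0)), text
        )

    # Names come from a caller-supplied list (an assumption; a real
    # system would use named-entity recognition instead).
    for name in known_names:
        name_pattern = re.compile(
            r"\b" + re.escape(name) + r"\b", re.IGNORECASE
        )
        text = name_pattern.sub(
            lambda m: placeholder("PERSON", m.group(0)), text
        )

    return text


sample = (
    "Jane Doe called 555-123-4567 on 12/03/2019; "
    "Jane Doe then emailed jane.doe@example.org."
)
print(anonymize(sample, known_names=["Jane Doe"]))
```

Because each placeholder takes the grammatical slot of the text it replaces, the sentence still parses as a sentence, and repeated mentions of the same person remain identifiable as the same (anonymous) entity.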

Text Wash is being developed by Dr Bennett Kleinberg, Maximilian Mozes and Dr Toby Davies from the Department of Security and Crime Science at University College London, UK. SAGE’s Concept Grant will enable the team to get the tool off the ground and promote ethical and intelligent data sharing practices. Text Wash will be available as an R package and as easy-to-use standalone software for non-technical users. For more information, contact: bennett.kleinberg@ucl.ac.uk

Bennett Kleinberg, Assistant Professor in Data Science at the Department of Security and Crime Science at UCL, said:

“Data sharing is one of the main impediments to truly relevant computational social science research. Our aim is to unlock the potential of hard-to-access text data – such as police reports or patient interviews – as a means of addressing important societal challenges. The idea for Text Wash came from the observation that many organisations are, in principle, willing to share raw text data for research purposes but are reluctant to do so due to data protection issues. We are excited to put our ideas into practice with this concept grant. Ultimately we hope that we can open up access to a yet-untapped treasure of data to make research more relevant.”

Katie Metzler, Associate Vice President of Product Innovation at SAGE, said:

“It is our second year running the Concept Grant program and, once again, we were overwhelmed by the number, variety and strength of the applications. We were particularly impressed by Text Wash and selected it as the winner based on the importance and prevalence of the challenge it addresses, and its potential for wide-ranging impact.

“Out of 47 applications received this year, 31% were either led by or included women in their teams – up from 21% in 2018. As part of our commitment to encouraging diversity within computational social science, we would like to encourage more applications from women and diverse applicants in 2020.”

The Concept Grant program is a key part of the SAGE Ocean initiative to enable social scientists to work with big data and new technology. The grants support product innovation within social research, funding early stage software ideas that will help social researchers to engage with new computational methods and analyse data at scale.

Update from the 2018 Concept Grant winners

In 2018, SAGE Ocean awarded Concept Grants to support the development of three new research tools for social scientists. SAGE recently spoke with the winners to get an update on how the projects are progressing. Find out more and read the full interviews at the links below:

  • Quanteda Studio, LSE, UK – Read the interview

    • A powerful, flexible, and user-friendly text analytic software tool that will require no programming experience to use and will run as a web application.

  • MiniVAN, Public Data Lab, FR/UK – Read the interview

    • An easy-to-use tool that will support non-specialist social scientists in the visual analysis and in the online publication of networks.

  • Digital DNA Toolbox (DDNA), IIT-CNR, IT – Read the interview

    • A toolbox that will use bioinformatics techniques to provide researchers with a set of cutting-edge tools that can be used for many things, including assessing the veracity, trustworthiness, and reliability of content (and content producers) in online social networks and beyond.

SAGE Ocean will be awarding Concept Grants again in 2020. To stay up to date with the latest news and ensure you receive the next call for applications, subscribe to the Big Data Newsletter.

###

Sara Miller McCune founded SAGE Publishing in 1965 to support the dissemination of usable knowledge and educate a global community. SAGE publishes more than 1,000 journals and over 800 new books each year, spanning a wide range of subject areas. Our growing selection of library products includes archives, data, case studies and video. SAGE remains majority owned by our founder and after her lifetime will become owned by a charitable trust that secures the company’s continued independence. Principal offices are located in Los Angeles, London, New Delhi, Singapore, Washington DC and Melbourne. www.sagepublishing.com

SAGE Ocean is an initiative from SAGE Publishing to support social science by equipping social scientists with the skills, tools and resources they need to work with big data and new technology.

Contact (media inquiries only)