Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.
Anonymising documents
- Publication category: Other algorithms
- Impact assessment: Field not filled in.
- Status: In use
General information
Theme
Begin date
Contact information
Responsible use
Goal and impact
The aim is to anonymise privacy-sensitive information in documents published by the municipality.
In this way, we protect the privacy of citizens and organisations and prevent (possible) data leaks.
Considerations
The municipality wants to make information public. In doing so, privacy or business-sensitive information must be protected.
The advantage of anonymisation software is faster anonymisation. A disadvantage may be that employees rely too heavily on the outcome of the algorithm and check it less closely.
Human intervention
The outcome of the algorithm is checked by an employee. The software requires the employee to check all pages. The employee determines whether the document is correctly anonymised.
Risk management
- A municipal employee always performs the final check of whether a document is correctly anonymised. There is a risk that employees do not check properly; we mitigate this by emphasising the importance of carefully checking the personal data found by the algorithm.
- Datamask is a SaaS (Software as a Service) solution. A copy of the document, without metadata, is uploaded to the supplier's environment for processing. Immediately after processing, the data and the data processing are deleted. If the copy is not processed immediately, it is kept on the supplier's (Dutch) server for up to 30 days.
- The supplier is ISO 27001 certified.
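The retention rule in the second point can be expressed as a simple check. This is a minimal sketch, assuming the 30 days are counted from the moment of upload; the description does not say how the supplier counts or enforces the period, and the function and constant names below are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION = timedelta(days=30)  # "up to 30 days" per the description


def must_be_deleted(uploaded_at, processed, now):
    """Return True if the uploaded copy should no longer be stored.

    Assumption: a processed copy is deleted immediately, and an
    unprocessed copy is deleted once 30 days have passed since upload.
    """
    if processed:
        return True
    return now - uploaded_at >= MAX_RETENTION


uploaded = datetime(2024, 10, 1, 9, 0, tzinfo=timezone.utc)
after_two_weeks = datetime(2024, 10, 15, 9, 0, tzinfo=timezone.utc)
after_thirty_days = datetime(2024, 10, 31, 9, 0, tzinfo=timezone.utc)

print(must_be_deleted(uploaded, processed=False, now=after_two_weeks))
print(must_be_deleted(uploaded, processed=False, now=after_thirty_days))
```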
Legal basis
Anonymisation is important because it helps protect the privacy of individuals and ensures that sensitive information is not inadvertently disclosed. The legal basis for anonymising data in the Netherlands is mainly laid down in the General Data Protection Regulation (GDPR; in Dutch: AVG).
Links to legal bases
Link to Processing Index
Operations
Data
All information found in the uploaded documents (except metadata) is processed by the algorithm. This may include ordinary personal data, special categories of personal data and data relating to criminal offences. It may also include business-sensitive information.
Technical design
Documents are uploaded to the application. At that point, a copy is created in the form of a PDF with a text layer, and the metadata of the original document is removed from the copy. This copy is stored on the supplier's (Dutch) server and remains there for a maximum of 30 days. The text layer of the PDF is passed to the machine learning algorithm through an API.
This is a Natural Language Processing algorithm (named entity recognition) from Microsoft Azure. The API returns the locations in the analysed text where personal data is likely to occur, along with a probability score (a percentage) for each location. The vendor uses the probability score along with proprietary AI models to make personal data recognition as accurate as possible.
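As a rough illustration of the kind of result described above, the sketch below models each detection as a text span with an offset, a length, a category and a probability score, and selects the spans to propose to the reviewer. The `Detection` class, its field names and the 0.5 threshold are assumptions made for this example; they are not taken from the supplier's software or from the Azure API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """One candidate piece of personal data (hypothetical structure)."""
    offset: int      # start position in the analysed text
    length: int      # number of characters
    category: str    # e.g. "Person" or "Email"
    score: float     # probability between 0.0 and 1.0


def select_for_review(detections, threshold=0.5):
    """Return detections at or above the threshold, ordered by position.

    The threshold is an illustrative value. Nothing is removed here;
    the spans are only proposed to the human reviewer.
    """
    kept = [d for d in detections if d.score >= threshold]
    return sorted(kept, key=lambda d: d.offset)


text = "Contact Jan Jansen via jan@example.org about the permit."
detections = [
    Detection(offset=23, length=15, category="Email", score=0.99),
    Detection(offset=8, length=10, category="Person", score=0.92),
    Detection(offset=49, length=6, category="Person", score=0.12),
]

for d in select_for_review(detections):
    print(d.category, repr(text[d.offset:d.offset + d.length]), d.score)
```

The low-scoring third detection is dropped in this sketch; a real system might instead show it to the reviewer with a lower priority.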
Finally, an employee checks the document. When the employee finalises the document, the data to be anonymised is permanently removed from the text layer and a black bar is placed over it.
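The review-then-redact step can be sketched as follows, assuming the text layer is available as one string per page and the reviewer has approved a list of (start, end) character spans per page. The function names, the page model and the block character are hypothetical. Real redaction of a PDF also has to remove the underlying text objects and draw the bar, which this sketch does not attempt.

```python
def redact_page(text, spans, mask="█"):
    """Replace each approved (start, end) span with mask characters.

    The original characters are dropped from the returned string, so they
    cannot be recovered from the redacted text.
    """
    out = []
    cursor = 0
    for start, end in sorted(spans):
        start = max(start, cursor)   # skip any part already masked
        end = min(end, len(text))    # never run past the end of the page
        if end <= start:
            continue
        out.append(text[cursor:start])
        out.append(mask * (end - start))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def finalise(pages, approved_spans, reviewed_pages):
    """Redact all pages, but only once every page has been reviewed."""
    missing = set(range(len(pages))) - set(reviewed_pages)
    if missing:
        raise ValueError(f"pages not yet reviewed: {sorted(missing)}")
    return [
        redact_page(page, approved_spans.get(i, []))
        for i, page in enumerate(pages)
    ]


pages = ["Applicant: Jan Jansen", "No personal data on this page."]
approved = {0: [(11, 21)]}

redacted = finalise(pages, approved, reviewed_pages={0, 1})
print(redacted[0])
```

Refusing to finalise while pages remain unreviewed mirrors the requirement that the employee checks every page before the document is completed.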
External provider
Similar algorithm descriptions
- The algorithm highlights personal data in documents. An employee has to look at all pages and check whether the document is properly anonymised. Then the software removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (WOO). Last change on 31 October 2024 at 15:08 (CET) | Publication Standard 1.0
- Publication category: Other algorithms
- Impact assessment: DPIA, ...
- Status: In development
- The algorithm highlights personal data in documents. An employee has to look at all pages and check whether the document is properly anonymised. Then the software removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (WOO). Last change on 31 October 2024 at 9:40 (CET) | Publication Standard 1.0
- Publication category: Other algorithms
- Impact assessment: DPIA, ...
- Status: In use
- The algorithm highlights personal data in documents. An employee has to look at all pages and check whether the document is properly anonymised. Then the software removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (WOO). Last change on 12 February 2025 at 9:26 (CET) | Publication Standard 1.0
- Publication category: Other algorithms
- Impact assessment: DPIA, ...
- Status: In use
- The algorithm highlights personal data in documents. An employee has to look at all pages and check whether the document is properly anonymised. Then the software removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (WOO). Last change on 12 November 2024 at 7:25 (CET) | Publication Standard 1.0
- Publication category: Other algorithms
- Impact assessment: DPIA, ...
- Status: In use
- The algorithm highlights personal data in documents. An employee has to look at all pages and check whether the document is properly anonymised. Then the software removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (WOO). Last change on 27 January 2025 at 10:18 (CET) | Publication Standard 1.0
- Publication category: Other algorithms
- Impact assessment: DPIA, ...
- Status: In use