Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.
Anonymisation tool
- Publication category
- Other algorithms
- Impact assessment
- Field not filled in.
- Status
- In use
General information
Theme
Begin date
Contact information
Link to publication website
Responsible use
Goal and impact
The anonymisation tool is used to balance transparency on the one hand with the necessary protection of the persons to whom the documents relate on the other.
Considerations
Documents that are made public may contain privacy-sensitive data, and it is important that this data is anonymised. Manual anonymisation is error-prone, time-consuming and can result in data leaks. The anonymisation tool enables users to anonymise personal data and confidential information efficiently and effectively.
Human intervention
The outcome of the algorithm is checked by an employee. The software requires the employee to review all pages. The employee determines whether the document has been correctly anonymised.
Risk management
The risk of using the anonymisation tool is negligible because the tool does not make any decisions. The anonymisation tool generates a proposal for anonymising data and information. An employee of Steenwijkerland municipality always checks whether a document has been correctly anonymised.
Legal basis
Open Government Act (WOO), General Data Protection Regulation (AVG/GDPR)
Operations
Data
Personal data such as name, address, date of birth, gender, BSN, et cetera
Technical design
Documents are uploaded to the application by an employee. At that point, a copy of the original is made in the form of a PDF with a text layer, and the metadata of the original document is removed from the copy. This copy is stored on a Dutch server for a maximum of 30 days.

The text layer of the PDF is offered to the machine learning algorithm through an API. This is a Natural Language Processing algorithm (named entity recognition) from Microsoft Azure. The API returns the locations in the analysed text where personal data is likely to occur, along with a probability score (a percentage). Azure then immediately deletes the text layer. The probability score is used along with proprietary AI models developed by the vendor to make the recognition of personal data as accurate as possible. The models are trained on, among others, the following datasets: CoNLL-2003, UD Dutch LassySmall v2.8, Dutch NER Annotations for UD LassySmall and UD Dutch Alpino v2.8. Minimum key figures for the accuracy of identifying personal data are:
- Named entities (precision): 0.78
- Named entities (recall): 0.76
- Named entities (F-score): 0.77 (the harmonic mean of precision and recall: 2 × 0.78 × 0.76 / (0.78 + 0.76) ≈ 0.77)

Finally, an employee checks the document; when they complete the document, the data to be anonymised is permanently removed from the text layer and a black bar is placed over it.
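As an illustration of what such a named entity recognition call could look like, the sketch below sends a piece of Dutch text to the Azure AI Language (Text Analytics) service and prints the location and confidence score of each detected entity. This is a minimal sketch, not the vendor's actual pipeline: the endpoint, key and example sentence are placeholders, and the vendor's own models, thresholds and PDF handling are not described in this register entry.

```python
# Hedged sketch: named entity recognition on a text layer using the
# azure-ai-textanalytics SDK. Endpoint, key and the example text are
# placeholders, not the actual configuration used by the vendor.
from azure.core.credentials import AzureKeyCredential
from azure.ai.textanalytics import TextAnalyticsClient

client = TextAnalyticsClient(
    endpoint="https://<your-resource>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<your-key>"),
)

# Text layer of one PDF page (placeholder example).
pages = ["Jan Jansen, geboren 01-01-1980, woont in Steenwijk."]

# Ask the service for named entities in Dutch text.
result = client.recognize_entities(pages, language="nl")

for page in result:
    if page.is_error:
        continue
    for entity in page.entities:
        # offset/length give the location in the analysed text;
        # confidence_score is the probability returned by the service.
        print(entity.category, entity.offset, entity.length,
              round(entity.confidence_score, 2), entity.text)
```

The spans that the employee confirms would then be permanently removed from the text layer and covered with a black bar, for example with a PDF redaction function; that step is not shown here.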
External provider
Similar algorithm descriptions
- The algorithm underlines personal data in documents. An employee has to review all the pages and check whether the document is properly anonymised. Then the software removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (WOO).
Last change on 20th of November 2024, at 10:04 (CET) | Publication Standard 1.0
- Publication category
- Other algorithms
- Impact assessment
- Field not filled in.
- Status
- In use
- The algorithm underlines personal data in documents. An employee has to review all pages and check whether the document is properly anonymised. Then the software removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (WOO).
Last change on 31st of March 2025, at 14:10 (CET) | Publication Standard 1.0
- Publication category
- Other algorithms
- Impact assessment
- Field not filled in.
- Status
- In use
- The algorithm underlines personal data in documents. An employee has to review all pages and check whether the document is properly anonymised. Then the software removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (WOO).
Last change on 8th of January 2025, at 13:06 (CET) | Publication Standard 1.0
- Publication category
- Other algorithms
- Impact assessment
- DPIA
- Status
- In use
- The algorithm underlines personal data in documents. An employee has to review all pages and check whether the document is properly anonymised. Then the software removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (WOO).
Last change on 31st of March 2025, at 14:04 (CET) | Publication Standard 1.0
- Publication category
- Other algorithms
- Impact assessment
- Field not filled in.
- Status
- In use
- The algorithm underlines personal data in documents. An employee has to review all pages and check whether the document is properly anonymised. Then the software removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (WOO).
Last change on 10th of April 2025, at 13:25 (CET) | Publication Standard 1.0
- Publication category
- Other algorithms
- Impact assessment
- DPIA
- Status
- In use