Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.

Anonymise

The algorithm highlights personal data in documents. An employee must review every page and check whether the document is properly anonymised. The software then removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (Woo).

Last change on 29th of April 2025, at 13:39 (CET) | Publication Standard 1.0
Publication category
Other algorithms
Impact assessment
DPIA
Status
In use

General information

Theme

Organisation and business operations

Begin date

2024-05

Contact information

info@vrtwente.nl

Link to publication website

https://vrtwente.nl/

Responsible use

Goal and impact

The anonymisation software is used to anonymise documents published by Safety Region Twente faster and better. In this way, we prevent data breaches and contribute to better protection of the rights of data subjects under the GDPR (AVG).

Considerations

Safety Region Twente increasingly has to make information public, which means that privacy-sensitive or business-sensitive information has to be redacted. Before the algorithm was deployed, redaction was done manually, which was time-consuming and carried a significant risk of incomplete anonymisation. The advantage of the anonymisation software is that anonymisation is faster and better. The disadvantage is that the text layer of the document is analysed on a Microsoft Azure server. Because the content is not stored on this server, the privacy risk of using the algorithm does not outweigh the privacy benefit of fewer data breaches caused by improper anonymisation.

Human intervention

The outcome of the algorithm is checked by an employee. The software requires the employee to check every page. The employee determines whether the document is correctly anonymised.

Risk management

There is no risk of automated decision-making, and the algorithm has no impact on fundamental rights, because it does not make decisions with legal consequences. It only suggests which personal data to anonymise. The developer also uses the algorithm itself, so errors are found quickly. In addition, the algorithm is retrained periodically. If the algorithm does not work well enough, we can make adjustments with black- and whitelists. An employee of Safety Region Twente always performs the final check on whether a document has been anonymised correctly. There is a risk that employees do not check properly; we mitigate this by drawing attention to the importance of carefully checking the personal data found by the algorithm.

The last remaining risk is the privacy risk of using Azure: Microsoft may be required to hand over data it processes to US authorities under the Patriot Act. To mitigate this risk, the vendor has implemented privacy by default. Text sent to the Azure service through the API, in synchronous or asynchronous calls, may be temporarily stored by Azure for debugging, but the vendor has disabled this option. Immediately after processing by Azure, the data and any processing results are deleted. Furthermore, the supplier is ISO 27001 certified.

These risks do not outweigh the privacy benefits of the software or the risk of poor anonymisation that would come with not using it.
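The black- and whitelist adjustment mentioned above could work roughly as in the following minimal sketch. The source does not describe how the vendor implements these lists, so the `Suggestion` structure, the `apply_lists` function, case-insensitive matching, and the rule that the whitelist takes precedence are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Suggestion:
    """One span proposed for redaction (hypothetical structure)."""
    text: str
    start: int  # character offset in the text layer
    end: int    # exclusive end offset


def apply_lists(text, suggestions, blacklist, whitelist):
    """Adjust model suggestions with manually maintained term lists.

    Blacklist terms are always proposed, even if the model missed them.
    Whitelist terms are never proposed, even if the model flagged them
    (assumed precedence: the whitelist wins).
    """
    spans = {(s.start, s.end) for s in suggestions}
    for term in blacklist:
        if not term:
            continue  # an empty term would match everywhere
        for match in re.finditer(re.escape(term), text, flags=re.IGNORECASE):
            spans.add(match.span())
    allowed = {term.lower() for term in whitelist}
    return [
        Suggestion(text[start:end], start, end)
        for start, end in sorted(spans)
        if text[start:end].lower() not in allowed
    ]


if __name__ == "__main__":
    doc = "Contact Jan Jansen at Veiligheidsregio Twente, ref ZK-1234."
    model_output = [
        Suggestion("Jan Jansen", 8, 18),
        Suggestion("Veiligheidsregio Twente", 22, 45),
    ]
    for s in apply_lists(doc, model_output, ["ZK-1234"], ["Veiligheidsregio Twente"]):
        print(s)
```

In this sketch the lists only change what is suggested; the employee's final check described above still decides what is actually removed.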

Legal basis

1. WOO 2. WCO 3. UAVG 4. WEP 5. WDO

Links to legal bases

  • Woo: https://wetten.overheid.nl/BWBR0045754/
  • WDO: https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046
  • UAVG: https://wetten.overheid.nl/BWBR0040940
  • Wep: https://wetten.overheid.nl/BWBR0043961
  • Wdo: https://wetten.overheid.nl/BWBR0048156

Elaboration on impact assessments

Safety Region Twente used xxllnc's DPIA.

Impact assessment

Data Protection Impact Assessment (DPIA)

Operations

Data

What data is processed varies from one document to another. It often involves personal data such as e-mail addresses, names, phone numbers, bank account numbers, address details and signatures.

Technical design

Documents are uploaded to the application by an employee. At that point, a copy of the original is made in the form of a PDF with a text layer, and the metadata of the original document is removed from the copy. This copy is stored on a Dutch server for a maximum of 30 days. The text layer of the PDF is passed to the machine learning algorithm through an API. This is a Natural Language Processing algorithm (named entity recognition) from Microsoft Azure. The API returns the locations in the analysed text where personal data is likely to occur, along with a probability score (a percentage) for each. At that point, Azure immediately removes the text layer. The probability score is used together with proprietary AI models developed by the vendor to make the recognition of personal data as accurate as possible. The models are trained on, among others, the following datasets: CoNLL-2003, UD Dutch LassySmall v2.8, Dutch NER Annotations for UD LassySmall and UD Dutch Alpino v2.8. Minimum key figures for the accuracy of identifying personal data are as follows: Named entities (precision): 0.78, Named entities (recall): 0.76, Named entities (F-score): 0.77. Finally, an employee checks the document. Once the employee finalises it, the data to be anonymised is permanently removed from the text layer and covered with a black bar.
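The last steps of this flow (scored locations come back, an employee reviews them, the text is removed) can be illustrated with a minimal sketch. It is not the vendor's implementation: the `Entity` fields, the `suggest` and `redact` functions, and the 0.5 threshold are hypothetical, and the source does not state which threshold is used or how the score is combined with the vendor's own models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    """A detected location plus probability score (hypothetical structure)."""
    offset: int   # start position in the text layer, in characters
    length: int   # number of characters covered
    score: float  # probability between 0.0 and 1.0


def suggest(entities, min_score=0.5):
    """Keep only detections at or above the (assumed) score threshold."""
    return [e for e in entities if e.score >= min_score]


def redact(text, approved, mask="\u2588"):
    """Overwrite every approved span so the original characters are gone.

    The mask character stands in for the black bar placed on the page.
    """
    chars = list(text)
    for e in approved:
        start = max(e.offset, 0)
        end = min(e.offset + e.length, len(chars))
        for i in range(start, end):
            chars[i] = mask
    return "".join(chars)


if __name__ == "__main__":
    layer = "Mail jan@example.org or call 0612345678."
    detected = [
        Entity(layer.index("jan@example.org"), 15, 0.93),
        Entity(layer.index("0612345678"), 10, 0.30),
    ]
    # In the described workflow an employee reviews the suggestions;
    # here every suggestion is treated as approved.
    print(redact(layer, suggest(detected)))
```

Replacing the characters rather than drawing over them mirrors the point made above that the data is removed from the text layer itself, so it cannot be recovered by copying text from the published PDF.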

External provider

xxllnc
