Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.

Anonymisation

The algorithm flags personal data in documents. An employee then checks all pages manually to verify that the anonymisation is complete and correct. After this check, the software permanently removes the highlighted data and replaces it with a black bar. The documents can then be published, for example under the Open Government Act (Woo).
Last change on 17th of June 2026, at 9:25 (CET) | Publication Standard 1.0
Publication category
Other algorithms
Impact assessment
DPIA
Status
In use

General information

Theme

Organisation and business operations

Begin date

2024-12

Contact information

info@harlingen.nl

Responsible use

Goal and impact

The anonymisation software enables the local authority to anonymise the documents it publishes more quickly and effectively. This helps to prevent data breaches and to better protect the GDPR rights of data subjects.

Considerations

The local authority is increasingly required to publish information while redacting data that is sensitive in terms of privacy or business confidentiality. Before the algorithm was introduced, anonymisation was carried out manually and was not always done correctly. This led to data breaches, for example when personal data unintentionally remained visible or redacted information was still legible.

The algorithm assists with anonymising documents, so that this is done more quickly and consistently. When the algorithm is used, the text layer of a document is analysed on a Microsoft Azure server; the content of the document is not stored during this process. Sending the text to Azure entails a limited privacy risk. On the other hand, the algorithm helps to reduce data breaches caused by incorrect anonymisation, which on balance has a positive effect on the protection of personal data.

Human intervention

The algorithm’s output is checked by a member of staff. The software requires the staff member to check all pages. The staff member determines whether the document has been correctly anonymised.
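The requirement that every page is checked before the document is finalised can be illustrated with a minimal Python sketch. It assumes a hypothetical `ReviewSession` object that records which pages a staff member has marked as reviewed and refuses to finalise the document while any page is still unchecked; the class, its methods and its fields are illustrative and are not taken from the supplier's software.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewSession:
    """Hypothetical record of a staff member's page-by-page check."""

    page_count: int
    reviewed: set = field(default_factory=set)
    finalised: bool = False

    def mark_reviewed(self, page: int) -> None:
        # Pages are numbered from 1; reject pages that do not exist.
        if not 1 <= page <= self.page_count:
            raise ValueError(f"page {page} is out of range")
        self.reviewed.add(page)

    def unreviewed_pages(self) -> list:
        # Pages the staff member has not yet marked as checked.
        return [p for p in range(1, self.page_count + 1) if p not in self.reviewed]

    def finalise(self) -> None:
        # Redaction may only be applied once every page has been checked.
        missing = self.unreviewed_pages()
        if missing:
            raise RuntimeError(f"pages not yet reviewed: {missing}")
        self.finalised = True


session = ReviewSession(page_count=3)
session.mark_reviewed(1)
session.mark_reviewed(3)
print(session.unreviewed_pages())  # [2]
```

Calling `session.finalise()` at this point raises `RuntimeError`; after `session.mark_reviewed(2)` it succeeds. Placing the gate in the software, rather than relying on staff discipline alone, reflects the statement that the software requires the staff member to check all pages.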

Risk management

There is no risk of automated decision-making, and the algorithm has no impact on fundamental rights, because it does not take decisions with legal consequences. It only proposes which personal data to anonymise. The developer also uses the algorithm itself, so errors are identified quickly. In addition, the algorithm is trained periodically; at our organisation’s request, our documents are not used for this training. If the algorithm does not perform well enough, we can adjust its results using blacklists and whitelists.

A municipal employee always carries out the final check to ensure that a document has been correctly anonymised. There is a risk that staff do not carry out this check properly; we mitigate this by emphasising the importance of carefully checking the personal data identified by the algorithm.

The final remaining risk is the privacy risk associated with the use of Azure: Microsoft may be obliged to hand over data it processes to the US authorities under the Patriot Act. To limit this risk, the supplier has applied ‘privacy by default’. Azure may temporarily store text sent to it by the API, via synchronous or asynchronous calls, for debugging purposes, but the supplier has disabled this option, which reduces the risk. Immediately after processing by Azure, the data and the records of its processing are deleted. Furthermore, the supplier is ISO 27001 certified. On balance, these risks do not outweigh the privacy benefits of the software, nor the risk of inadequate anonymisation that would arise without it.
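Adjusting the results with blacklists and whitelists can be pictured as a post-processing step on the detected text spans. The following is a minimal Python sketch under assumptions: the function `adjust_detections`, the representation of detections as character spans, case-insensitive whole-word matching and the rule that the blacklist takes precedence over the whitelist are all hypothetical, since this entry does not describe how the supplier applies the lists. The example terms are illustrative only.

```python
import re
from typing import List, NamedTuple


class Span(NamedTuple):
    start: int  # index of the first character
    end: int    # index just past the last character


def adjust_detections(text: str, detected: List[Span],
                      blacklist: List[str], whitelist: List[str]) -> List[Span]:
    """Apply hypothetical whitelist/blacklist rules to model detections."""
    allowed = {w.lower() for w in whitelist}
    # Drop model detections whose text is whitelisted (never redact these).
    spans = [s for s in detected if text[s.start:s.end].lower() not in allowed]
    # Add every whole-word occurrence of a blacklisted term (always redact).
    for term in blacklist:
        for m in re.finditer(rf"\b{re.escape(term)}\b", text, flags=re.IGNORECASE):
            spans.append(Span(m.start(), m.end()))
    # Merge overlapping or touching spans so each region is redacted once.
    merged: List[Span] = []
    for s in sorted(spans):
        if merged and s.start <= merged[-1].end:
            merged[-1] = Span(merged[-1].start, max(merged[-1].end, s.end))
        else:
            merged.append(s)
    return merged


def span_of(text: str, phrase: str) -> Span:
    # Helper for the example: locate the first occurrence of a phrase.
    start = text.index(phrase)
    return Span(start, start + len(phrase))


text = "Letter from Jan de Vries to Harlingen about plot 42B."
detected = [span_of(text, "Jan de Vries"), span_of(text, "Harlingen")]
result = adjust_detections(text, detected, blacklist=["42B"], whitelist=["Harlingen"])
print([text[s.start:s.end] for s in result])  # ['Jan de Vries', '42B']
```

Here the whitelist suppresses a detection that should stay visible, while the blacklist adds a term the model did not flag; because blacklisted terms are added after whitelist filtering, the blacklist wins if a term appears on both lists.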

Legal basis

1. Woo
2. WDO
3. UAVG
4. Wep
5. WDO

Links to legal bases

  • Woo: https://wetten.overheid.nl/BWBR0045754/
  • WDO: https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:31995L0046
  • UAVG: https://wetten.overheid.nl/BWBR0040940
  • Wep: https://wetten.overheid.nl/BWBR0043961
  • Wdo: https://wetten.overheid.nl/BWBR0048156

Impact assessment

Data Protection Impact Assessment (DPIA)

Operations

Data

All information in the uploaded documents, except the metadata, is processed by the algorithm. This may include ordinary personal data, special categories of personal data and personal data relating to criminal convictions and offences, as well as commercially sensitive information.

Technical design

Documents are uploaded to the application by a member of staff. The application then creates a copy of the original as a PDF with a text layer and removes the metadata of the original document from this copy. The copy is stored on a server in the Netherlands for a maximum of 30 days.

The text layer of the PDF is sent via an API to a machine learning algorithm: a Natural Language Processing model for named entity recognition from Microsoft Azure. The API returns the locations in the analysed text where personal data is likely to occur, each with a probability score (a percentage). The text layer is then immediately deleted from Azure. The supplier combines these probability scores with its own AI models to make the recognition of personal data as accurate as possible. These models are trained on datasets including CoNLL-2003, UD Dutch LassySmall v2.8, Dutch NER Annotations for UD LassySmall and UD Dutch Alpino v2.8.

The minimum performance metrics for identifying personal data (named entities) are:
  • Precision: 0.78
  • Recall: 0.76
  • F-score: 0.77

Finally, a member of staff checks the document. Once the staff member finalises it, the data to be anonymised is permanently removed from the text layer and replaced by a black bar.
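The step in which probability scores are used to decide what to redact can be sketched in Python as a plain-text analogue, under stated assumptions. The `Entity` fields mirror the "location plus probability score" output described above, but the names, the category labels and the 0.5 threshold are hypothetical, and the sketch does not model how the supplier combines the scores with its own models. Overwriting characters with `█` only stands in for removing text from a PDF text layer and drawing a black bar, which in practice requires a PDF library. The block also includes the standard F-score formula (the harmonic mean of precision and recall) to show that the three minimum performance metrics are consistent with each other.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Entity:
    offset: int      # start position in the text layer
    length: int      # number of characters
    category: str    # illustrative label, e.g. "Person"
    score: float     # probability score between 0.0 and 1.0


def select_for_redaction(entities: List[Entity], threshold: float = 0.5) -> List[Entity]:
    # Keep entities whose score reaches the hypothetical threshold.
    return [e for e in entities if e.score >= threshold]


def redact(text: str, entities: List[Entity], bar: str = "█") -> str:
    # Overwrite every character of each selected entity, so the original
    # characters no longer occur in the returned text.
    chars = list(text)
    for e in entities:
        for i in range(e.offset, e.offset + e.length):
            chars[i] = bar
    return "".join(chars)


def f_score(precision: float, recall: float) -> float:
    # F1: harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


text = "Contact Jan de Vries on 06-12345678."
entities = [
    Entity(text.index("Jan de Vries"), len("Jan de Vries"), "Person", 0.93),
    Entity(text.index("06-12345678"), len("06-12345678"), "PhoneNumber", 0.88),
    Entity(0, len("Contact"), "Organization", 0.12),  # low-score false positive
]
redacted = redact(text, select_for_redaction(entities))
# redacted == "Contact " + "█" * 12 + " on " + "█" * 11 + "."
print(round(f_score(0.78, 0.76), 2))  # 0.77
```

With precision 0.78 and recall 0.76, the formula gives 2 × 0.78 × 0.76 / (0.78 + 0.76) ≈ 0.770, which matches the reported F-score of 0.77.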

External provider

Xxllnc

Similar algorithm descriptions

  • The algorithm highlights personal data in documents. An employee has to review all pages and check whether the document is properly anonymised. The software then removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (Woo).
    Last change on 10th of April 2025, at 13:25 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    DPIA
    Status
    In use
  • The algorithm highlights personal data in documents. An employee has to review all pages and check whether the document is properly anonymised. The software then removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (Woo).
    Last change on 30th of October 2025, at 9:49 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    DPIA
    Status
    In use
  • The algorithm highlights personal data in documents. An employee has to review all pages and check whether the document is properly anonymised. The software then removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (Woo).
    Last change on 3rd of February 2026, at 8:12 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    Field not filled in.
    Status
    In use
  • The algorithm highlights personal data in documents. An employee has to review all pages and check whether the document is properly anonymised. The software then removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (Woo).
    Last change on 8th of January 2025, at 13:06 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    DPIA
    Status
    In use
  • The algorithm highlights personal data in documents. An employee has to review all pages and check whether the document is properly anonymised. The software then removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (Woo).
    Last change on 27th of May 2026, at 7:22 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    DEDA, DPIA
    Status
    Out of use