Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.

Anonymise

Recognising people’s names and faces in texts and images, and then anonymising them.
Last change on 19th of June 2026, at 6:21 (CET) | Publication Standard 1.0
Publication category
Other algorithms
Impact assessment
DEDA
Status
In use

General information

Theme

  • Organisation and business operations
  • Education and Science

Begin date

2024-09

Contact information

ai-werkgroep@zeeuwsarchief.nl

Link to publication website

https://www.zeeuwsarchief.nl

Responsible use

Goal and impact

The anonymisation software helps to conceal names and faces in archive records. Under the Archives Act, access to some of these records is restricted, and some records may fall within the scope of the GDPR. To make these (restricted) public records available nonetheless, personal data is redacted before publication.

Considerations

The Zeeuws Archief must comply with the GDPR and the Archives Act when publishing information. To check whether information has been properly anonymised, the Zeeuws Archief uses AQL (acceptance quality limit) audits. The Zeeuws Archief has worked with external partners to anonymise transcripts, but the results of that collaboration did not pass the AQL sample check.

The algorithm recognises personal names, national insurance numbers and email addresses, and then anonymises them. Processing takes place on an internal server that only authorised users can access. The algorithm is designed to minimise ‘false negatives’ (personal data that is missed), which limits the risk of data being left unanonymised. The trade-off is more ‘false positives’ (information that is redacted unnecessarily).
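The trade-off described above can be illustrated with a confidence threshold. This is a minimal sketch, not the Zeeuws Archief's implementation: the register does not say how missed detections are minimised, and the scores below are made-up values for illustration only. Lowering the hypothetical `threshold` redacts more candidates, so less personal data is missed (false negatives) at the cost of more unnecessary redactions (false positives).

```python
def count_errors(predictions: list[tuple[float, bool]], threshold: float) -> tuple[int, int]:
    """Count (false positives, false negatives) for a given redaction threshold.

    `predictions` holds hypothetical (model score, is actually personal data) pairs;
    a candidate is redacted when its score is at or above the threshold.
    """
    false_positives = sum(1 for score, is_personal in predictions
                          if score >= threshold and not is_personal)
    false_negatives = sum(1 for score, is_personal in predictions
                          if score < threshold and is_personal)
    return false_positives, false_negatives


# Made-up example scores, for illustration only.
example = [(0.95, True), (0.60, True), (0.40, False), (0.30, True), (0.20, False)]
print(count_errors(example, threshold=0.5))   # strict threshold → (0, 1)
print(count_errors(example, threshold=0.25))  # recall-first threshold → (1, 0)
```

With the lower threshold no personal data is missed, but one harmless item is redacted: the same direction of trade-off the register describes.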

Before a document is published, a sample of the algorithm’s output is checked. If the sample does not meet the required standard, the output is corrected manually.

The algorithm helps reduce data breaches caused by incorrect anonymisation. This benefits the protection of personal data and allows the Zeeuws Archief to maintain the highest AQL score.

Human intervention

A member of staff checks the results of the algorithm for each batch. They check whether the archive documents have been properly anonymised. If this is not the case, the entire batch is anonymised manually.
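The per-batch check can be sketched as an accept/reject rule. This is a hypothetical sketch: the function name, the random draw and the `acceptance_number` parameter are assumptions, with the sample size and acceptance number assumed to come from the AQL sampling plan mentioned under Technical design. The `is_defective` callable stands in for the staff member's judgement.

```python
import random
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def review_batch(batch: Sequence[T], sample_size: int, acceptance_number: int,
                 is_defective: Callable[[T], bool], seed: Optional[int] = None) -> str:
    """Decide what happens to an automatically anonymised batch.

    A random sample is drawn, each sampled item is judged by a reviewer
    (`is_defective` stands in for that human check), and the batch is
    published only if the number of defects stays within the acceptance number.
    """
    rng = random.Random(seed)
    sample = rng.sample(list(batch), min(sample_size, len(batch)))
    defects = sum(1 for item in sample if is_defective(item))
    if defects <= acceptance_number:
        return "publish"
    # Otherwise the whole batch goes back for manual anonymisation.
    return "anonymise manually"


pages = [f"page-{i}" for i in range(40)]
print(review_batch(pages, sample_size=13, acceptance_number=0,
                   is_defective=lambda page: False, seed=1))  # → publish
```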

Risk management

The algorithm does not make automated decisions and does not affect fundamental rights, as none of its outputs has legal consequences. There is, however, a risk that the output is not properly checked. We address this with an AQL audit: if the audit does not yield the highest score, the document is not published. Because the algorithm is used only internally, the risk of non-anonymised data leaving the Archive is reduced. The privacy benefits of using this software outweigh its risks.

In addition, each publication states that anonymisation is an automated process that uses AI. Anyone who finds an error is asked to contact the reading room.

Legal basis

1. The UAVG (General Data Protection Regulation Implementation Act) is the law that sets out rules on how we handle personal data.

2. The Archives Act deals with the retention of documents and data.

Links to legal bases

  • GDPR (Regulation (EU) 2016/679), implemented in Dutch law by the UAVG: https://autoriteitpersoonsgegevens.nl/uploads/imported/verordening_2016_-_679_definitief.pdf
  • Archives Act: https://wetten.overheid.nl/BWBR0007376/2024-06-19

Elaboration on impact assessments


Impact assessment

The Ethical Data Assistant (DEDA)

Operations

Data

The system processes transcriptions of archive documents (text converted from scans into machine-readable form) together with the corresponding scanned images. Both may contain personal data.

Technical design

Faces are detected using InsightFace, and each detected face is rendered unrecognisable.
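A minimal sketch of the face step, assuming each face is covered with a solid black box (the register only says faces are rendered unrecognisable; blurring or pixelation would also qualify). The helper names and the `margin` around each box are hypothetical; `FaceAnalysis`, `prepare` and `get` are InsightFace's Python API, imported lazily so the box arithmetic can be used without it.

```python
import math


def face_box_to_pixels(bbox, width: int, height: int, margin: float = 0.25):
    """Enlarge a face box (x1, y1, x2, y2) by a relative margin on every side
    and clamp it to the image, returning integer pixel bounds."""
    x1, y1, x2, y2 = (float(v) for v in bbox)
    dx = (x2 - x1) * margin
    dy = (y2 - y1) * margin
    left = max(0, math.floor(x1 - dx))
    top = max(0, math.floor(y1 - dy))
    right = min(width, math.ceil(x2 + dx))
    bottom = min(height, math.ceil(y2 + dy))
    return left, top, right, bottom


def make_face_detector():
    """Load only InsightFace's face-detection model (heavy dependency, imported lazily)."""
    from insightface.app import FaceAnalysis
    app = FaceAnalysis(allowed_modules=["detection"])
    app.prepare(ctx_id=0, det_size=(640, 640))
    return app


def black_out_faces(image_bgr, detector, margin: float = 0.25):
    """Fill every detected face with black, in place.

    `image_bgr` is a NumPy array of shape (height, width, 3), e.g. from cv2.imread.
    """
    height, width = image_bgr.shape[:2]
    for face in detector.get(image_bgr):
        left, top, right, bottom = face_box_to_pixels(face.bbox, width, height, margin)
        image_bgr[top:bottom, left:right] = 0
    return image_bgr
```

The margin errs on the side of covering slightly more than the detected box, in line with the recall-first design described under Considerations.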

Personal data is identified using Named Entity Recognition (NER). Three open-source models are applied in succession; all of them can be downloaded from Hugging Face.

Currently, these are:

- xlm-roberta-large-finetuned-conll03-english

- Davlan/bert-base-multilingual-cased-ner-hrl

- iiiorg/piiranha-v1-detect-personal-information
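Running the three models in succession and pooling their results could look like the sketch below. It assumes the Hugging Face `transformers` token-classification pipeline with `aggregation_strategy="simple"`, which returns dicts with `entity_group`, `start` and `end` keys. Keeping the union of all detections matches the recall-first design described under Considerations. The `PERSONAL_LABELS` set is an assumption: label names differ per model and should be checked against each model's `id2label` mapping.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

# Assumed label names for names, national insurance numbers and email addresses.
PERSONAL_LABELS = {"PER", "GIVENNAME", "SURNAME", "EMAIL", "SOCIALNUM"}

MODEL_IDS = [
    "xlm-roberta-large-finetuned-conll03-english",
    "Davlan/bert-base-multilingual-cased-ner-hrl",
    "iiiorg/piiranha-v1-detect-personal-information",
]

Detector = Callable[[str], list[dict]]


@dataclass(frozen=True)
class Span:
    start: int  # character offset in the transcript
    end: int
    label: str


def load_detectors(model_ids: list[str] = MODEL_IDS) -> list[Detector]:
    """Build one token-classification pipeline per model (heavy import kept local)."""
    from transformers import pipeline
    return [pipeline("token-classification", model=m, aggregation_strategy="simple")
            for m in model_ids]


def detect_personal_data(text: str, detectors: Iterable[Detector]) -> list[Span]:
    """Run every detector in turn and keep the union of personal-data spans."""
    spans: set[Span] = set()
    for detect in detectors:
        for entity in detect(text):
            if entity["entity_group"] in PERSONAL_LABELS:
                spans.add(Span(entity["start"], entity["end"], entity["entity_group"]))
    return sorted(spans, key=lambda s: (s.start, s.end))
```

Usage would be `detect_personal_data(transcript, load_detectors())`; passing plain callables instead of pipelines makes the combination logic testable without downloading the models.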


Recognised personal data (name, National Insurance number and email address) are blacked out in the image and replaced with the text ‘[Confidential]’ in the accompanying transcript.
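The transcript replacement step can be sketched as follows. The helper names are hypothetical, and merging overlapping spans is an assumption (the register does not describe it): when several models flag overlapping parts of the same name, merging them yields a single ‘[Confidential]’ instead of fragments. Spans are (start, end) character offsets.

```python
def merge_spans(spans: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Merge overlapping or touching (start, end) character spans."""
    merged: list[tuple[int, int]] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact_transcript(text: str, spans: list[tuple[int, int]],
                      placeholder: str = "[Confidential]") -> str:
    """Replace each merged span with the placeholder, working right to left
    so that earlier offsets stay valid."""
    for start, end in reversed(merge_spans(spans)):
        text = text[:start] + placeholder + text[end:]
    return text


print(redact_transcript("Jan de Vries mailde jan@example.nl",
                        [(0, 3), (0, 12), (20, 34)]))
# → [Confidential] mailde [Confidential]
```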


The AQL sampling uses general inspection level III with an AQL value of 1.0.
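The sample size that follows from this setting depends on the batch (lot) size. The sketch below assumes that ‘1.0’ is the AQL value and that the audit uses normal inspection with single sampling under ISO 2859-1. The values follow ISO 2859-1 Table 1 (sample size code letters) and Table 2-A (single sampling, normal inspection) and should be checked against the standard before use. What counts as one item (a page, a document) is also an assumption.

```python
import bisect

# Upper bounds of the lot-size ranges in the code-letter table
# (2-8, 9-15, ..., 150001-500000; the last range, 500001 and over, is open-ended).
LOT_UPPER_BOUNDS = [8, 15, 25, 50, 90, 150, 280, 500, 1200, 3200,
                    10000, 35000, 150000, 500000]
# Sample size code letters for general inspection level III, one per range.
LEVEL_III_LETTERS = "BCDEFGHJKLMNPQR"

# Single sampling, normal inspection, AQL 1.0, with the table's arrows already
# resolved: code letter -> (sample size, acceptance no., rejection no.).
AQL_1_0_PLANS = {
    "B": (13, 0, 1), "C": (13, 0, 1), "D": (13, 0, 1),  # arrow down to E
    "E": (13, 0, 1),
    "F": (13, 0, 1),                                     # arrow up to E
    "G": (50, 1, 2),                                     # arrow down to H
    "H": (50, 1, 2), "J": (80, 2, 3), "K": (125, 3, 4), "L": (200, 5, 6),
    "M": (315, 7, 8), "N": (500, 10, 11), "P": (800, 14, 15), "Q": (1250, 21, 22),
    "R": (1250, 21, 22),                                 # arrow up to Q
}


def aql_plan(lot_size: int) -> tuple[int, int, int]:
    """Return (items to inspect, acceptance number, rejection number) for a lot."""
    if lot_size < 1:
        raise ValueError("lot size must be at least 1")
    letter = LEVEL_III_LETTERS[bisect.bisect_left(LOT_UPPER_BOUNDS, lot_size)]
    sample_size, accept, reject = AQL_1_0_PLANS[letter]
    # When the sample is at least as large as the lot, every item is inspected.
    return min(sample_size, lot_size), accept, reject


print(aql_plan(1000))  # → (125, 3, 4)
```

For a batch of 1,000 items, 125 would be inspected; up to 3 defects accept the batch and 4 or more reject it.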

Similar algorithm descriptions

  • Recognising and anonymising privacy-sensitive information in documents and other information sources.
    Last change on 23rd of October 2024, at 13:58 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    Field not filled in.
    Status
    In use
  • Recognising and anonymising privacy-sensitive information in documents
    Last change on 12th of June 2024, at 6:53 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    Field not filled in.
    Status
    In use
  • Recognising and anonymising privacy-sensitive information in documents
    Last change on 30th of May 2024, at 14:12 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    Field not filled in.
    Status
    In use
  • Recognise and anonymise privacy-sensitive information and documents.
    Last change on 31st of March 2026, at 13:15 (CET) | Publication Standard 1.0
    Publication category
    Impactful algorithms
    Impact assessment
    Field not filled in.
    Status
    In use
  • Recognise and anonymise privacy-sensitive information and documents.
    Last change on 14th of October 2024, at 13:17 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    DPIA
    Status
    In use