Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.
Anonymise
- Publication category: Other algorithms
- Impact assessment: DEDA
- Status: In use
General information
Theme
- Organisation and business operations
- Education and Science
Begin date
Contact information
Link to publication website
Responsible use
Goal and impact
The anonymisation software helps to conceal names and faces in archive records. Access to some of these records is restricted under the Archives Act, and some records may fall within the scope of the GDPR. To make these restricted records publicly available nonetheless, the personal data in them is redacted.
Considerations
The Zeeuws Archief must comply with the provisions of the GDPR and the Archives Act when publishing information. To check whether the information has been properly anonymised, the Zeeuws Archief uses AQL (Acceptable Quality Limit) audits. The Zeeuws Archief has collaborated with external partners to anonymise transcripts, but this collaboration did not produce output that passed an AQL sample.
The algorithm developed recognises personal names, national insurance numbers and email addresses, and subsequently anonymises them. The process takes place on an internal server accessible only to authorised users. The algorithm has been designed to prioritise avoiding ‘false negatives’ (personal data that is not detected). This limits the risk of failing to anonymise data. However, it does result in more ‘false positives’ (too much information is redacted).
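The trade-off between redacting too much and missing personal data can be made concrete by comparing the algorithm's redactions with a hand-labelled reference. The following is a minimal sketch under stated assumptions, not the Zeeuws Archief's implementation: the function name `count_errors` and the example values are hypothetical.

```python
def count_errors(redacted: set[str], truly_personal: set[str]) -> dict[str, int]:
    """Compare the redacted items with a hand-labelled reference set."""
    return {
        # Personal data that was correctly redacted.
        "true_positives": len(redacted & truly_personal),
        # Harmless text that was redacted anyway (over-redaction).
        "false_positives": len(redacted - truly_personal),
        # Personal data that was missed (the costly error for privacy).
        "false_negatives": len(truly_personal - redacted),
    }


# Illustrative data only: a place name is redacted although it is not personal data.
truly_personal = {"Jan Jansen", "jan@example.org"}
redacted = {"Jan Jansen", "jan@example.org", "Middelburg"}
result = count_errors(redacted, truly_personal)
```

In this example nothing personal is missed, at the cost of one item that is redacted unnecessarily, which is the direction of error the description says the algorithm accepts.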
A sample of the algorithm’s output is checked before the document is published. If the sample does not meet the required standard, a manual correction is carried out.
The algorithm helps to reduce data breaches resulting from incorrect anonymisation. This has a positive effect on the protection of personal data, and the Zeeuws Archief maintains the highest AQL score.
Human intervention
A member of staff checks the results of the algorithm for each batch. They check whether the archive documents have been properly anonymised. If this is not the case, the entire batch is anonymised manually.
Risk management
There is no risk of automated decision-making. The algorithm does not affect fundamental rights, as it does not make decisions that have legal consequences. However, there is a risk that the output is not properly checked. We address this with an AQL audit. If the audit does not yield the highest score, the document will not be published. The risk of non-anonymised data leaving the Archive is reduced because we only use the algorithm internally. The benefits to privacy outweigh the risks of not using this software.
In addition, each publication states that anonymisation is an automated process involving the use of AI, and asks anyone who finds an error to contact the reading room.
Legal basis
1. The UAVG (General Data Protection Regulation Implementation Act) is the law that sets out rules on how we handle personal data.
2. The Archives Act deals with the retention of documents and data.
Links to legal bases
- UAVG: https://autoriteitpersoonsgegevens.nl/uploads/imported/verordening_2016_-_679_definitief.pdf
- Archives Act: https://wetten.overheid.nl/BWBR0007376/2024-06-19
Elaboration on impact assessments
Impact assessment
Operations
Data
Technical design
Face recognition is carried out using InsightFace. Recognised faces are rendered unrecognisable.
Personal data is identified using Named Entity Recognition (NER). For this purpose, three open-source models are used in succession. These models can be downloaded from HuggingFace.
Currently, these are:
- xlm-roberta-large-finetuned-conll03-english
- Davlan/bert-base-multilingual-cased-ner-hrl
- iiiorg/piiranha-v1-detect-personal-information
Recognised personal data (name, National Insurance number and email address) are blacked out in the image and replaced with the text ‘[Confidential]’ in the accompanying transcript.
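The transcript step can be sketched as follows: several detectors run in succession over the same text, their findings are merged, and each merged span is replaced with the placeholder. This is a sketch under stated assumptions, not the register's actual code. The source does not say how the three models are invoked or how their results are combined, so taking the union of all findings is an assumption. The detectors here (`detect_emails`, `make_name_detector`) are hypothetical stand-ins built on the standard library so that the example runs without downloading models; in practice each detector would wrap one of the NER models listed above.

```python
import re
from typing import Callable

Span = tuple[int, int]  # (start, end) character offsets in the transcript
Detector = Callable[[str], list[Span]]

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def detect_emails(text: str) -> list[Span]:
    """Find email addresses with a simple regular expression."""
    return [m.span() for m in EMAIL_RE.finditer(text)]


def make_name_detector(names: list[str]) -> Detector:
    """Hypothetical stand-in for an NER model: finds the given names verbatim."""
    def detect(text: str) -> list[Span]:
        spans: list[Span] = []
        for name in names:
            spans.extend(m.span() for m in re.finditer(re.escape(name), text))
        return spans
    return detect


def merge_spans(spans: list[Span]) -> list[Span]:
    """Merge overlapping or touching spans so each region is redacted once."""
    merged: list[Span] = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def redact(text: str, detectors: list[Detector],
           placeholder: str = "[Confidential]") -> str:
    """Run every detector in succession and replace the union of their findings."""
    spans: list[Span] = []
    for detect in detectors:
        spans.extend(detect(text))
    parts: list[str] = []
    cursor = 0
    for start, end in merge_spans(spans):
        parts.append(text[cursor:start])
        parts.append(placeholder)
        cursor = end
    parts.append(text[cursor:])
    return "".join(parts)


text = "Letter from Jan Jansen (jan.jansen@example.org) to the council."
detectors = [make_name_detector(["Jan Jansen"]), make_name_detector(["Jansen"]), detect_emails]
redacted_text = redact(text, detectors)
```

Taking the union means that a finding from any single detector is enough to redact a span, which favours over-redaction over missed personal data.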
The AQL sample uses inspection level III and an AQL value of 1.0.
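How such an acceptance sample behaves can be illustrated with a short calculation. This is a sketch under stated assumptions: the sample size of 125 and the acceptance number of 3 are illustrative values, not figures from the source, since the actual numbers depend on the batch size and are read from the AQL sampling tables. The binomial model also assumes the batch is large relative to the sample.

```python
from math import comb


def batch_accepted(defects_found: int, acceptance_number: int) -> bool:
    """A batch passes when the sample contains no more defects than allowed."""
    return defects_found <= acceptance_number


def acceptance_probability(sample_size: int, acceptance_number: int,
                           defect_rate: float) -> float:
    """Probability that a batch with the given true defect rate passes the sample.

    Uses the binomial distribution: P(X <= acceptance_number) with
    X ~ Binomial(sample_size, defect_rate).
    """
    return sum(
        comb(sample_size, k) * defect_rate**k * (1 - defect_rate) ** (sample_size - k)
        for k in range(acceptance_number + 1)
    )


# Illustrative assumptions, not values from the source.
SAMPLE_SIZE = 125
ACCEPTANCE_NUMBER = 3

p_good_batch = acceptance_probability(SAMPLE_SIZE, ACCEPTANCE_NUMBER, 0.01)
p_poor_batch = acceptance_probability(SAMPLE_SIZE, ACCEPTANCE_NUMBER, 0.05)
```

Under these assumed values, a batch in which 1% of items are wrongly anonymised passes far more often than a batch in which 5% are, which is what makes the sample useful as a release gate.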
Similar algorithm descriptions
- Recognising and anonymising privacy-sensitive information in documents and other information sources. Last change on 23rd of October 2024, at 13:58 (CET) | Publication Standard 1.0
  - Publication category: Other algorithms
  - Impact assessment: Field not filled in.
  - Status: In use
- Recognising and anonymising privacy-sensitive information in documents. Last change on 12th of June 2024, at 6:53 (CET) | Publication Standard 1.0
  - Publication category: Other algorithms
  - Impact assessment: Field not filled in.
  - Status: In use
- Recognising and anonymising privacy-sensitive information in documents. Last change on 30th of May 2024, at 14:12 (CET) | Publication Standard 1.0
  - Publication category: Other algorithms
  - Impact assessment: Field not filled in.
  - Status: In use
- Recognise and anonymise privacy-sensitive information and documents. Last change on 31st of March 2026, at 13:15 (CET) | Publication Standard 1.0
  - Publication category: Impactful algorithms
  - Impact assessment: Field not filled in.
  - Status: In use
- Recognise and anonymise privacy-sensitive information and documents. Last change on 14th of October 2024, at 13:17 (CET) | Publication Standard 1.0
  - Publication category: Other algorithms
  - Impact assessment: DPIA
  - Status: In use