Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.
Anonymisation software
- Publication category
- Other algorithms
- Impact assessment
- DPIA
- Status
- In use
General information
Theme
Begin date
Contact information
Link to publication website
Responsible use
Goal and impact
The anonymisation tool is used to ensure transparency while protecting individuals, companies and institutions.
The tool allows organisations to share information in accordance with regulations such as the Woo, both actively and passively. This helps to protect the personal data of data subjects. The same applies to the privacy-sensitive data of the organisation's own employees.
Applicants who submit a Woo request receive the requested information, anonymised or partially masked in accordance with other regulations. For the departments responsible for handling Woo requests, the tool makes complying with laws and regulations easier and shortens turnaround times, so that information can be provided within the legal deadlines.
The risk impact of the algorithm is low for individuals and organisations. The algorithm searches for (personal) data and flags or masks it, without making automatic decisions. A task expert evaluates the proposals for anonymisation.
The tool also offers the option to manually mask information that cannot be disclosed, such as strategic information to protect the organisation or partners. The basis for anonymising or masking is provided by the tool.
Considerations
Some passages in documents that are made public cannot be shared with the public. The Woo provides grounds for withholding them, in combination with regulations such as the General Data Protection Regulation (AVG).
Without tools, anonymising texts would be time-consuming and increase the risk of errors, which could lead to unwanted publication of sensitive data. Using an anonymisation tool speeds up and simplifies this process for both active and passive disclosure.
Automated anonymisation is less error-prone than manual work. This reduces the risk of data leaks and better protects individuals' data.
Human intervention
Human intervention is always required when using the software. This means that checks are always carried out by employees. The organisation has prepared a setup document, which allows it to adapt the use of the algorithms to its specific situation. A subject matter expert evaluates proposals for anonymising texts. No automatic decisions are made.
The algorithm searches for (personal) data and marks them. The subject specialist checks and corrects the proposals. This work can possibly be checked by a second person within the tool. This fulfils the requirement of 'human intervention'.
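The flag-and-review workflow described above can be sketched as follows. This is an illustrative assumption, not the vendor's actual implementation: the data structures, the `propose_redactions` and `apply_review` functions, and the 0.5 threshold are all hypothetical, but the sketch shows the key property that nothing is masked without an explicit human decision.

```python
# Hypothetical sketch of the flag-and-review workflow.
# All names, scores and the threshold are illustrative assumptions.

def propose_redactions(spans, threshold=0.5):
    """The algorithm proposes spans it believes contain personal data."""
    return [s for s in spans if s["score"] >= threshold]

def apply_review(proposals, reviewer_decisions):
    """A subject matter expert accepts or rejects each proposal.
    Nothing is masked without an explicit human decision."""
    return [p for p in proposals
            if reviewer_decisions.get(p["id"]) == "accept"]

# Example: three candidate spans; the expert rejects one false positive.
spans = [
    {"id": 1, "text": "J. Jansen", "score": 0.91},   # a person's name
    {"id": 2, "text": "Den Haag", "score": 0.55},    # a place name
    {"id": 3, "text": "appendix", "score": 0.20},    # below threshold
]
proposals = propose_redactions(spans)                 # ids 1 and 2 flagged
accepted = apply_review(proposals, {1: "accept", 2: "reject"})
print([p["text"] for p in accepted])                  # ['J. Jansen']
```

Only the spans the reviewer explicitly accepts reach the masking step, which is how the 'human intervention' requirement is met.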
Risk management
To prevent documents from being published without proper anonymisation, a human check always takes place. The software is intuitive to use for checking, making changes or adding redactions. Without human verification, risks may arise, such as disclosure of privacy-sensitive data. The combination of the tool and human verification helps prevent this.
Violation of privacy laws:
Inadvertent disclosure of personal data may violate privacy laws, such as the General Data Protection Regulation (AVG). This can lead to significant fines and legal penalties.
Identity theft:
Disclosing personally identifiable information (PII) such as names, addresses and social security numbers can lead to identity theft and financial fraud.
Damage to reputation:
Both the reputation of the individuals whose information has been leaked and that of the organisation responsible for the leak can be seriously damaged.
Loss of trust:
The confidence of the public and affected stakeholders in the organisation may decrease, leading to a decline in engagement and support.
Personal damage:
Individuals may suffer emotional and psychological damage if their personal data, such as medical or financial information, is made public.
Exploitation and abuse:
Disclosed data can be used for malicious purposes, such as stalking, harassment or discrimination.
Human monitoring helps to mitigate these risks by providing an extra layer of assessment and confirmation. A check is then made that anonymisation processes have been properly carried out before information is made public.
Legal basis
- General Data Protection Regulation (AVG)
- Environment Act
- General Administrative Law Act (AWB)
- Disclosure Act
- Open Government Act (WOO)
- Electronic Publications Act (WEP)
Links to legal bases
- AVG: https://wetten.overheid.nl/BWBR0040940
- Omgevingswet: https://wetten.overheid.nl/BWBR0037885
- AWB: https://wetten.overheid.nl/BWBR0005537
- Bekendmakingswet: https://wetten.overheid.nl/BWBR0004287
- Wet Open Overheid (WOO): https://wetten.overheid.nl/BWBR0045754
- Wet Elektronische Publicaties (WEP): https://wetten.overheid.nl/BWBR0043961
Impact assessment
Operations
Data
All information found in the uploaded documents (except metadata) is processed by the algorithm. This may include ordinary personal data, special personal data and criminal data. It may also include business-sensitive information.
Technical design
Documents are uploaded to the application by an employee. At that point, a copy of the original is made in the form of a PDF with a text layer, and the metadata of the original document is removed from the copy. This copy is stored on a Dutch server for a maximum of 30 days. The text layer of the PDF is offered via an API to the machine learning algorithm: a Natural Language Processing algorithm (named entity recognition) from Microsoft Azure. The API returns the locations in the analysed texts where personal data is likely to occur, together with a probability score (a percentage). Azure then immediately deletes the text layer. The probability score is combined with proprietary AI models developed by the vendor to make the recognition of personal data as accurate as possible. The models are trained on datasets including CoNLL-2003, UD Dutch LassySmall v2.8, Dutch NER Annotations for UD LassySmall and UD Dutch Alpino v2.8. Minimum key figures for the accuracy of identifying personal data are: named entities precision 0.78, recall 0.76 and F-score 0.77. Finally, an employee checks the document; when the employee finalises it, the data to be anonymised is permanently removed from the text layer and a black bar is placed over it.
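The reported key figures are internally consistent: the F-score is the harmonic mean of precision and recall, F = 2PR / (P + R), which for 0.78 and 0.76 gives roughly 0.77. The sketch below checks this arithmetic and illustrates the final masking step; the `redact` function and its span format are assumptions for illustration, not the vendor's actual API.

```python
# Check the reported key figures: the F-score is the harmonic mean
# of precision and recall, F = 2PR / (P + R).
precision, recall = 0.78, 0.76
f_score = 2 * precision * recall / (precision + recall)
print(round(f_score, 2))  # 0.77, matching the reported minimum

# Illustrative sketch of the final masking step: accepted character
# spans are permanently replaced in the text layer. The function name
# and (start, end) span format are assumptions, not the vendor's API.
def redact(text, spans, mask="X"):
    # Apply spans from right to left so earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + mask * (end - start) + text[end:]
    return text

print(redact("Contact J. Jansen for details.", [(8, 17)]))
# → "Contact XXXXXXXXX for details."
```

Replacing each character of the span (rather than deleting it) keeps the document layout stable, which matches the described behaviour of placing a black bar over the removed text.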
External provider
Similar algorithm descriptions
- The algorithm in the software recognises and anonymises personal data and other sensitive information in documents. Governments regularly publish information related to the drafting and implementation of their policies (e.g. based on the Woo). This tool is used to render sensitive data unrecognisable in the process.
  Last change on 20th of November 2024, at 14:27 (CET) | Publication Standard 1.0
- Publication category
- Other algorithms
- Impact assessment
- DPIA
- Status
- In use
- The algorithm recognises and anonymises personal data in documents.
  Last change on 10th of September 2024, at 12:06 (CET) | Publication Standard 1.0
- Publication category
- Other algorithms
- Impact assessment
- DPIA
- Status
- In use
- Among other things, the algorithm recognises and anonymises (personal) data and confidential (financial) data in documents before they are published, e.g. on the basis of the Open Government Act.
  Last change on 4th of April 2024, at 12:15 (CET) | Publication Standard 1.0
- Publication category
- Other algorithms
- Impact assessment
- Field not filled in.
- Status
- In use
- Among other things, the algorithm identifies and anonymises (personal) data and confidential (financial) information in documents before they are published, as required by the Open Government Act.
  Last change on 12th of September 2024, at 8:23 (CET) | Publication Standard 1.0
- Publication category
- Other algorithms
- Impact assessment
- Field not filled in.
- Status
- In use
- Among other things, the algorithm identifies and anonymises (personal) data and confidential (financial) information in documents before they are published, as required by the Open Government Act.
  Last change on 16th of May 2024, at 11:52 (CET) | Publication Standard 1.0
- Publication category
- Other algorithms
- Impact assessment
- Field not filled in.
- Status
- In use