Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.

Text analysis and document redaction

Language technology is used to detect personal and company names in text files, such as emails and individual documents, so that they can be filtered out.

Last change on 28th of January 2025, at 10:58 (CET) | Publication Standard 1.0
Publication category
Other algorithms
Impact assessment
DPIA
Status
In use

General information

Theme

Organisation and business operations

Begin date

2019-07

Contact information

info@provinciegroningen.nl

Responsible use

Goal and impact

Support for the review process for information that is to be disclosed and to which legal protection applies. The protection derives from the AVG (GDPR, for persons) and the Woo (especially for company-confidential information), where the applicable grounds for exception are named.

Considerations

Manual review is labour-intensive and error-prone. A suggestion list from the entity extraction algorithm surfaces all conceivable mentions of individuals in the text.

Human intervention

Within the software, a list of suggested terms is built and offered to the user, who selects from it for the automatic redaction process. The choice to accept a suggested term as a personal name and withhold it from disclosure rests with the user.
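The selection step described above can be sketched as follows. This is a minimal illustration, not the vendor's implementation: the function name `redact`, the mask text and the sample names are all hypothetical. It shows only the principle that a suggested term is masked solely when the reviewer has approved it.

```python
import re


def redact(text, suggestions, approved, mask="[REDACTED]"):
    """Mask only those suggested terms that the reviewer approved.

    `suggestions` is the list offered by the software, `approved` is the
    subset chosen by the user. Both names are hypothetical.
    """
    # Longest terms first, so a full name is masked before any shorter
    # term that happens to be contained in it.
    terms = sorted((t for t in suggestions if t in approved), key=len, reverse=True)
    if not terms:
        return text  # nothing approved, so nothing is masked
    pattern = re.compile("|".join(re.escape(t) for t in terms))
    return pattern.sub(mask, text)


suggestions = ["Jan Jansen", "Acme Holding"]
approved = {"Jan Jansen"}  # the reviewer rejected "Acme Holding"
result = redact("Jan Jansen wrote to Acme Holding.", suggestions, approved)
print(result)
```

Terms the reviewer does not approve stay untouched, which mirrors the point that the software only proposes and the user decides.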

Risk management

There is no risk of automated decision-making, and the algorithm has no impact on fundamental rights, because it does not make decisions with legal consequences. It only proposes which personal data to anonymise. An employee of the administrative body always performs the final check on whether a document has been correctly anonymised.

Legal basis

Legislation around public access to government data (Woo)

Links to legal bases

Wet Open Overheid: https://wetten.overheid.nl/BWBR0045754/2023-04-01#Hoofdstuk5

Impact assessment

Data Protection Impact Assessment (DPIA)

Operations

Data

This refers to documents and messaging information within the Province, including email, files, WhatsApp messages and other media in which administrative decision-making is recorded.

Links to data sources

General Office applications: this covers standard Office formats, including email and social media formats.

Technical design

Names in texts are recognised using Named Entity Recognition (NER). A process within Insights then extracts the names for further processing in the management interface and the automatic redaction rules.
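To illustrate how extracted names could be turned into a suggestion list, here is a minimal sketch. It does not use a real NER model: the regular expression `NAME_PATTERN` is a deliberately crude, hypothetical stand-in that treats two or more consecutive capitalised words as a candidate name, and the function name `suggest_names` is likewise an assumption. The actual product's model and data structures are not described in this register entry.

```python
import re
from collections import Counter

# Hypothetical stand-in for an NER model: two or more consecutive
# capitalised words are treated as a candidate name.
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")


def suggest_names(texts):
    """Return (candidate, count) pairs across all texts, most frequent first."""
    counts = Counter()
    for text in texts:
        counts.update(NAME_PATTERN.findall(text))
    return counts.most_common()


documents = [
    "Please forward this to Jan Jansen today.",
    "Jan Jansen met Piet Bakker.",
]
for candidate, count in suggest_names(documents):
    print(candidate, count)
```

A heuristic like this both over-matches and under-matches (for example, it misses Dutch surnames with lowercase particles such as "van der"), which is one reason a trained NER model and a human review step are used in practice.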

External provider

ZyLAB eDiscovery & Compliance Services B.V.

Similar algorithm descriptions

  • Based on language technology, personal and company names are read and filtered out of text files such as emails and individual documents.

    Last change on 14th of October 2024, at 10:47 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    Field not filled in.
    Status
    In use
  • The algorithm highlights personal data in documents. An employee has to review every page and check whether the document is properly anonymised. The software then removes all highlighted information and blacks it out. After that, the documents can be published, for example under the Open Government Act (Woo).

    Last change on 28th of January 2025, at 10:30 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    DPIA, ...
    Status
    In use
  • The algorithm recognises (personal) data and otherwise confidential information in a document and makes a proposal to anonymise it. A staff member evaluates the proposal and makes the final adjustment, making the document suitable for publication.

    Last change on 16th of August 2024, at 8:50 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    DPIA
    Status
    In use
  • The algorithm recognises (personal) data and otherwise confidential information in a document and makes a proposal to anonymise it. A staff member evaluates the proposal and makes the final adjustment, making the document suitable for publication.

    Last change on 7th of October 2024, at 15:33 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    Field not filled in.
    Status
    In use
  • The algorithm recognises (personal) data and otherwise confidential information in a document and makes a proposal to anonymise it. A staff member evaluates the proposal and makes the final adjustment, making the document suitable for publication.

    Last change on 25th of January 2024, at 12:17 (CET) | Publication Standard 1.0
    Publication category
    Other algorithms
    Impact assessment
    Field not filled in.
    Status
    In use