Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.

WhatsApp Deduplicator

The WhatsApp Deduplicator deduplicates WhatsApp exports of the same conversation that were supplied by one or more data holders. The result is a single document containing all messages between the participants of the conversation.

Last change on 29th of November 2024, at 10:58 (CET) | Publication Standard 1.0
Publication category
Other algorithms
Impact assessment
DPIA
Status
In use

General information

Theme

Organisation and business operations

Begin date

05-2023

Contact information

data-science-nc19@minvws.nl

Responsible use

Goal and impact

The basic principle of deduplication is to merge identical, overlapping messages. In this way, all information from the different chat exports delivered is retained, without duplicate messages. The purpose is to avoid publishing the same information from WhatsApp conversations more than once. Because the conversations are deduplicated, all data from the different data holders can be made public at once, while lawyers need to review it only once. This saves time and prevents the same conversation from being accidentally assessed in different ways. As a result, information is ready for disclosure sooner.

Considerations

By deploying this algorithm, WhatsApp conversations are published more completely. Gaps in the export of one data holder can be filled with information from a second data holder. The time savings and the quality of review make deploying the algorithm more effective and more complete than reviewing all source files separately. Source files remain unchanged and available. The merged conversation is made public; the source files themselves are not.

Human intervention

A staff member checks that the correct files are merged and that the names of the participants are correct.

Risk management

The algorithm only deduplicates messages that match on content, time and author. As a result, no unique messages are deleted. If a message differs on any of these factors, it is not deduplicated.
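
The matching rule can be illustrated with a minimal Python sketch. The `Message` type, its field names and the `merge_exports` function are hypothetical; this register entry does not describe the actual implementation, and details such as timestamp resolution or text normalisation are not specified. Note that this sketch collapses every exact match, including repeats within a single export, which the real tool may handle differently.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Message:
    """Hypothetical message record; the field names are assumptions."""
    time: datetime
    author: str
    content: str


def merge_exports(*exports: list[Message]) -> list[Message]:
    """Merge exports, keeping one copy of each message that is equal
    on time, author and content. Anything that differs on one of
    these fields is treated as a separate message and kept."""
    seen: set[Message] = set()
    merged: list[Message] = []
    for export in exports:
        for msg in export:
            if msg not in seen:
                seen.add(msg)
                merged.append(msg)
    # Present the merged conversation in chronological order.
    return sorted(merged, key=lambda m: m.time)
```

Because a message is only dropped when all three fields are equal, a message whose author was recorded differently in a second export would appear twice in the merged result rather than be removed.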

A staff member checks that the correct files are merged and that the names of participants are correct. Errors can still go unnoticed, for example when the name of a participant in a conversation is assigned incorrectly. Chat messages are then attributed to someone who did not write them. This is a potential risk that does not arise when the source files are used directly.

Legal basis

The Open Government Act (Woo) regulates the right to information about everything the government does. It is the successor to the Government Information (Public Access) Act (Wob).

Links to legal bases

  • Woo: https://wetten.overheid.nl/BWBR0045754
  • Wob: https://wetten.overheid.nl/BWBR0005252

Impact assessment

Data Protection Impact Assessment (DPIA)

Operations

Data

WhatsApp conversations with associated attachments

Technical design

The algorithm determines which chats belong together by comparing the content of the messages within each chat. When the overlap is large enough, the algorithm groups the exports. An employee checks that the grouping is correct and sets the names of the participants correctly. The algorithm can then deduplicate. Messages are deduplicated by content, time and author. Attachments are deduplicated based on their hash (a code derived from the contents of a file that is, in practice, unique to each file).
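
The grouping and attachment steps might look roughly like the Python sketch below. Everything specific in it is an assumption made for illustration: the overlap measure, the threshold of 0.5, the function names and the choice of SHA-256 are not taken from this register entry, which does not state how overlap is measured or which hash function is used.

```python
import hashlib


def overlap_ratio(export_a: set[str], export_b: set[str]) -> float:
    """Share of the smaller export's message texts that also occur
    in the other export (an assumed overlap measure)."""
    if not export_a or not export_b:
        return 0.0
    return len(export_a & export_b) / min(len(export_a), len(export_b))


def belong_together(export_a: set[str], export_b: set[str],
                    threshold: float = 0.5) -> bool:
    """Propose grouping two exports; the threshold is an assumed value.
    A staff member would still confirm the proposed grouping."""
    return overlap_ratio(export_a, export_b) >= threshold


def unique_attachments(files: dict[str, bytes]) -> dict[str, str]:
    """Map each distinct SHA-256 digest to the first file name seen
    with that content, so identical files are kept only once."""
    by_hash: dict[str, str] = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        by_hash.setdefault(digest, name)
    return by_hash
```

Hashing the file contents rather than comparing file names means that the same attachment is recognised even when two data holders' exports store it under different names.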

External provider

Internally developed