Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.
WhatsApp De-duplicator
- Publication category: Other algorithms
- Impact assessment: DPIA
- Status: In use
General information
Theme
Begin date
Contact information
Responsible use
Goal and impact
The basic principle of deduplication is to merge identical, overlapping messages into one. In this way, all information from the various chat messages supplied is retained, without duplicates. The purpose is to avoid publishing the same information from WhatsApp conversations more than once. Because the conversations are deduplicated, the data from different data holders can be made public in one go, while only a single review by lawyers is required. This saves time and prevents the same conversation from accidentally being assessed in different ways. As a result, information is ready for disclosure sooner.
Considerations
By deploying this algorithm, WhatsApp conversations are published more completely. Gaps in the messages from one data holder can be filled with information from a second data holder. Because of the time savings and the quality of the review, deploying the algorithm is more effective and more complete than reviewing all source files separately. The source files remain unchanged and available. The composite conversation is made public; the source files themselves are not.
Human intervention
A staff member checks that the correct files are merged and that the names of the participants are correct.
Risk management
The algorithm only deduplicates messages that match on content, time and author. If a message differs on any of these factors, it is not deduplicated. As a result, no unique messages are deleted.
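The matching rule above can be illustrated with a minimal Python sketch. It is not the register's actual implementation: the `Message` record and the `merge_exports` function are hypothetical names, and the sketch assumes each export has already been parsed into author, timestamp and content. Note one simplification: two genuinely separate but identical messages with the same timestamp would be collapsed here.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Message:
    # Hypothetical record layout; the real export format is not described here.
    author: str
    timestamp: datetime
    content: str


def merge_exports(*exports):
    """Merge several exports of the same conversation.

    A message is treated as a duplicate only if content, time and
    author all match. A message that differs on any field is kept.
    """
    seen = set()
    merged = []
    for export in exports:
        for msg in export:
            key = (msg.author, msg.timestamp, msg.content)
            if key not in seen:
                seen.add(key)
                merged.append(msg)
    # Sort by time so that gaps in one export are filled by the other.
    merged.sort(key=lambda m: m.timestamp)
    return merged
```

In this sketch, a message that appears in two exports with the same author and content but a different timestamp is kept twice, which mirrors the conservative behaviour described above.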
A staff member checks that the correct files are merged and that the names of participants are correct. Errors can nevertheless be missed, for example when a participant's name is assigned incorrectly. Chat messages are then attributed to someone who did not write them. This is a potential risk that does not arise when the source files are used directly.
Legal basis
The Open Government Act (Woo) regulates the right to information about everything the government does. It is the successor to the Government Information (Public Access) Act (Wob).
Links to legal bases
- Woo: https://wetten.overheid.nl/BWBR0045754
- Wob: https://wetten.overheid.nl/BWBR0005252
Impact assessment
Operations
Data
WhatsApp conversations with associated attachments
Technical design
The algorithm determines which chat exports belong together by comparing the content of the messages in each chat. Above a certain degree of overlap, the algorithm groups the exports. A staff member checks that the grouping is correct and sets the names of the participants correctly. The algorithm can then deduplicate. Messages are deduplicated by content, time and author. Attachments are deduplicated based on their hash (a code computed from the file's contents, so identical files yield the same code).
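The grouping and attachment steps could look roughly like the Python sketch below. Everything specific in it is an assumption made for illustration: the overlap measure, the `threshold` value of 0.5, the greedy grouping strategy, the function names and the choice of SHA-256 are not stated in this description. Messages are represented as plain `(author, time, content)` tuples.

```python
import hashlib


def overlap_ratio(export_a, export_b):
    """Share of the smaller export's messages that also occur in the other."""
    a, b = set(export_a), set(export_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def suggest_groups(exports, threshold=0.5):
    """Propose groups of export names that appear to be the same chat.

    `exports` maps a file name to its list of messages. The threshold
    is a hypothetical value. The result is only a suggestion that a
    staff member still has to confirm.
    """
    groups = []
    for name, messages in exports.items():
        for group in groups:
            if any(overlap_ratio(messages, exports[other]) >= threshold
                   for other in group):
                group.append(name)
                break
        else:
            # No existing group overlaps enough, so start a new one.
            groups.append([name])
    return groups


def dedupe_attachments(attachments):
    """Keep the first file name seen for each distinct file content.

    `attachments` is an iterable of (filename, bytes) pairs. SHA-256 is
    an assumed choice of hash function.
    """
    unique = {}
    for filename, data in attachments:
        digest = hashlib.sha256(data).hexdigest()
        unique.setdefault(digest, filename)
    return unique
```

Because the grouping only yields a proposal, the manual check described above remains the point at which a wrong merge is caught before deduplication runs.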