Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.

Algorithm data quality

Municipality of The Hague

Quality-Bot (Q-Bot) independently discovers existing and new patterns in the data. We use these patterns as a Quick Scan to estimate data quality and detect possible errors. Strong patterns can then be adopted as data quality rules to monitor data quality continuously.

Last change on 23rd of August 2024, at 15:42 (CET) | Publication Standard 1.0

Publication category: Other algorithms
Impact assessment: Field not filled in.
Status: Out of use

Theme

Organisation and business operations

Begin date

Field not filled in.

Contact information

datashop@denhaag.nl

Link to publication website

Not applicable (proprietary code)

Goal and impact

The purpose of the algorithm is to identify patterns that reveal possible (structural) errors in data, so that these patterns can be translated into quality rules. It is intended to provide the source owner with new insights about data quality. It is not meant to correct data independently.

The effect on citizens is indirect. The effect is direct for employees of the Municipality of The Hague, who are supported in improving the quality of service to citizens. As the quality of registration improves, so does the quality of service.

Considerations

Q-Bot brings focus to improving data quality. It helps data specialists identify and resolve errors or peculiarities in data in a targeted way.

Human intervention

Yes, this is the core of the application.

Risk management

The risks are limited because the algorithm does not make independent decisions. Any bias is purely technical in nature and has no impact on decision-making or on citizens. No personal data is processed.

Data

Leasehold contracts, properties and land registry plots. These datasets are linked to one another.

Technical design

Self-built algorithm that uses various existing machine learning algorithms. These existing algorithms are applied to the data and their results are collected. The in-house algorithm then selects which patterns are relevant and strong.

The data is structured into views by the technical user. Next, analysis preferences and settings are determined. The algorithm then autonomously searches for significantly strong patterns in the data and collects them in a list. The patterns in this list are evaluated for quality and relevance, and the strongest are selected. Finally, the selected patterns are applied again to the original data to determine, for each field, what is normal (regular) and what is anomalous (irregular). Fields are then marked green or red accordingly.
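The steps described above (search for patterns, select the strongest, apply them back to the data, mark fields green or red) can be illustrated with a minimal sketch. This is not Q-Bot's actual method, which is proprietary and not described in this register entry. The sketch swaps in a simple "if field A has value a, then field B usually has value b" rule search as a stand-in for the machine learning algorithms mentioned. All function names, field names (`contract_type`, `payment_term`), thresholds and sample records are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import permutations


def discover_patterns(rows, min_support=3, min_confidence=0.8):
    """Collect 'if lhs == a then rhs == b' patterns that hold for most rows.

    Stand-in technique (assumption): simple co-occurrence counting.
    """
    columns = list(rows[0].keys())
    patterns = []
    for lhs, rhs in permutations(columns, 2):
        # For each value of the lhs field, count the rhs values seen with it.
        counts = defaultdict(Counter)
        for row in rows:
            counts[row[lhs]][row[rhs]] += 1
        for lhs_value, rhs_counts in counts.items():
            support = sum(rhs_counts.values())
            rhs_value, hits = rhs_counts.most_common(1)[0]
            confidence = hits / support
            if support >= min_support and confidence >= min_confidence:
                patterns.append({
                    "lhs": lhs, "lhs_value": lhs_value,
                    "rhs": rhs, "rhs_value": rhs_value,
                    "support": support, "confidence": confidence,
                })
    return patterns


def select_strongest(patterns, top_n=5):
    """Keep the top_n patterns, ranked by confidence and then support."""
    ranked = sorted(patterns,
                    key=lambda p: (p["confidence"], p["support"]),
                    reverse=True)
    return ranked[:top_n]


def mark_fields(rows, patterns):
    """Apply patterns to every row; mark each checked field green or red."""
    marks = []
    for row in rows:
        row_marks = {}
        for p in patterns:
            if row[p["lhs"]] != p["lhs_value"]:
                continue  # pattern does not apply to this row
            if row[p["rhs"]] == p["rhs_value"]:
                # Regular value; never overwrite an earlier red mark.
                row_marks.setdefault(p["rhs"], "green")
            else:
                row_marks[p["rhs"]] = "red"  # anomalous value
        marks.append(row_marks)
    return marks


# Hypothetical sample records, invented for illustration only.
rows = (
    [{"contract_type": "perpetual", "payment_term": "annual"}] * 5
    + [{"contract_type": "perpetual", "payment_term": "monthly"}]
    + [{"contract_type": "temporary", "payment_term": "monthly"}] * 3
)

patterns = select_strongest(discover_patterns(rows))
marks = mark_fields(rows, patterns)

for row, row_marks in zip(rows, marks):
    print(row, row_marks)
```

In this sketch, the single record that combines `perpetual` with `monthly` breaks the otherwise strong pattern "perpetual implies annual", so its `payment_term` field is marked red. As in the description above, the marking is only a signal for a data specialist to review; nothing is corrected automatically.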