Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.
Algorithm data quality
- Publication category: Other algorithms
- Impact assessment: Field not filled in.
- Status: Out of use
General information
Theme
Begin date
Contact information
Link to publication website
Responsible use
Goal and impact
The purpose of the algorithm is to identify patterns that reveal possible (structural) errors in data, so that these patterns can be translated into quality rules. It is intended to give the source owner new insights into data quality. It is not meant to correct data independently.
The effect on citizens is indirect. The effect is direct for employees of The Hague, who are supported in improving the quality of service to citizens. As the quality of registration improves, so does the quality of service.
Considerations
Q-bot brings focus to improving data quality. It helps data specialists identify and resolve errors or peculiarities in data in a targeted way.
Human intervention
Yes, this is the core of the application.
Risk management
The risks are limited because the algorithm does not make independent decisions. Any bias is purely technical in nature and has no impact on decision-making or on citizens. No personal data are processed.
Operations
Data
Leasehold contracts, properties and land registry plots, which are linked to one another.
Technical design
A self-built algorithm that uses various existing machine learning algorithms. These existing algorithms are applied to the data and their results are collected. The in-house algorithm then selects the patterns that are both relevant and strong.
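The collect-then-select design described above can be illustrated with a minimal sketch. It does not reproduce Q-bot itself: the two detectors are simple hypothetical stand-ins for the unnamed machine learning algorithms, the field names (`contract_type`, `payment_period`) and sample rows are invented, and the support and confidence thresholds are assumptions chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    description: str
    support: int       # number of rows the pattern was observed on
    confidence: float  # share of those rows that follow the pattern


def dominant_value_detector(rows, column):
    """Hypothetical detector: the most common value in one column."""
    values = [row[column] for row in rows]
    value, count = Counter(values).most_common(1)[0]
    return [Pattern(f"{column} == {value!r}", len(values), count / len(values))]


def implication_detector(rows, if_col, then_col):
    """Hypothetical detector: 'if_col == a' usually implies 'then_col == b'."""
    groups = {}
    for row in rows:
        groups.setdefault(row[if_col], []).append(row[then_col])
    patterns = []
    for key, values in groups.items():
        value, count = Counter(values).most_common(1)[0]
        description = f"{if_col} == {key!r} -> {then_col} == {value!r}"
        patterns.append(Pattern(description, len(values), count / len(values)))
    return patterns


def select_strong(patterns, min_support=3, min_confidence=0.75):
    """Keep patterns seen often enough and followed consistently enough."""
    strong = [
        p for p in patterns
        if p.support >= min_support and p.confidence >= min_confidence
    ]
    return sorted(strong, key=lambda p: p.confidence, reverse=True)


# Invented sample data, for illustration only.
rows = [
    {"contract_type": "perpetual", "payment_period": "yearly"},
    {"contract_type": "perpetual", "payment_period": "yearly"},
    {"contract_type": "perpetual", "payment_period": "yearly"},
    {"contract_type": "perpetual", "payment_period": "monthly"},
    {"contract_type": "perpetual", "payment_period": "yearly"},
    {"contract_type": "temporary", "payment_period": "monthly"},
]

# Collect candidates from every detector, then select the strong ones.
candidates = (
    dominant_value_detector(rows, "payment_period")
    + implication_detector(rows, "contract_type", "payment_period")
)
strong = select_strong(candidates)
for pattern in strong:
    print(pattern)
```

In this sketch, a pattern seen on only one row is dropped for lack of support, and a pattern that holds for only two thirds of the rows is dropped for lack of confidence. A real selection step would presumably also weigh relevance, which is not modelled here.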
The technical user first structures the data into views. Next, the analysis preferences and settings are determined. The algorithm then autonomously searches the data for significantly strong patterns and collects them in a list. These patterns are evaluated for quality and relevance, and the strongest ones are selected. Finally, the selected patterns are applied to the original data again to determine, for each field, which values are normal (regular) and which are anomalous (irregular). The fields are then marked green or red accordingly.
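The final step, applying selected patterns back to the original data and marking each field, might look roughly like the sketch below. The rule format (an "if column equals X, then column should equal Y" tuple), the field names and the sample rows are all assumptions made for illustration; the register entry does not say how Q-bot represents its patterns.

```python
def mark_fields(rows, rules):
    """Mark every field green unless a selected rule flags it as irregular.

    Each rule is a hypothetical (if_col, if_val, then_col, then_val) tuple:
    when row[if_col] == if_val, row[then_col] is expected to equal then_val.
    """
    marked = []
    for row in rows:
        colours = {column: "green" for column in row}
        for if_col, if_val, then_col, then_val in rules:
            applies = row.get(if_col) == if_val
            if applies and row.get(then_col) != then_val:
                colours[then_col] = "red"  # value deviates from the pattern
        marked.append(colours)
    return marked


# Invented rule and data, for illustration only.
rules = [("contract_type", "perpetual", "payment_period", "yearly")]
rows = [
    {"contract_type": "perpetual", "payment_period": "yearly"},
    {"contract_type": "perpetual", "payment_period": "monthly"},
    {"contract_type": "temporary", "payment_period": "monthly"},
]

marked = mark_fields(rows, rules)
for row, colours in zip(rows, marked):
    print(row, colours)
```

A red mark in this sketch only signals that a value deviates from a pattern. In line with the human intervention described above, deciding whether it is an actual error is left to the data specialist.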