Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.
Blurring as a Service
- Publication category
- Impactful algorithms
- Impact assessment
- Field not filled in.
- Status
- In development
General information
Responsible use
Goal and impact
Blurring as a Service (BaaS) is a generic service that allows anonymising images from public space by removing people and license plates. The service can be used for different data sources and is first deployed for panoramic images.
Since 2016, the Basic Information department (Mobile Mapping team) has collected panorama images of the entire city annually to provide a visual record of the City of Amsterdam. Among other things, the panorama images allow municipal employees to inspect public space from their workplaces for various purposes, such as checking accessibility for special vehicles or inspecting roads. The service is similar to Google Street View, but is intended to provide a more up-to-date view of the city.
For these tasks, it is not necessary to visualise or keep recognisable persons or license plates. It was therefore decided that these images should be anonymised. The anonymisation algorithm has been developed for this purpose. The deployment of the algorithm can be considered a measure to process the data carefully and lawfully. The developed facility 'Blurring as a Service' can prevent images with recognisable people and license plates from being used for municipal work processes.
The Computer Vision Team developed the facility that allows panoramic images to be anonymised. The algorithm is trained to recognise entire persons, not just faces, as well as license plates in (panorama) images. Detected persons and license plates are then 'blurred', effectively anonymising them.
Considerations
It is conceivable that certain groups are recognised better or worse by the algorithm. This could result in unequal treatment or a distinction based on external characteristics, for example gender, age, skin colour, or related clothing and attributes (e.g. disability, profession or hobby). The adverse consequence is that certain groups would be more likely to be (temporarily) recognisably included in Amsterdam municipality systems. This should obviously be avoided.
The algorithm was trained on a balanced dataset, i.e., ensuring that as many groups as possible are represented. For example, images were selected around schools and also in neighbourhoods where more people with a non-Western migration background live in order to adequately represent children and people with different skin colours in the dataset.
There is an obligation on the municipality to investigate whether such a disparity exists. This investigation has now been carried out and the results can be accessed via: Link to public page.
Human intervention
There is no automated decision-making by using the algorithm. A citizen or entrepreneur will never receive a decision or order generated by this algorithm.
Users of the algorithm, after sending images to the Computer Vision Team, receive back anonymised images. These images may then become part of a case, for example in the context of Supervision and Enforcement. The case handler can always judge for themselves whether an image is sufficiently anonymised.
A number of processes are in place to counteract errors.
1. When processing a batch of images, a sample is taken and checked manually. This aims to verify that the algorithm does what is expected;
2. A feedback process is put in place so that errors can be corrected. In addition, these errors can also be used to improve the algorithm;
3. An annual evaluation takes place to determine whether the algorithm needs to be improved.
Risk management
Operations
Data
Architecture of the model:
The model is YOLOv5, a convolutional neural network that can be used for object detection. The network receives an image and predicts the locations of people and license plates on the image with so-called bounding boxes (rectangular regions). The pixels within these areas are then blurred.
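As an illustration only (the municipality's actual implementation is not published, and a production pipeline would typically apply OpenCV's Gaussian blur to numpy arrays), the blurring step can be sketched as: given the bounding boxes returned by a detector such as YOLOv5, average out the pixels inside each box.

```python
# Illustrative sketch, not the service's real code. Images are represented
# here as plain lists of lists of greyscale values to keep the idea visible
# without external dependencies.

def blur_region(image, box, radius=1):
    """Box-blur the pixels of `image` inside `box`.

    `box` is (x_min, y_min, x_max, y_max) with exclusive upper bounds, as a
    detector's bounding box would give after rounding to pixel coordinates.
    """
    x0, y0, x1, y1 = box
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(max(0, y0), min(h, y1)):
        for x in range(max(0, x0), min(w, x1)):
            # Average the window around (x, y), clipped to the image edges.
            window = [
                image[yy][xx]
                for yy in range(max(0, y - radius), min(h, y + radius + 1))
                for xx in range(max(0, x - radius), min(w, x + radius + 1))
            ]
            out[y][x] = sum(window) // len(window)
    return out

def anonymise(image, detections):
    """Blur every detected person/license-plate box in turn."""
    for box in detections:
        image = blur_region(image, box)
    return image
```

In a real deployment the averaging window would be far larger (or a Gaussian kernel), so that faces and plate characters become irreversibly unreadable rather than merely softened.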
Performance:
The anonymisation algorithm currently has an accuracy of roughly 95% for people close to the camera.
For number plates that are close to the camera, it anonymises about 97%.
The algorithm is tuned to prefer anonymising slightly too much rather than too little: a scooter, tree or other object may therefore occasionally be blurred as well.
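The "rather too much than too little" tuning can be sketched as follows; the confidence threshold and box padding values here are hypothetical, chosen only to show the mechanism: keep even low-confidence detections, and enlarge each box slightly before blurring.

```python
# Hypothetical illustration of tuning toward over-anonymisation. The real
# service's threshold and padding settings are not published.

def select_boxes(detections, conf_threshold=0.25, pad=4):
    """Return padded boxes for all detections at or above the threshold.

    `detections` is a list of (box, confidence) pairs, with each box given
    as (x_min, y_min, x_max, y_max) in pixel coordinates.
    """
    selected = []
    for (x0, y0, x1, y1), conf in detections:
        if conf >= conf_threshold:
            # Grow the box outward so partially detected objects are still
            # fully covered by the blur.
            selected.append((x0 - pad, y0 - pad, x1 + pad, y1 + pad))
    return selected
```

Lowering the threshold trades precision for recall: more non-persons get blurred, but fewer recognisable persons slip through.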
Through visual inspection of a sample, it was found that the persons who are not recognised are usually not recognisable anyway, because they are, for example, partially hidden behind a tree. Ideally, of course, these persons would also be anonymised; unfortunately, this is not yet possible.
Technical design
Processing basis:
The reason and need to develop an anonymisation algorithm is based on the municipality's task to keep the core register up to date and reliable. This is based on Article 2(2) of the Large-Scale Topography Key Register Act, Article 2(1) of the Addresses and Buildings Key Register Act and Article 6(1)(e) of the AVG (the Dutch implementation of the GDPR). In addition, the following is included in the relevant Regulation (Basic Information Regulations 2018):
A. The public task as source supplier for the core register of panoramic images (article 7);
B. The public task as source holder.
The process of anonymisation is a further processing of the acquired panorama images. That process can be considered compatible with the original purposes according to the WP29, provided the anonymisation process aims to reliably produce anonymised information and provided there is a basis for the primary use. See Article 29 Working Party, 'Opinion 5/2014 on anonymisation techniques', WP 216, p. 8.
Given the performance of the algorithm, it can be said to be a reliable way of anonymising information. This performance has now been independently tested by an external party, Verdonck, Klooster & Associates. The performance values have also been determined by the CIO. It can be concluded that this constitutes permissible compatible further processing.
Basis (primary processing purpose) for the acquisition of the panoramic images:
- Article 6(1)(e) AVG
- Article 2 paragraph 2 Large-Scale Topography (Basic Registration) Act
- Article 2 paragraph 1 Addresses and Buildings (Basic Registration) Act.
- Regulation (Basic Information Regulations 2018) included the following:
A. The public task as source supplier for the core register of panoramic images (Article 7);
B. The public task as source owner.
Basis (secondary processing purpose) for developing:
- Article 6 paragraph 1 sub c and e AVG
- Article 2 paragraph 2 Large-Scale Topography (Basic Registration) Act
- Article 2 paragraph 1 Addresses and Buildings (Basic Registration) Act.
- Regulation (Basic Information Regulations 2018) included the following:
A. The public task as source supplier for the core register of panoramic images (Article 7);
B. The public task as source owner.
- Article 24 in conjunction with Article 25(1) and (2) and Article 32 AVG.
Personal data:
- Face and posture of people in public spaces;
- Face and posture of people inside a residential or office building who are visible through windows (panoramic images are only taken outdoors);
- License plates;
- Company details, e.g. on company vehicles, signs or premises.
Training:
The images collected over the past few years, roughly 10,000 raw images, have been used as training data to develop the algorithm. The people and license plates in these images were first marked by hand; this process is called 'annotation'. The algorithm was then trained on the annotated images so that it learns to recognise, and thus remove, the persons and license plates in new images; this is called 'training'.
This training set for the algorithm is stored in the City of Amsterdam's Azure cloud environment. The training set is kept as long as the algorithm is in use to possibly make future improvements. The panorama images are stored in an encrypted environment and only officials who need the images can access them, e.g. the algorithm's developers.
Testing:
A portion of the data was set aside and kept separate to test the algorithm with. The training and testing sets therefore do not overlap: they consist of different images. Roughly 1,000 images were used for testing.
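The train/test separation described above can be sketched like this (an assumed approach, not the team's actual script): shuffle the annotated image IDs deterministically and hold out a fixed fraction for testing only.

```python
# Sketch of a held-out test split. The fraction and seed are illustrative;
# the point is that evaluation images are never used for training.
import random

def split_dataset(image_ids, test_fraction=0.1, seed=42):
    """Shuffle deterministically and hold out `test_fraction` for testing."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    n_test = int(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]  # (train, test)
```

With roughly 10,000 annotated images and a 10% hold-out, this yields about 1,000 test images, matching the figures reported above.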
Operating:
The algorithm is currently being used for the following application:
- Acquiring panoramic images twice a year for the purpose of keeping core records current and reliable. This involves a large quantity of images;
- In the short term, the algorithm is expected to be used to anonymise images on which potentially illegally placed (heavy) containers have been spotted on vulnerable quay walls and bridges. This involves about 2,000 images per year.
The number of applications may increase in the future.