Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.

Draw Sample Companies

On this page, you will find information about the 'Sampling Enterprises' algorithm. This algorithm delineates the SME research population and uses a stratified random sample to select 3,600 entities from it that are eligible for a book audit.

Last change on 4th of September 2025, at 18:43 (CET) | Publication Standard 1.0
Publication category
Impactful algorithms
Impact assessment
Field not filled in.
Status
In use

General information

Theme

Public finance

Begin date

Field not filled in.

Contact information

algoritmeregister@belastingdienst.nl

Link to publication website

https://over-ons.belastingdienst.nl/onderwerpen/omgaan-met-gegevens/algoritmeregister/

Link to source registration

https://over-ons.belastingdienst.nl/onderwerpen/omgaan-met-gegevens/algoritmeregister/trekken-steekproef-ondernemingen-so/

Responsible use

Goal and impact

The primary purpose of this work process is to use the sampling tool to gain insight, in a statistically sound manner, into the accuracy and completeness of tax returns. The outcome of this work process forms part of the input for the processes that lead to the Tax and Customs Administration's annual plan and the Executive Boards' annual plans.

The Sampling Enterprises process implements the Sampling Policy Framework and is carried out biannually.

The reasons for using an algorithm in SAS Enterprise Guide, rather than working manually, are:

  • Efficiency: Large data sets are processed for population delineation.
  • Accuracy: The algorithm avoids human error in selecting units.
  • Reproducibility: The sample can be repeated exactly with the same settings.
  • Transparency: The method used is clearly documented and verifiable.
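The reproducibility point can be illustrated with a seeded random draw. The following is a minimal Python sketch, not the SAS Enterprise Guide implementation; the function name `draw_sample`, the seed value and the toy population of entity ids are assumptions made purely for illustration.

```python
import random


def draw_sample(population_ids, sample_size, seed):
    """Draw a simple random sample without replacement, using a fixed seed."""
    # Sort first so the result does not depend on the input order.
    ordered = sorted(population_ids)
    # A local generator keeps the draw independent of global random state.
    rng = random.Random(seed)
    return rng.sample(ordered, sample_size)


# Toy population of 1,000 hypothetical entity ids.
population = range(1, 1001)

first_run = draw_sample(population, 10, seed=2025)
second_run = draw_sample(population, 10, seed=2025)

# Same population and same seed give exactly the same sample.
print(first_run == second_run)
```

Recording the seed together with the population snapshot is what would allow a draw of this kind to be repeated and verified later.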

This algorithm delineates the SME research population and draws a stratified random sample of 3,600 entities from it. These entities all receive a book audit. The audit assignment is the same for all entities.

The purpose of the algorithm is to randomly select 3,600 entities from the SME population for an audit without specific cause.

Considerations

The sample is important for determining the Tax and Customs Administration's annual plan. The algorithm makes it possible to carry out the sample.

The algorithm systematically and accurately determines the research population. The selection made by the algorithm is documented and verifiable.

The alternative is for an employee to manually determine the research population and select entities at random. This would require looking at all entities, which is inefficient and error-prone.

Human intervention

Human intervention in the Tax Administration context implies that a competent and knowledgeable employee plays a substantial role in decision-making.

Combination of human intervention and decisions by the algorithm.

Human intervention is involved in the operation of the algorithm, but decisions are also made by the algorithm.

The part of the algorithm that delineates the SME population is run in a controlled manner. That is, at each step a check is made whether the outcome meets expectations.
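A controlled, step-by-step delineation of this kind could look like the Python sketch below. It is not the actual implementation: the entity fields (`active`, `liabilities`), the two rules and the expected count ranges are all hypothetical, and in practice the comparison with expectations may well be done by an employee rather than in code.

```python
def run_controlled(population, steps):
    """Apply each filter step in turn and check how many entities remain.

    Each step is a tuple (name, rule, (low, high)); the run stops as soon
    as the remaining count falls outside the expected range.
    """
    current = list(population)
    log = []
    for name, rule, (low, high) in steps:
        current = [entity for entity in current if rule(entity)]
        if not low <= len(current) <= high:
            raise RuntimeError(
                f"step {name!r}: {len(current)} entities left, "
                f"expected between {low} and {high}"
            )
        log.append((name, len(current)))
    return current, log


# Hypothetical toy entities; field names are assumptions for illustration.
entities = [
    {"id": 1, "active": True, "liabilities": {"OB"}},
    {"id": 2, "active": False, "liabilities": {"IH"}},
    {"id": 3, "active": True, "liabilities": set()},
    {"id": 4, "active": True, "liabilities": {"LH", "VPB"}},
]

# Hypothetical rules with expected count ranges after each step.
steps = [
    ("active business", lambda e: e["active"], (1, 4)),
    ("relevant tax liability", lambda e: bool(e["liabilities"]), (1, 3)),
]

selected, log = run_controlled(entities, steps)
print([e["id"] for e in selected])  # [1, 4]
print(log)  # [('active business', 3), ('relevant tax liability', 2)]
```

The log of counts per step is what makes the delineation verifiable afterwards: an unexpected drop or rise points to the step where the data or the rule should be inspected.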

The final step is the random drawing of the sample, which is performed by the algorithm.

When the audit assignments are entered into the system after the sample has been drawn, an employee still performs a variety of manual checks in the relevant systems. This employee ultimately decides whether the book audit can be carried out.

Risk management

  • Privacy and GDPR

The use of data should be tested against the General Data Protection Regulation (GDPR; in Dutch: AVG). Reviewing the personal data used reveals any privacy risks and allows appropriate measures to be taken.

The GDPR prescribes that no more data should be used than necessary. This is called data minimisation. The Tax Administration regularly examines whether the data used are still necessary and may therefore be used.

  • Use of special personal data

No special personal data are processed in this algorithm.

  • Equality and non-discrimination

The algorithm is assessed for direct and indirect discrimination in line with applicable non-discrimination principles. Processing as little personal data as possible reduces the risk of direct discrimination. Employees involved in developing and managing the algorithms receive training on data protection and bias.

  • Safeguards

The General Administrative Law Act (Awb) requires government actions to be transparent and lawful. The Tax Administration observes the general principles of good governance when applying and developing algorithms.

The algorithm uses data collected under various tax laws. As required by the GDPR, no more data is used than necessary.

Legal basis

  1. General State Tax Act
  2. General Administrative Law Act
  3. General Data Protection Regulation
  4. General Data Protection Regulation Implementation Act
  5. Payroll Tax Act 1964
  6. Income Tax Act 2001
  7. Corporation Tax Act 1969
  8. Turnover Tax Act 1968
  9. General Provisions Citizen Service Number Act
  10. Archives Act 1995

Links to legal bases

  • General State Tax Act: https://wetten.overheid.nl/BWBR0002320/
  • General Administrative Law Act: https://wetten.overheid.nl/BWBR0005537/
  • General Data Protection Regulation: https://eur-lex.europa.eu/legal-content/NL/TXT/HTML/?uri=CELEX:32016R0679
  • General Data Protection Regulation Implementation Act: https://wetten.overheid.nl/BWBR0040940/
  • Payroll Tax Act 1964: https://wetten.overheid.nl/BWBR0002471/
  • Income Tax Act 2001: https://wetten.overheid.nl/BWBR0011353/
  • Corporation Tax Act 1969: https://wetten.overheid.nl/BWBR0002672/
  • Turnover Tax Act 1968: https://wetten.overheid.nl/BWBR0002629/
  • General Provisions Citizen Service Number Act: https://wetten.overheid.nl/BWBR0022428/
  • Archives Act 1995: https://wetten.overheid.nl/BWBR0007376/

Operations

Data

  1. Personal and entity data of the taxpayer
  2. Income tax (IH) return data for determining IH tax liability and indication of profit declaration
  3. Turnover data for determining turnover tax (OB, i.e. VAT) liability and the OB tax interest
  4. Wage data for determining payroll tax (LH) liability and the LH tax interest
  5. Results of tax audits
  6. Indication HT
  7. Indication ANBI (public benefit organisation)
Links to data sources

  • Personal and entity data of the taxpayer: Basisregistratie Personen (BRP), Belastingdienst
  • Income tax (IH) return data for determining IH tax liability and indication of profit declaration: Belastingdienst
  • Turnover data for determining turnover tax (OB, i.e. VAT) liability and the OB tax interest: Belastingdienst
  • Wage data for determining payroll tax (LH) liability and the LH tax interest: UWV
  • Results of tax audits: Belastingdienst
  • Indication HT: Belastingdienst
  • Indication ANBI (public benefit organisation): Data.overheid.nl

Technical design

The algorithm consists of selection rules drawn up by subject-matter experts.

These rules identify the SME research population. From this population, a stratified random sample of 3,600 entities is drawn, without specific cause, to be eligible for an audit.

The research population consists of all SME entities with a relevant tax liability (a combination of income tax (IH), corporation tax (VPB), turnover tax (OB) and payroll tax (LH)) and an active business. The SME population is skewed: it ranges from small one-person enterprises to large employers (up to 500 employees), and the group of small one-person enterprises is by far the largest.

To ensure that there is sufficient diversity of firms in the sample, the sample is stratified. The population is divided into 7 strata based on the age of the entity, employer status, level of turnover, level of LH tax interest and VPB tax liability. Within each stratum, a random sample is drawn. The selected entities are all given a book audit. The audit assignment is the same for all these entities.
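The stratified draw described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual SAS implementation: the register does not say how the 3,600 entities are allocated over the 7 strata, so the stratum sizes, the per-stratum allocation, the seed and the function name `stratified_sample` are all hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(stratum_of, allocation, seed):
    """Draw a random sample without replacement within each stratum.

    stratum_of: dict mapping entity id -> stratum label
    allocation: dict mapping stratum label -> number of entities to draw
    """
    rng = random.Random(seed)

    # Group entity ids per stratum, in sorted order for reproducibility.
    members = defaultdict(list)
    for entity_id in sorted(stratum_of):
        members[stratum_of[entity_id]].append(entity_id)

    sample = {}
    for stratum in sorted(allocation):
        wanted = allocation[stratum]
        if wanted > len(members[stratum]):
            raise ValueError(f"stratum {stratum!r} has too few entities")
        sample[stratum] = rng.sample(members[stratum], wanted)
    return sample


# Hypothetical skewed population: stratum 1 is by far the largest.
sizes = {1: 700, 2: 100, 3: 80, 4: 50, 5: 40, 6: 20, 7: 10}
stratum_of = {}
next_id = 1
for stratum, size in sizes.items():
    for _ in range(size):
        stratum_of[next_id] = stratum
        next_id += 1

# Hypothetical allocation: the same number of entities from every stratum.
allocation = {stratum: 5 for stratum in sizes}

sample = stratified_sample(stratum_of, allocation, seed=42)
print({stratum: len(ids) for stratum, ids in sample.items()})
```

In this toy setup a simple random sample of 35 entities would be dominated by stratum 1, which holds 70% of the population; drawing per stratum guarantees that the small strata of larger firms are represented as well.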

The algorithm is not self-learning. This means that the algorithm does not change itself while it is in use.

External provider

The algorithm was developed by staff at the Tax and Customs Administration and is also maintained internally.
