Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.

NLdoc

Logius

With NLdoc, you can easily create an accessible version of an inaccessible document, even if you are not an expert.

Last change on 11th of June 2025, at 8:51 (CET) | Publication Standard 1.0

Publication category: Other algorithms
Impact assessment: Field not filled in.
Status: In use

Theme

Organisation and business operations

Begin date

2025-03

Contact information

nldoc@logius.nl

Link to publication website

https://NLdoc.nl

Link to source registration

https://gitlab.com/logius/nldoc

Goal and impact

With NLdoc, you can easily convert any document into an accessible version that is usable by everyone and on all devices. That way, you do not exclude anyone. Moreover, your documents then comply with the law on digital accessibility.

Considerations

Almost all government organisations publish documents on their websites, mostly as PDF files. These documents can only be accessed with specialist software and specific knowledge. As a result, these organisations do not comply with legal requirements. NLdoc offers functionality that allows you to publish an accessible alternative alongside existing documents. There are no affordable alternatives available, and if every organisation had to solve this itself, it would cost far more money.

Human intervention

NLdoc automatically converts your inaccessible documents to HTML. Sometimes human insight is still needed to make the content fully accessible. In the NLdoc application, you can easily take that last step. You do not need any technical knowledge, because our user interface shows you the way, so that your document meets all WCAG 2.1 requirements.

Risk management

In order to determine what the NLdoc team needs to work on, it is important to understand the usage of our systems. With this data, we can continuously improve our service. For example, we discover which accessibility errors are common and can develop automatic solutions for them. We naturally ensure that we collect this data responsibly.

Data

When you upload a document to NLdoc, we do not store that source document. We process the content and transform it into our own structure. This produces an accessible HTML file that you can download or process in your own environment via the API.

Technical design

With Tesseract, we read text from the pages of documents. The model tells us, as best it can, which words can be found where on the page.
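As an illustration of what "which words can be found where" can look like as data, here is a minimal sketch. It assumes the OCR result arrives as a dictionary of parallel lists (text, confidence, left, top, width, height), which is the shape of the word-level output that Tesseract wrappers commonly expose. The names `Word` and `words_from_ocr` and the sample values are hypothetical and are not taken from the NLdoc source code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One recognised word and its position on the page (in pixels)."""
    text: str
    left: int
    top: int
    width: int
    height: int
    conf: float


def words_from_ocr(data: dict, min_conf: float = 0.0) -> list[Word]:
    """Turn parallel OCR lists into Word records.

    Entries without text, or with a confidence below min_conf, are skipped.
    Structural entries (page, block, line) typically have empty text and
    a confidence of -1, so they are dropped as well.
    """
    words = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])
        if not text.strip() or conf < min_conf:
            continue
        words.append(
            Word(text, data["left"][i], data["top"][i],
                 data["width"][i], data["height"][i], conf)
        )
    return words


# Hypothetical sample in the assumed shape; not real NLdoc data.
sample = {
    "text": ["", "Annual", "report", " "],
    "conf": ["-1", "96.5", "91.0", "-1"],
    "left": [0, 40, 130, 0],
    "top": [0, 50, 50, 0],
    "width": [600, 80, 75, 0],
    "height": [800, 20, 20, 0],
}
words = words_from_ocr(sample)
```

Keeping the confidence value with each word makes it possible to flag uncertain words for human review later on.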

The YOLO v11 model is trained on the DocLayNet dataset and helps us classify parts of pages. After classification, we know what kind of content each part of the page contains. Think of headings, tables, images, paragraphs, titles, etcetera. We can then apply these classifications to the words that were found, so that we know, for example, whether a word is part of a heading or a list.
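Applying a classification to the found words can be sketched as follows. This is a simplified illustration, not the NLdoc implementation: it assumes that both words and classified regions are available as rectangles, and it gives each word the label of the first region that contains the centre of the word. The names `Box`, `Region`, `Word` and `label_words` are hypothetical; the labels in the example follow the DocLayNet naming.

```python
from dataclasses import dataclass


@dataclass
class Box:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x: float, y: float) -> bool:
        return self.left <= x <= self.right and self.top <= y <= self.bottom


@dataclass
class Region:
    label: str  # for example "Section-header", "Text" or "Table"
    box: Box


@dataclass
class Word:
    text: str
    box: Box


def label_words(words: list[Word], regions: list[Region]) -> list[tuple]:
    """Pair every word with the label of the region containing its centre.

    Words that fall outside all regions get the label None.
    """
    labelled = []
    for word in words:
        cx = (word.box.left + word.box.right) / 2
        cy = (word.box.top + word.box.bottom) / 2
        label = next(
            (r.label for r in regions if r.box.contains(cx, cy)), None
        )
        labelled.append((word.text, label))
    return labelled


# Hypothetical page: one heading region and one paragraph region.
regions = [
    Region("Section-header", Box(30, 40, 400, 80)),
    Region("Text", Box(30, 100, 400, 300)),
]
words = [
    Word("Annual", Box(40, 50, 120, 70)),
    Word("Revenue", Box(40, 110, 130, 130)),
    Word("7", Box(500, 760, 510, 775)),
]
labelled = label_words(words, regions)
```

Using the centre of the word rather than its full rectangle keeps the assignment stable when a detected region is slightly too tight around the text.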

When the YOLO model has found a table, we use the Table Transformer model to analyse it. This model tells us how the table is structured: where the rows, the columns and the table headers are, etcetera. We can then use all the collected data to reconstruct the table.
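Reconstructing a table from this data can be sketched as follows. The example assumes that the detected rows are given as vertical spans, the detected columns as horizontal spans, and the words as text with a rectangle (left, top, right, bottom). It also assumes that the first row is the header row. All names are hypothetical, and merged cells are ignored; this is a sketch of the idea, not the NLdoc implementation.

```python
from html import escape


def index_of(spans, value):
    """Return the index of the first (start, end) span containing value."""
    for i, (start, end) in enumerate(spans):
        if start <= value <= end:
            return i
    return None


def build_grid(rows, columns, words):
    """Place each word in the cell whose row and column contain its centre."""
    rows, columns = sorted(rows), sorted(columns)
    grid = [[[] for _ in columns] for _ in rows]
    # Sort by top, then left, so words in a cell keep their reading order.
    for text, box in sorted(words, key=lambda w: (w[1][1], w[1][0])):
        left, top, right, bottom = box
        r = index_of(rows, (top + bottom) / 2)
        c = index_of(columns, (left + right) / 2)
        if r is not None and c is not None:
            grid[r][c].append(text)
    return [[" ".join(cell) for cell in row] for row in grid]


def to_html(grid, header_rows=1):
    """Render the grid as an HTML table with marked column headers."""
    lines = ["<table>"]
    for r, row in enumerate(grid):
        if r < header_rows:
            cells = "".join(
                f'<th scope="col">{escape(c)}</th>' for c in row
            )
        else:
            cells = "".join(f"<td>{escape(c)}</td>" for c in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical detections for a table with two rows and two columns.
rows = [(0, 20), (20, 40)]          # (top, bottom) of each row
columns = [(0, 100), (100, 200)]    # (left, right) of each column
words = [
    ("Year", (10, 2, 50, 18)),
    ("Total", (110, 2, 160, 18)),
    ("2024", (10, 22, 50, 38)),
    ("1", (110, 22, 120, 38)),
    ("250", (125, 22, 160, 38)),
]
grid = build_grid(rows, columns, words)
table_html = to_html(grid)
```

Marking header cells with `th` and a `scope` attribute lets screen readers announce the matching column header for each data cell, which is the kind of structure a scanned or flattened PDF table lacks.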