Please note: The algorithm descriptions in English have been automatically translated. Errors may have been introduced in this process. For the original descriptions, go to the Dutch version of the Algorithm Register.
NLdoc
- Publication category: Other algorithms
- Impact assessment: Field not filled in.
- Status: In use
General information
Theme
Begin date
Contact information
Link to publication website
Link to source registration
Responsible use
Goal and impact
With NLdoc, you can easily convert any document into an accessible version that is usable by everyone and on all devices. That way, you don't exclude anyone. Moreover, your documents then comply with the law on digital accessibility.
Considerations
Almost all government organisations publish documents on their websites, mostly as PDFs. These documents can only be made accessible with specialist software and specific knowledge. As a result, these organisations do not comply with legal requirements. NLdoc offers functionality that allows you to publish an accessible alternative alongside existing documents. There are no affordable alternatives available, and if every organisation had to solve this itself, it would cost far more money.
Human intervention
NLdoc automatically converts your inaccessible documents to HTML. Sometimes human judgement is still needed to make the content fully accessible. You can easily take that last step in the NLdoc application. You do not need any technical knowledge: our user interface shows you the way, so that your document meets all WCAG 2.1 requirements.
Risk management
To determine what the NLdoc team needs to work on, it is important to understand how our systems are used. We use this usage data to continuously improve our service. For example, we discover which accessibility errors are common and can develop automatic solutions for them. We make sure that we collect this data responsibly.
Operations
Data
When you upload a document to NLdoc, we do not store that source document. We process the content and transform it into our own structure. The result is an accessible HTML file that you can download, or that is processed in your own environment via the API.
Technical design
We use Tesseract to read text from the pages of documents. As best it can, Tesseract tells us which words appear where on the page.
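To make "which words can be found where" concrete, here is a minimal Python sketch of how word positions from OCR could be collected. It assumes the OCR output arrives as a dictionary of parallel lists with the keys text, left, top, width, height and conf, which is the shape the pytesseract wrapper returns from image_to_data with dictionary output. The Word class, the words_from_ocr function and the sample data are hypothetical and written for illustration; they are not taken from NLdoc.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognised word and its bounding box in page pixels (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int
    conf: float


def words_from_ocr(data: dict) -> list[Word]:
    """Turn column-style OCR output into one Word per recognised token."""
    words = []
    for i, text in enumerate(data["text"]):
        conf = float(data["conf"][i])
        # Assumption: structural rows (page, block, line) carry no text
        # and a confidence of -1, so both checks skip them.
        if not text.strip() or conf < 0:
            continue
        words.append(Word(text, data["left"][i], data["top"][i],
                          data["width"][i], data["height"][i], conf))
    return words


# Hand-written sample in the assumed shape; not real NLdoc output.
sample = {
    "text": ["", "Annual", "report", " "],
    "left": [0, 40, 120, 0],
    "top": [0, 30, 30, 0],
    "width": [600, 70, 64, 0],
    "height": [800, 18, 18, 0],
    "conf": ["-1", "96.1", "91.5", "-1"],
}
ocr_words = words_from_ocr(sample)
print([(w.text, w.left, w.top) for w in ocr_words])
```

Keeping the confidence value with each word would make it possible to flag uncertain words for a human check later on.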
The YOLO v11 model is trained on the DocLayNet dataset and helps us classify parts of pages. After classification, we know what kind of content each part of the page contains: headings, tables, images, paragraphs, titles, and so on. We then apply these classifications to the words we found, so that we know, for example, whether a word is part of a heading or a list.
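Applying region classifications to words can be sketched as a simple geometric lookup. The Python example below assumes each detected region has a label, a bounding box and a confidence score, and gives a word the label of the best-scoring region that contains the word's centre point. The centre-point rule, the Region class, label_word and the "Unlabelled" fallback are assumptions for illustration, not a description of how NLdoc does it. The labels "Section-header" and "List-item" are DocLayNet class names.

```python
from dataclasses import dataclass

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1) in page pixels


@dataclass(frozen=True)
class Region:
    """A classified part of the page (hypothetical type)."""
    label: str
    box: Box
    score: float


def contains(box: Box, x: float, y: float) -> bool:
    """True if the point (x, y) lies inside the box, edges included."""
    x0, y0, x1, y1 = box
    return x0 <= x <= x1 and y0 <= y <= y1


def label_word(word_box: Box, regions: list[Region],
               default: str = "Unlabelled") -> str:
    """Label of the best-scoring region containing the word's centre."""
    cx = (word_box[0] + word_box[2]) / 2
    cy = (word_box[1] + word_box[3]) / 2
    hits = [r for r in regions if contains(r.box, cx, cy)]
    if not hits:
        return default
    return max(hits, key=lambda r: r.score).label


# Hand-written sample data; not real model output.
regions = [
    Region("Section-header", (30, 20, 400, 55), 0.93),
    Region("List-item", (30, 80, 400, 110), 0.88),
]
word_boxes = {
    "Annual": (40, 30, 110, 48),
    "first": (50, 85, 90, 103),
    "stray": (500, 700, 540, 718),
}
word_labels = {text: label_word(box, regions) for text, box in word_boxes.items()}
print(word_labels)
```

Using the centre point instead of requiring full containment tolerates word boxes that stick out slightly past a region's edge. An overlap-ratio rule would be a reasonable alternative.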
When the YOLO model has found a table, we use the Table Transformer model to analyse it. This model tells us how the table is structured: where the rows, columns, and table headers are, and so on. We then use all the collected data to reconstruct the table.
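Reconstructing a table from this information can be sketched as follows. The Python example assumes that row boxes, column boxes and the number of header rows are already known, and that words arrive in reading order with their bounding boxes. Each word goes into the cell whose row and column contain its centre, and the grid is then written out as an HTML table. The functions build_grid and to_html and the sample data are hypothetical. Spanning cells and row headers are deliberately left out to keep the sketch short.

```python
from html import escape
from typing import Optional

Box = tuple[float, float, float, float]  # (x0, y0, x1, y1) in page pixels


def index_of(spans: list[tuple[float, float]], value: float) -> Optional[int]:
    """Index of the first (start, end) span that contains value, else None."""
    for i, (start, end) in enumerate(spans):
        if start <= value <= end:
            return i
    return None


def build_grid(rows: list[Box], cols: list[Box],
               words: list[tuple[str, Box]]) -> list[list[str]]:
    """Place each word in the cell whose row and column contain its centre."""
    cells = [[[] for _ in cols] for _ in rows]
    row_spans = [(box[1], box[3]) for box in rows]  # vertical extent per row
    col_spans = [(box[0], box[2]) for box in cols]  # horizontal extent per column
    for text, (x0, y0, x1, y1) in words:
        r = index_of(row_spans, (y0 + y1) / 2)
        c = index_of(col_spans, (x0 + x1) / 2)
        if r is not None and c is not None:
            cells[r][c].append(text)
    return [[" ".join(cell) for cell in row] for row in cells]


def to_html(grid: list[list[str]], header_rows: int = 1) -> str:
    """Render the grid as an HTML table, marking the first rows as headers."""
    lines = ["<table>"]
    for r, row in enumerate(grid):
        if r < header_rows:
            cells = "".join(f'<th scope="col">{escape(c)}</th>' for c in row)
        else:
            cells = "".join(f"<td>{escape(c)}</td>" for c in row)
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hand-written sample data; not real model output.
rows = [(0, 0, 200, 20), (0, 20, 200, 40)]
cols = [(0, 0, 100, 40), (100, 0, 200, 40)]
words = [
    ("Year", (5, 2, 40, 18)), ("Budget", (105, 2, 160, 18)),
    ("2023", (5, 22, 40, 38)), ("1", (105, 22, 115, 38)),
    ("million", (120, 22, 170, 38)),
]
table_grid = build_grid(rows, cols, words)
table_html = to_html(table_grid)
print(table_html)
```

Marking header cells with th and a scope attribute is what lets screen readers announce the matching column header for each data cell, which is the point of reconstructing the table instead of keeping it as an image.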