Standardised long-term archiving with PDF/A
Why an ISO standard from 2005 is still future-proof
Analogue archiving media such as microfilm and paper have largely been superseded. Even TIFF, an early digital image format, was displaced in the 1990s by a more efficient one: the Portable Document Format (PDF) published by Adobe Systems. The archive format standard PDF/A was developed on this basis and still plays a decisive role today in legally required long-term archiving.
What became standard in 2005
In 2002, a working group for digital long-term archiving began its work at the International Organization for Standardization (ISO). Three years later, ISO published the PDF/A format as “ISO 19005-1:2005”, the first file format standard for electronic long-term archiving that applies worldwide.
Why long-term archiving at all?
Business documents such as invoices or contracts must be archived for up to ten years, as required by German law. The most important retention periods for companies are set out in the German Fiscal Code (AO), the German Commercial Code (HGB) and the German Value Added Tax Act (UStG). In addition, there are sector-specific retention obligations for documents in public administration, hospitals, the construction industry, etc.
For digital long-term archiving, the following criteria apply to file formats:
- openly standardised (or at least openly specified); no proprietary format
- widespread
- low complexity
- without access protection mechanisms such as copy protection or encryption
- self-documenting
- robust
- no dependencies on other file formats
- licence-free
- validatable
Companies must therefore think about how they will store their business documents in the long term. In addition, the highest standards apply to data archiving, because the content must be presented in exactly the same way at all times and must fulfil the archiving criteria.
What is behind PDF/A?
Almost everyone has opened or worked with a PDF document at some point. This common format is secure, easy to use and requires little storage space. PDFs look the same everywhere, regardless of the device or operating system used. Another great advantage of a PDF file is that different kinds of content can be embedded, such as fonts, graphics, 3D objects and also audio and video.
However, “normal” PDF files are not sufficient for long-term archiving because they can be edited afterwards. A PDF/A file, on the other hand, must adhere to the specifications of the standard. For example, documents for long-term storage must not be encrypted with a password, so that the contents can be accessed at any time. In addition, no video or audio files may be embedded, because anything that may require external software for display or playback must be deliberately omitted. JavaScript is also not permitted in documents for archiving.
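The restrictions named above can be illustrated with a rough pre-check. The following Python sketch is not a PDF/A validator: it only scans the raw bytes of a file for name tokens that usually indicate encryption, JavaScript or embedded audio and video. The function name `find_forbidden_features` and the marker list are assumptions made for illustration, and markers hidden inside compressed object streams go undetected, so a real check still needs a dedicated validator.

```python
import re

# Name tokens whose presence hints at features that PDF/A forbids.
# Assumption: this list is illustrative, not a complete reading of the standard.
FORBIDDEN_MARKERS = {
    "encryption": [b"/Encrypt"],
    "javascript": [b"/JavaScript", b"/JS"],
    "audio_video": [b"/Sound", b"/Movie", b"/RichMedia"],
}


def find_forbidden_features(pdf_bytes: bytes) -> list[str]:
    """Return the feature groups whose marker tokens occur in the raw bytes.

    Heuristic only: tokens inside compressed streams are not seen, and a hit
    is a hint that needs confirmation, not proof of a violation.
    """
    found = []
    for feature, markers in FORBIDDEN_MARKERS.items():
        for marker in markers:
            # Reject matches that are only the prefix of a longer name,
            # e.g. /EncryptMetadata must not count as /Encrypt.
            pattern = re.escape(marker) + rb"(?![A-Za-z0-9])"
            if re.search(pattern, pdf_bytes):
                found.append(feature)
                break
    return found
```

A file read with `open(path, "rb").read()` can be passed in directly; an empty result means only that none of the listed markers were visible in the uncompressed parts of the file.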
After PDF/A became the global archiving standard in 2005, further parts of the standard were developed on top of it.
The PDF/A-1 format with conformance levels a and b is based on PDF version 1.4. In 2011 it was supplemented by the ISO 19005-2 standard (PDF/A-2), which offers extended possibilities for archiving PDF documents. PDF/A-2 has three conformance levels: PDF/A-2a, PDF/A-2b and PDF/A-2u. PDF/A-3 has been available since 2012.
PDF/A-1 (2005) | PDF/A-2 (2011) | PDF/A-3 (2012) |
---|---|---|
based on PDF 1.4 | based on PDF 1.7 (ISO 32000-1) | based on PDF 1.7 (ISO 32000-1) |
ISO 19005-1 Standard | ISO 19005-2 Standard | ISO 19005-3 Standard |
Exact visual reproducibility + accessibility | Exact visual reproducibility + accessibility + extensions (supports JPEG 2000; extremely large page formats can be processed etc.) | Exact visual reproducibility + accessibility + extensions (even original Excel files and XML datasets can be embedded + electronic invoicing capability, ISO compliance etc.) |
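The parts and conformance levels can also be captured in a small lookup structure. The Python sketch below checks whether a label such as “PDF/A-2u” names a valid combination of part and level. The names `PdfaFlavour`, `ALLOWED_LEVELS` and `parse_flavour` are hypothetical and chosen only for this illustration.

```python
import re
from typing import NamedTuple


class PdfaFlavour(NamedTuple):
    part: int          # 1, 2 or 3
    level: str         # "a", "b" or "u"
    iso_standard: str  # the ISO part that defines this flavour


# Part number -> (ISO standard, permitted conformance levels).
ALLOWED_LEVELS = {
    1: ("ISO 19005-1", {"a", "b"}),
    2: ("ISO 19005-2", {"a", "b", "u"}),
    3: ("ISO 19005-3", {"a", "b", "u"}),
}


def parse_flavour(label: str) -> PdfaFlavour:
    """Parse a label like 'PDF/A-2u' and reject invalid combinations."""
    match = re.fullmatch(r"PDF/A-([0-9])([a-z])", label.strip(), flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a PDF/A label: {label!r}")
    part = int(match.group(1))
    level = match.group(2).lower()
    if part not in ALLOWED_LEVELS:
        raise ValueError(f"part {part} is not covered by this sketch")
    standard, levels = ALLOWED_LEVELS[part]
    if level not in levels:
        raise ValueError(f"PDF/A-{part} has no conformance level {level!r}")
    return PdfaFlavour(part, level, standard)
```

For example, `parse_flavour("PDF/A-2u")` yields part 2, level `u` and ISO 19005-2, while `parse_flavour("PDF/A-1u")` raises a `ValueError`, because level u was only introduced with PDF/A-2.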
Why different conformance levels?
The different conformance levels a, b and u reflect the quality of the archived documents and are based on the input material and the intended use.
For PDF/A-1, a distinction is made between PDF/A-1a (level a) and PDF/A-1b (level b). Level b stands for unambiguous long-term visual reproducibility: all fonts and images used must be embedded in the document so that it is completely self-contained. Level a offers more. In addition to unambiguous visual reproducibility, it requires that the text can be mapped to Unicode and that the content of the document is structured with accessibility in mind.
While PDF/A-1b meets all the minimum requirements of the ISO standard, level a goes a step further and should be used, for example, if the document structure is also intended for display on mobile devices and the accessibility requirements must be met in full.
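Which part and level a file claims to conform to is recorded in its XMP metadata, in the properties `pdfaid:part` and `pdfaid:conformance`. The Python sketch below reads this declaration with a simple byte search. It assumes that the XMP packet is stored uncompressed and in an ASCII-compatible encoding such as UTF-8, and the function name `declared_pdfa_flavour` is hypothetical. The declaration is only a claim: whether the file really meets the standard has to be checked with a validator.

```python
import re
from typing import Optional

# The properties may appear as XML elements (<pdfaid:part>1</pdfaid:part>)
# or as attributes (pdfaid:part="1"); both forms are handled.
PART_RE = re.compile(rb'pdfaid:part(?:>|\s*=\s*["\'])\s*([1-9])')
CONFORMANCE_RE = re.compile(rb'pdfaid:conformance(?:>|\s*=\s*["\'])\s*([ABUabu])')


def declared_pdfa_flavour(pdf_bytes: bytes) -> Optional[str]:
    """Return the declared flavour, e.g. 'PDF/A-1b', or None if none is found.

    Assumption: the XMP metadata is readable as plain bytes. A compressed or
    UTF-16 encoded packet would not be found by this search.
    """
    part = PART_RE.search(pdf_bytes)
    if part is None:
        return None
    conformance = CONFORMANCE_RE.search(pdf_bytes)
    level = conformance.group(1).decode("ascii").lower() if conformance else ""
    return f"PDF/A-{part.group(1).decode('ascii')}{level}"
```

Returning `None` rather than raising an error keeps the function usable as a quick filter over a folder of mixed PDF files.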
Archiving with PDF/A
The PDF/A-1 format is ideal for long-term archiving. Both PDF/A-1a and PDF/A-1b meet all the requirements for secure data transmission and archiving. The archived documents, including their attachments, can be read at any time, even after many years and regardless of the software used. The documents can also be stored in a legally secure manner, i.e. in accordance with the prescribed retention periods, and their appearance is permanently preserved.
In addition to the original PDF/A-1 format, there are now the newer versions PDF/A-2 and PDF/A-3, which contain more features. Compared to the PDF/A-1 format, the PDF/A-2 format additionally offers the possibility to process JPEG 2000 as well as large page formats, which is especially interesting for libraries or large archives. PDF/A-2 also allows several files to be merged into a container PDF.
Where PDF/A is used for long-term archiving
The paperless office is far from being an everyday reality in all companies. For purely digital archiving, paper files and documents must be scanned in order to be digitised.
Incoming e-mail, including document attachments, and incoming postal mail must be stored for ten years, just like other office documents (text documents, tables, presentations, etc.). Brochures or magazines that originate from layout programmes or editorial systems must also be converted to PDF/A for storage.
Since PDF/A-3, image files and complex CAD drawings can be embedded in a PDF/A file in their original format. Hybrid archiving of PDF document plus original file is therefore no longer necessary.
Conversion to PDF/A format
PDF/A has established itself as a widely used standard for long-term archiving. Many companies are faced with the challenge of making the path of their documents to PDF/A format as efficient as possible. After all, different source formats should be converted directly into PDF/A for archiving without much effort.
The most important advantages of webPDF for PDF/A conversion at a glance:
- webPDF converts documents from over 100 file formats directly into a PDF/A file and makes all necessary corrections and additions.
- All conformance levels (A Accessible, B Basic and U Unicode) are supported.
- The PDF engine checks compliance with the common standards PDF/A-1 (ISO 19005-1:2005), PDF/A-2 (ISO 19005-2:2011) and PDF/A-3 (ISO 19005-3:2012) on request.
- webPDF offers the possibility to output detailed reports in XML format.
This is why PDF/A will remain up-to-date for a long time to come
Companies benefit from the ISO standard because it helps to keep digital documents compliant with legal requirements. PDF/A is the preferred long-term archiving format, and for good reasons. For a whole decade, experts have been developing PDF/A and its conformance levels. Experience shows that what has become a quasi-standard does not disappear from the market so quickly. Experts agree that PDF/A will remain a future-proof format that companies and public authorities can rely on. The fact that Microsoft allows PDF/A documents to be created directly from its Office applications is a further indication that PDF/A is more than just a flash in the pan.