The Next 700 Data Description Languages (PDF)
An ad hoc data format is any non-standard data format. Typically, such formats do
not have parsing, querying, analysis, or transformation tools readily available. Every day,
network administrators, financial analysts, computer scientists, biologists, chemists,
astronomers, and physicists deal with ad hoc data in a myriad of complex formats. Figure 1
gives a partial sense of the range and pervasiveness of such data. Since off-the-shelf tools
for processing these ad hoc data formats do not exist or are not readily available, talented
scientists, data analysts, and programmers must waste their time on low-level chores like
parsing and format translation to extract the valuable information they need from their data.
[…]
The primary goal of this paper is to begin to understand the family of ad hoc data processing languages. We do so, as Landin did, by developing a semantic framework for defining, comparing, and contrasting languages in our domain. This semantic framework revolves around the definition of a data description calculus (DDCα). This calculus uses types from a dependent type theory to describe various forms of ad hoc data: base types to describe atomic pieces of data and type constructors to describe richer structures. We show how to give a denotational semantics to DDCα by interpreting types as parsing functions that map external representations (bits) to data structures in a typed lambda calculus. More precisely, these parsers produce both internal representations of the external data and parse descriptors that pinpoint errors in the original source.
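To make the idea of "types interpreted as parsing functions" concrete, here is a minimal sketch in Python. It is not DDCα and not the paper's semantics: the names `PD`, `p_int`, `p_lit`, `p_chars`, and `p_sigma` are hypothetical, the input is a string rather than bits, and the error model is deliberately simplified. Each parser returns an internal representation together with a parse descriptor, and the dependent pair lets the second type depend on the value parsed by the first.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List, Tuple


@dataclass
class PD:
    """Hypothetical parse descriptor: overall status plus (offset, message) errors."""
    ok: bool = True
    errors: List[Tuple[int, str]] = field(default_factory=list)


# A description is interpreted as a function:
# (input, offset) -> (new offset, representation, parse descriptor)
Parser = Callable[[str, int], Tuple[int, Any, PD]]


def p_int(data: str, pos: int) -> Tuple[int, Any, PD]:
    """Base type: a non-empty run of ASCII digits."""
    end = pos
    while end < len(data) and data[end] in "0123456789":
        end += 1
    if end == pos:
        return pos, None, PD(False, [(pos, "expected integer")])
    return end, int(data[pos:end]), PD()


def p_lit(text: str) -> Parser:
    """Base type: a fixed literal."""
    def parse(data: str, pos: int) -> Tuple[int, Any, PD]:
        if data.startswith(text, pos):
            return pos + len(text), text, PD()
        return pos, None, PD(False, [(pos, f"expected {text!r}")])
    return parse


def p_chars(n: int) -> Parser:
    """Base type: exactly n characters; keeps whatever was found on error."""
    def parse(data: str, pos: int) -> Tuple[int, Any, PD]:
        chunk = data[pos:pos + n]
        if len(chunk) < n:
            msg = f"expected {n} characters, found {len(chunk)}"
            return pos + len(chunk), chunk, PD(False, [(pos, msg)])
        return pos + n, chunk, PD()
    return parse


def p_sigma(first: Parser, second_of: Callable[[Any], Parser]) -> Parser:
    """Type constructor: dependent pair, where the second parser is
    computed from the value produced by the first."""
    def parse(data: str, pos: int) -> Tuple[int, Any, PD]:
        pos1, v1, pd1 = first(data, pos)
        if not pd1.ok:
            # Without a first value the dependent second type cannot be built.
            return pos1, (v1, None), pd1
        pos2, v2, pd2 = second_of(v1)(data, pos1)
        return pos2, (v1, v2), PD(pd2.ok, pd1.errors + pd2.errors)
    return parse


# A length-prefixed string such as "3:abc": the payload length depends on the integer.
length_prefixed = p_sigma(
    p_int, lambda n: p_sigma(p_lit(":"), lambda _: p_chars(n))
)

good = length_prefixed("3:abc", 0)
bad = length_prefixed("5:abc", 0)
```

In this sketch, `good` carries the representation `(3, (":", "abc"))` with a clean descriptor, while `bad` still yields a partial representation but its descriptor records an error at offset 2, where the payload turned out shorter than the declared length. Returning a descriptor instead of raising an exception is the design choice that mirrors the text: parsing always produces data, and the descriptor pinpoints where the source deviated from the description.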