Alexandria - Derniers papiers http://alexandria.fr/ A History of Haskell : Being Lazy With Class 2011-10-16T19:51:47+02:00 tag:alexandria.fr,2010:paper/30 Paul Hudak, John Hugues, Simon Peyton Jones, Philip Wadler This paper describes the history of Haskell, including its genesis and principles, technical contributions, implementations and tools, and applications and impact. Allocation Removal by Partial Evaluation in a Tracing JIT 2011-10-09T19:35:16+02:00 tag:alexandria.fr,2010:paper/29 Carl Friedrich Bolz, Antonio Cunia, Maciej Fijałkowski, Michael Leuschel, Samuele Pedroni, Armin Rigo The performance of many dynamic language implementations suffers from high allocation rates and runtime type checks. This makes dynamic languages less applicable to purely algorithmic problems, despite their growing popularity. In this paper we present a simple compiler optimization based on online partial evaluation to remove object allocations and runtime type checks in the context of a tracing JIT. We evaluate the optimization using a Python VM and find that it gives good results for all our (real-life) benchmarks. The Next 700 Data Description Languages (PDF) 2011-10-09T15:33:35+02:00 tag:alexandria.fr,2010:paper/28 Kathleen Fisher, Yitzhak Mandelbaum, David Walker <p>An ad hoc data format is any non-standard data format. Typically, such formats do not have parsing, querying, analysis, or transformation tools readily available. Every day, network administrators, financial analysts, computer scientists, biologists, chemists, as- tronomers, and physicists deal with ad hoc data in a myriad of complex formats. Figure 1 gives a partial sense of the range and pervasiveness of such data. Since off-the-shelf tools for processing these ad hoc data formats do not exist or are not readily available, talented scientists, data analysts, and programmers must waste their time on low-level chores like parsing and format translation to extract the valuable information they need from their data. <br /> [&#x2026;] </p><p>The primary goal of this paper is to begin to understand the family of ad hoc data processing languages. We do so, as Landin did, by developing a semantic framework for defining, comparing, and contrasting languages in our domain. This semantic framework revolves around the definition of a data description calculus (DDC<sup>α).</sup> This calculus uses types from a dependent type theory to describe various forms of ad hoc data: base types to describe atomic pieces of data and type constructors to describe richer structures. We show how to give a denotational semantics to DDC<sup>α</sup> by interpreting types as parsing functions that map external representations (bits) to data structures in a typed lambda calculus. More precisely, these parsers produce both internal representations of the external data and parse descriptors that pinpoint errors in the original source.</p> Exceptional syntax 2011-10-08T15:04:03+02:00 tag:alexandria.fr,2010:paper/27 Nick Benton, Andrew Kennedy From the points of view of programming pragmatics, rewriting and operational semantics, the syntactic construct used for exception handling in ML-like programming languages, and in much theoretical work on exceptions, has subtly undesirable features. We propose and discuss a more well-behaved construct. Optimization of dynamic languages using hierarchical layering of virtual machines 2011-09-25T16:05:01+02:00 tag:alexandria.fr,2010:paper/26 Alexander Yermolovich, Christian Wimmer, Michael Franz <p>Creating an interpreter is a simple and fast way to implement a dynamic programming language. With this ease also come major drawbacks. Interpreters are significantly slower than compiled machine code because they have a high dispatch overhead and cannot perform optimizations. To overcome these limitations, interpreters are commonly combined with just-in-time compilers to improve the overall performance. However, this means that a just-in-time compiler has to be implemented for each language. </p><p>We explore the approach of taking an interpreter of a dynamic language and running it on top of an optimizing trace-based virtual machine, i.e., we run a guest VM on top of a host VM. The host VM uses trace recording to observe the guest VM executing the application program. Each recorded trace represents a sequence of guest VM bytecodes corresponding to a given execution path through the application program. The host VM optimizes and compiles these traces to machine code, thus eliminating the need for a custom just-in-time compiler for the guest VM. The guest VM only needs to provide basic information about its interpreter loop to the host VM.</p> Resource-safe Systems Programming with Embedded Domain Specific Languages 2011-09-16T17:59:44+02:00 tag:alexandria.fr,2010:paper/25 Edwin Brady, Kevin Hammond We introduce a new overloading notation that facilitates programming, modularity and reuse in Embedded Domain Specific Languages (EDSLs), and use it to reason about safe resource usage and state management. We separate the structural language constructs from our primitive operations, and show how precisely-typed functions can be lifted into the EDSL. In this way, we implement a generic framework for constructing state-aware EDSLs for systems programming. Computational Geometry 2011-09-12T10:40:12+02:00 tag:alexandria.fr,2010:paper/24 Michael Shamos Cette thèse de 1978 a fait naître un domaine&#xA0;: la géométrie algorithmique. L&#x2019;auteur a étudié attentivement la géométrie et l&#x2019;algorithmique, et sa thèse est remplie de questions d&#x2019;entretiens<code>^W^W</code> méthodes encore utilisées aujourd&#x2019;hui pour un bon nombre de problème géométriques&#xA0;: <ul><li>test de convexité&#xA0;; </li><li>enveloppe convexe&#xA0;; </li><li>diamètre d&#x2019;un polygone&#xA0;; </li><li>intersections&#xA0;; </li><li>plus proches voisins&#xA0;; </li><li>diagrammes de Voronoï. </li></ul><p>Pour ne citer qu&#x2019;un domaine d&#x2019;application, le jeu vidéo utilise énormément les intersections et les plus proches voisins. </p><p>L&#x2019;algorithme utilisé pour déterminer le diamètre d&#x2019;un polygone (la distance maximum entre deux points) est assez élégant et mérite d&#x2019;être connu. Il s&#x2019;exécute en O(n) une fois qu&#x2019;on a l&#x2019;enveloppe convexe (qui peut aussi se faire en O(n) suivant les préconditions). On l&#x2019;appelle l&#x2019;algorithme des &quot;rotating calipers&quot; (littéralement pieds à coulisse rotatifs) et il a ensuite été généralisé à une <a class="extern" href="http://cgm.cs.mcgill.ca/~orm/rotcal.html">myriade d'autres problèmes</a><code>.</code></p> How efficient is speech? 2011-09-25T18:08:51+02:00 tag:alexandria.fr,2010:paper/23 R.J.J.H. van Son, Louis C.W. Pols Speech is considered an efficient communication channel. This implies that the organization of utterances is such that more speaking effort is directed towards important parts than towards redundant parts. Based on a model of incremental word recognition, the importance of a segment is defined as its contribution to word-disambiguation. This importance is measured as the segmental information content, in bits. On a labeled Dutch speech corpus it is then shown that crucial aspects of the information structure of utterances partition the segmental information content and explain 90% of the variance. Two measures of acoustical reduction, duration and spectral center of gravity, are correlated with the segmental information content in such a way that more important phonemes are less reduced. It is concluded that the organization of reduction according to conventional information structure does indeed increase efficiency. Relaxing the Value Restriction 2011-08-31T13:41:16+02:00 tag:alexandria.fr,2010:paper/22 Jacques Garrigue Restricting polymorphism to values is now the standard way to obtain soundness in ML-like programming languages with imperative features. While this solution has undeniable advantages over previous approaches, it forbids polymorphism in many cases where it would be sound. We use a subtyping based approach to recover part of this lost polymorphism, without changing the type algebra itself, and this has significant applications. Formal Proof -- The Four-Colors Theorem 2011-08-29T13:40:37+02:00 tag:alexandria.fr,2010:paper/21 Georges Gonthier <p>In 2005, we achieved the formalization of the <em>entire</em> proof of the Four-Color Theorem, not just its code. While we tackled this project mainly to explore the capabilities of a modern formal proof system —at first, to benchmark speed— we were pleasantly surprised to uncover new and rather elegant nuggets of mathematics in the process. In hindsight this might have been expected: to produce a formal proof one must make explicit every single logical step of a proof; <a id="..."></a> Perhaps this is the most promising aspect of formal proof: it is not merely a method to make absolutely sure we have not made a mistake in a proof, but also a tool that shows us and compels us to understand why a proof works. </p><p>In this article, the next two sections contain background material, describing the original proof and the Coq formal system we used. The following two sections describe the sometimes new mathematics involved in the formalization. Then the next two sections go into some detail into the two main parts of the formal proof: reducibility and unavoidability. The Coq code (available at the same address) is the ultimate reference for the intrepid, who should bone up on Coq beforehand.</p> Yield : Mainstream Delimited Continuations 2011-08-26T09:34:56+02:00 tag:alexandria.fr,2010:paper/20 Roshan P. James, Amr Sabry Many mainstream languages have operators named yield that share common semantic roots but differ significantly in their details. We present the first known formal study of these mainstream yield operators, unify their many semantic differences, adapt them to to a functional setting, and distill the operational semantics and type theory for a generalized yield operator. The resultant yield, with its delimiter run, turns out to be a delimited control operator of comparable expressive power to shift-reset, with translations from one to the other. The mainstream variants of yield turn out to be one-shot or linearly used restrictions of delimited continuations. These connections may serve as a means of transporting ideas from the rich theory on delimited continuations to mainstream languages which have largely shied away from them. Dually, the restrictions of the various existing yield operations may be treated as shift-reset variants that have found mainstream acceptance and thus worthy of study. Using uh and um in spontaneous speaking 2011-08-25T12:07:34+02:00 tag:alexandria.fr,2010:paper/19 Herber H. Clark, Jean E. Fox Tree The proposal examined here is that speakers use uh and um to announce that they are initiating what they expect to be a minor (uh), or major (um), delay in speaking. Speakers can use these announcements in turn to implicate, for example, that they are searching for a word, are deciding what to say next, want to keep the floor, or want to cede the floor. Evidence for the proposal comes from several large corpora of spontaneous speech. The evidence shows that speakers monitor their speech plans for upcoming delays worthy of comment. When they discover such a delay, they formulate where and how to suspend speaking, which item to produce (uh or um), whether to attach it as a clitic onto the previous word (as in “and-uh”), and whether to prolong it. The argument is that uh and um are conventional English words, and speakers plan for, formulate, and produce them just as they would any word. A simulation of segregation in cities and its application for the analysis of price regulation 2011-07-06T11:52:22+02:00 tag:alexandria.fr,2010:paper/18 Wolfgang Wagner Social segregation in cities takes place where different household groups exist and when, according to Schelling, their location choice either minimizes the number of differing households in their neighbourhood or maximizes their own group. In this contribution an evolutionary simulation based on a monocentric city model with externalities among households is used to discuss the spatial segregation patterns of four groups. The resulting complex spatial patterns can be shown as graphic animations. They can be applied as initial situation for the analysis of the effects a price regulation has on segregation. JEL classification: D62, R14, R31, R52 Keywords: simulation, segregation, monocentric city, price regulation Depixelizing Pixel Art 2011-07-06T11:38:00+02:00 tag:alexandria.fr,2010:paper/17 Johannes Kopf, Dani Lischinski We describe a novel algorithm for extracting a resolution-independent vector representation from pixel art images, which enables magnifying the results by an arbitrary amount without image degradation. Our algorithm resolves pixel-scale features in the input and converts them into regions with smoothly varying shading that are crisply separated by piecewise-smooth contour curves. In the original image, pixels are represented on a square pixel lattice, where diagonal neighbors are only connected through a single point. This causes thin features to become visually disconnected under magnification by conventional means, and it causes connectedness and separation of diagonal neighbors to be ambiguous. The key to our algorithm is in resolving these ambiguities. This enables us to reshape the pixel cells so that neighboring pixels belonging to the same feature are connected through edges, thereby preserving the feature connectivity under magnification. We reduce pixel aliasing artifacts and improve smoothness by fitting spline curves to contours in the image and optimizing their control points. Do we Need Dependent Types? 2011-06-21T11:26:35+02:00 tag:alexandria.fr,2010:paper/16 Daniel Fridlender, Mia Indrika Inspired by Functional Unparsing (Danvy 1998), we describe a technique for dening, within the Hindley-Milner type system, some functions which seem to require a language with dependent types. We illustrate this by giving a general denition of zipWith for which the Haskell library provides a family of functions, each member of the family having a dierent type and arity. Our technique consists in introducing ad hoc codings for natural numbers which resemble numerals in λ-calculus. Specification and Verification: The Spec# Experience 2011-06-03T11:39:10+02:00 tag:alexandria.fr,2010:paper/15 Mike Barnett, Manuel Fähnfrich, K. Rustan M. Leino, Peter Müller, Wolfram Schulte, Herman Vente Spec# is a programming system that facilitates the development of correct software. The Spec# language extends C# with contracts that allow programmers to express their design intent in the code. The Spec# tool suite consists of a compiler that emits run-time checks for contracts, a static program verifier that attempts to mathematically prove the correctness of programs, and an integration into the Visual Studio development environment. Spec# shows how contracts and verifiers can be integrated seamlessly into the software development process. This paper reflects on the six-year history of the Spec# project, scientific contributions it has made, remaining challenges for tools that seek to establish program correctness, and prospects of incorporating verification into everyday software engineering. A Tail-Recursive Machine with Stack Inspection 2011-06-07T15:02:44+02:00 tag:alexandria.fr,2010:paper/14 John Clements, Matthias Felleisen Security folklore holds that a security mechanism based on stack inspection is incompatible with a global tail call optimization policy; that an implementation of such a language must allocate memory for a source-code tail call, and a program that uses only tail calls (and no other memory-allocating construct) may nevertheless exhaust the available memory. In this article, we prove this widely held belief wrong. We exhibit an abstract machine for a language with security stack inspection whose space consumption function is equivalent to that of the canonical tail call optimizing abstract machine. Our machine is surprisingly simple and suggests that tail calls are as easy to implement in a security setting as they are in a conventional one. Foundations of Algebraic Specification and Formal Software Development 2011-05-30T22:23:05+02:00 tag:alexandria.fr,2010:paper/13 Donald Sannella, Andrzej Tarlecki Ce livre couvre la Théorie des Institutions (Goguen, Burstall), un ensemble d&#x2019;outils de l&#x2019;algèbre universelle pour décrire des systèmes logiques, définir des propriétés générales à leur sujet, ou exprimer des relations qui les lient, et appliquer tout ça aux spécifications algébriques. Suffisamment clair pour les non-spécialistes mais néanmoins très complet, il est illustré par de nombreux exemples détaillés, et propose des exercices permettant de se familiariser avec les notions étudiées. Le livre commence en outre par des chapitres d&#x2019;introduction présentant l&#x2019;intérêt de la discipline ainsi que son formalisme (premier chapitre), puis donne un exemple, sans utiliser les institutions, de système logique, ainsi que des propriétés intéressantes de celui-ci (deuxième chapitre) <abbr></abbr> Thorn: Robust, Concurrent, Extensible Scripting on the JVM 2011-05-30T10:22:41+02:00 tag:alexandria.fr,2010:paper/12 Bard Bloom, John Field, Nate Nystrom, Johan Östlund, Gregor Richards, Rok Strniša, Jan Vitek, Tobias Wrigstad Scripting languages enjoy great popularity due to their support for rapid and exploratory development. They typically have lightweight syntax, weak data privacy, dynamic typing, powerful aggregate data types, and allow execution of the completed parts of incomplete programs. The price of these features comes later in the software life cycle. Scripts are hard to evolve and compose, and often slow. An additional weakness of most scripting languages is lack of support for concurrency&#x2014;though concurrency is required for scalability and interacting with remote services. This paper reports on the design and implementation of Thorn, a novel programming language targeting the JVM. Our principal contributions are a careful selection of features that support the evolution of scripts into industrial grade programs&#x2014;e.g., an expressive module system, an optional type annotation facility for declarations, and support for concurrency based on message passing between lightweight, isolated processes. On the implementation side, Thorn has been designed to accommodate the evolution of the language itself through a compiler plugin mechanism and target the Java virtual machine. Kleisli Arrows of Outrageous Fortune 2011-06-02T14:37:25+02:00 tag:alexandria.fr,2010:paper/11 Conor McBride <p>When we program to interact with a turbulent world, we are to some extent at its mercy. To achieve safety, we must ensure that programs act in accordance with what is known about the state of the world, as determined dynamically. Is there any hope to enforce safety policies for dynamic interaction by static typing? This paper answers with a cautious ‘yes’. </p><p>Monads provide a type discipline for effectful programming, mapping value types to computation types. If we index our types by data approximating the ‘state of the world’, we refine our values to witnesses for some condition of the world. Ordinary monads for indexed types give a discipline for effectful programming contingent on state, modelling the whims of fortune in way that Atkey’s indexed monads for ordinary types do not (Atkey, 2009). Arrows in the corresponding Kleisli category represent computations which a reach a given postcondition from a given precondition: their types are just specifications in a Hoare logic! </p><p>By way of an elementary introduction to this approach, I present the example of a monad for interacting with a file handle which is either ‘open’ or ‘closed’, constructed from a command interface specfied Hoare-style. An attempt to open a file results in a state which is statically unpredictable but dynamically detectable. Well typed programs behave accordingly in either case. Haskell’s dependent type system, as exposed by the Strathclyde Haskell Enhancement preprocessor, provides a suitable basis for this simple experiment.</p> Part-of-Speech Tagging From 97% to 100%: Is It Time for Some Linguistics? 2011-05-31T15:12:07+02:00 tag:alexandria.fr,2010:paper/10 Christopher D. Manning I examine what would be necessary to move part-of-speech tagging performance from its current level of about 97.3% token accuracy (56% sentence accuracy) to close to 100% accuracy. I suggest that it must still be possible to greatly increase tagging performance and examine some useful improvements that have recently been made to the Stanford Part-of-Speech Tagger. However, an error analysis of some of the remaining errors suggests that there is limited further mileage to be had either from better machine learning or better features in a discriminative sequence classifier. The prospects for further gains from semi-supervised learning also seem quite limited. Rather, I suggest and begin to demonstrate that the largest opportunity for further progress comes from improving the taxonomic basis of the linguistic resources from which taggers are trained. That is, from improved descriptive linguistics. However, I conclude by suggesting that there are also limits to this process. The status of some words may not be able to be adequately captured by assigning them to one of a small number of categories. While conventions can be used in such cases to improve tagging consistency, they lack a strong linguistic basis. Categorical abstract data type 2011-05-28T08:49:14+02:00 tag:alexandria.fr,2010:paper/9 S.K.Lellahi Category theory is more and more used in studying abstract data types. Since long time, some authors used the notion of sketch to study the syntax and semantics of mathematical structures. This notion, introduced and developed by Ehresmann, is more powerful than the signature one, since it uses limits in categorical sense. Indeed, the signature approach uses nothing but products, while the notion of sketch allows not only to specify algebraic structures but also non algebraic ones such as fields. Applicative programming with Effects 2011-05-09T23:58:59+02:00 tag:alexandria.fr,2010:paper/8 Conor McBride, Ross Paterson In this paper, we introduce Applicative functors&#x2013;an abstract characterisation of an applicative style of effectful programming, weaker than Monads and hence more widespread. Indeed, it is the ubiquity of this programming pattern that drew us to the abstraction. We retrace our steps in this paper, introducing the applicative pattern by diverse examples, then abstracting it to define the Applicative type class and introducing a bracket notation which interprets the normal application syntax in the idiom of an Applicative functor. Further, we develop the properties of applicative functors and the generic operations they support. We close by identifying the categorical structure of applicative functors and examining their relationship both with Monads and with Arrows. SVMTool: A general POS tagger generator based on Support Vector Machines 2011-05-09T23:58:59+02:00 tag:alexandria.fr,2010:paper/7 Jesús Gimenez, Lluís Màrquez This paper presents the <code>SVMTool</code>, a simple, flexible, effective and efficient part–of–speech tagger based on Support Vector Machines. The SVMTool offers a fairly good balance among these properties which make it really practical for current NLP applications. It is very easy to use and easily configurable so as to perfectly fit the needs of a number of different applications. Results are also very competitive, achieving an accuracy of 97.16% for English on the Wall Street Journal corpus. It has been also successfully applied to Spanish exhibiting a similar performance. A first release of the SVMTool Perl prototype is now freely available for public use. A most efficient C++ version is coming very soon. Categorical programming with Abstract Data Types 2011-05-09T23:58:59+02:00 tag:alexandria.fr,2010:paper/6 Martin Erwig We show how to define fold operators for abstract data types. The main idea is to represent an ADT by a bialgebra, that is, an algebra/coalgebra pair with a common carrier. A program operating on an ADT is given by a mapping to another ADT. Such a mapping, called metamorphism, is basically a composition of the algebra of the second with the coalgebra of the first ADT. We investigate some properties of metamorphisms, and we show that the metamorphic programming style offers far-reaching opportunities for program optimization that cover and even extend those known for algebraic data types.