Abstract
Mashups are a prevailing kind of web applications integrating external gadget APIs often written in the JavaScript programming language. Writing secure mashups is a challenging task due to the heterogeneity of existing gadget APIs, the privileges granted to gadgets during mashup executions, and JavaScript’s highly dynamic environment. We propose a new compiler, called Mashic, for the automatic generation of secure JavaScript-based mashups from existing mashup code. The Mashic compiler can effortlessly be applied to existing mashups based on a wide-range of gadget APIs. It offers security and correctness guarantees. Security is achieved via the Same Origin Policy. Correctness is ensured in the presence of benign gadgets, that satisfy confidentiality and integrity constraints with regard to the integrator code. The compiler has been successfully applied to real world mashups based on Google maps, Bing maps, YouTube, and Zwibbler APIs.
Introduction
Mixing existing online libraries and data into new online applications in a rapid, inexpensive manner, often referred to as mashups, has captured the way of designing web applications. ProgrammableWeb mashup graphs currently report that over 6,000 mashup-based web applications and over 11,000 gadget APIs currently exist (
In a mashup, the integrator code integrates gadgets from external code providers. Typically, code is written in JavaScript (JS) and executes on the browser as embedded script nodes in the Document Object Model (DOM) [24]. External gadget code in a mashup can be included in two ways:
either by using the script tag and granting access to all the resources of the integrator;
or by using the iframe tag, in which case the Same Origin Policy (SOP) applies. The SOP isolates untrusted JavaScript external code, limiting the interaction of gadget and integrator to message sending [3].
Static analysis to confine JavaScript programs is feasible for large-scale code consumers such as Facebook.com or Google.com, since they can restrict the JavaScript subset in which developers can write gadget code. Furthermore the size of those gadgets are relatively small. However, when it comes to small code consumers and large gadget providers, such as Google Maps API, full-fledged static analysis is usually infeasible since code providers do not confine them to a certain subset of JavaScript, and the gadget code size is usually large and difficult to be analyzed. Moreover, gadget code is subject to change from time to time by the provider. Mashup programmers are challenged to provide flexible functionality even if the code consumer is not willing to trust the gadgets that mashups utilize. Unfortunately, programmers often choose to include gadgets using the script tag and resign to security in the name of functionality.

Target architecture automatically generated by Mashic.
Recently, Smash [11], AdJail [26], and PostMash [2] proposed to use inter-frame communication between integrator and gadgets. Smash proposes a secure component model for mashups that generalizes the security policies imposed by the SOP. The model is implemented via inter-frame communication and offered as JavaScript libraries. However, integrators and gadgets code have to be adapted to this specific way of communication. AdJail focuses on advertisement scripts by delegating limited DOM interfaces from the integrator. PostMash targets interfaces to operate on gadgets and proposes an architecture for mashups depicted in Fig. 1. In the PostMash design there are stub libraries on both the integrator and the gadget. On the integrator side, the stub library must provide an interface similar to the original gadget’s interface. The stubbed interface sends corresponding messages by means of the PostMessage API in HTML5. On the gadget side, there is another stub library, listening and decoding incoming messages. Barth et al. [2] evaluate the feasibility of the PostMash design via a case study using a version of a Google Maps gadget by creating a stub library that mimicked GMap2 API. Regarding the libraries, the authors argue that the stub library can either be provided by the integrator (one for each untrusted gadget), or by the gadget in which case the library must be audited for security by the integrator.
In this work, we address the following questions about the PostMash design:
Can the stub libraries be made general (the same libraries for every gadget and integrator)? Can PostMash mashups be automatically generated starting from potentially insecure mashups and preserving only the good behavior of the original mashup? Is it possible to precisely define the security guarantees offered by the architecture?
We have positively answered these questions.
We address questions (1) and (2) with a novel compiler called Mashic which inputs existing mashup code, JavaScript code integrated to HTML, to generate reliable mashups using gadget isolation as shown in Fig. 1. In addition, for question (2), we formalize the notion of “benign gadget” that is useful to prove precisely in which cases the generated mashup behaves as the original one. Notably, the answer to question (3) corresponds to the first formalization as an observational semantics equivalence of the security guarantees offered by the Same Origin Policy in a browser, that, we conjecture, coincides with a form of declassification policy known as delimited release [34]. The Mashic compiler [35] offers the following features:
Automation and generality: Inter-frame communication and sandboxing code is fully generated by the compiler and can be used with any untrusted gadget without rewriting the gadget’s code. After sandboxing, gadget objects are not directly reached by the integrator when the SOP applies. Instead the integrator uses opaque handles [36] to interact with the gadget. Due to the asynchronous nature of the PostMessage API, integrator’s code is transformed into Continuation Passing Style (CPS).
Correctness guarantees: We prove a correctness theorem that states that the behavior of the Mashic compiled code is equivalent to the original mashup behavior under the hypotheses that the gadget is benign and correctness of marshaling/unmarshaling for objects that are sent via PostMessage.
The correctness notion of marshaling/unmarshaling allows us to identify, for example, that the implementation of a secure mashup is not correct as soon as the integrator sends an object with a cyclic structure to the gadget (if the implementation uses the JSON stringify for marshaling).
Precisely defining a benign gadget turned out to be a technical challenge in itself. For that, we instrument the JavaScript semantics extended with HTML constructs by a generalization of colored brackets [18] and resort to equivalences used in information flow security [33].
Security guarantees: We prove a security theorem that guarantees a delimited form of integrity and confidentiality for the compiled mashup. Information sent from the integrator to the gadget, corresponds to a declassification. We prove that the gadget cannot learn more than what the integrator sends. Analogously, the influence that the gadget can have on the integrator is delimited to the actions that the integrator performs with the messages that the gadget sends to the integrator. These guarantees are essential for the success of the compiler since the programmer can rely on this precise notion of security for compiled mashups using untrusted gadgets without further hypotheses. Indeed, if the gadget is not benign in the original mashup, malicious behavior is neutralized in the compiled mashup. This proof relies on the browsers’ SOP, that we formalize by means of iframe DOM elements.
The proposed compiler is directly applicable to real world and widespread mashups. We present evidence that our compiler is effective. We have compiled several mashups based on Google and Bing maps, YouTube, and Zwibbler APIs.
In summary our contributions are:
The Mashic compiler, its design and implementation, that can effortlessly be applied to existing mashups. The Mashic prototype and proofs are available online [35].
Security and correctness guarantees for Mashic compiled code and hence direct guarantees for the mashup end consumer. To our knowledge, this is the first work to formalize and prove guarantees of correctness and security for real world mashups.
A formalization of the SOP in browsers, related to frames and script tags, in a standard small-step semantics style.
A decorated semantics for JS extended with HTML constructs. This semantics provides clarity, in a highly dynamic language as JS, regarding ownership of properties in the heap and we prove it useful by specifying confidentiality and integrity policies in our theorems.
Case studies based on existing widespread mashups that demonstrate the effectiveness of the compiler.
An optimization for reducing the cost of cross-domain operations between gadgets and integrator in which an automatic batching mechanism groups together cross-domain operations in straight-line code, supporting loops and branching instructions.
Limitations. The current implementation of the Mashic compiler suffers from the following limitations:
Unsupported constructs: Our integrator transformer currently supports the full JavaScript language [12] except for a few programming constructs. Specifically, the for-in construct and exception construct are not supported. Some JavaScript features considered “dangerous” such as eval are not supported either.
Multiple gadgets inter-communication: The compiler is completely independent of gadget code, and does not support inter-gadget communication (communication is always done via the integrator), since this would imply transforming gadgets that want to use others’ interfaces. For simplicity, the formal presentation of the Mashic compiler applies to one gadget but its implementation supports multiple gadgets by generating unique ids for each iframe and using them in the proxy interface. Authentication is ensured by the PostMessage mechanism [3].
Symmetric interface: The current Mashic architecture is restricted to mashups that employ only one-way communication, i.e. only the integrator will invoke interfaces provided by the gadget. (Although the gadget is not allowed to send requests to the integrator, it can certainly reply to those that it receives as Mashic supports callbacks.) Certain types of mashups do not fall into this category, notably mashups containing advertisement scripts. Louw et al. [26] addresses two-way communication in AdJail where a subset of the DOM interface from the integrator is also provided to the gadget by dynamically modifying the DOM interface in the sandboxed gadget. In Mashic, in order to enable general interfaces to be exposed to gadgets, the gadget has to be CPS-transformed. At the cost of losing gadget-code independence, it is straightforward to use the Mashic compiler (transformations applied to integrator) for gadgets code, without loosing any of the correctness guarantees.
Remarks. This paper extends an earlier conference version [27]. We extend the conference version by adding explanations, examples, and studying the performance of the Mashic compiler. We propose and implement an optimization based on future batches by Bogle and Liskov [5] and on batching remote procedure calls by Ibrahim et al. [21].
Related work. The closest works to Mashic are AdJail [26], Smash [11], and PostMash [2], and are described above. We focus now in other related work. Nikiforakis et al. [31] crawl more than three million pages over the top 10,000 Alexa sites and show that many sites trust gadgets that could be successfully compromised by determined attackers. Moreover, a study over 6,805 unique websites [38] reveals that insecure JavaScript practices are common, showing that at least 66.4% of mashups include gadgets into the top-level documents of their webpages. Jang et al. [23] study on top of 50,000 websites privacy violating information flows in JavaScript-based web applications. Their survey shows that top-100 sites present vulnerabilities related to cookie stealing, location hijacking, history sniffing and behavior tracking. Browser implementation vulnerabilities have also been shown to leak JavaScript capabilities between different origins [4]. Many mechanisms to prevent JavaScript-based attacks have been deployed. For example the Facebook JavaScript subset (FBJS) [14] was intended to prevent user-written gadgets to attack trusted code but it did not really succeed in its goals [30]. Google Caja [17] is similar to FBJS, transforming JavaScript programs to insert run-time checks to prevent malicious access. Yahoo ADsafe [10] statically validates JavaScript programs. In contrast to the Mashic compiler, all of these mechanisms are gadget-code dependent. Maffeis et al. [29] resort to language-based techniques to find out a subset of JavaScript that can be used to prove an isolation property for JavaScript code. For that, they identify a capability-safe subset of JavaScript. They do not formalize the SOP and they focus on pure isolation of gadgets in contrast to our confidentiality and integrity properties.
Static analysis is usually not applicable or not sound for large web applications due to the highly dynamic nature of JavaScript programs and because gadgets in general cannot be restricted to subsets of JavaScript. Static analysis for JavaScript subsets has been proposed by [32], providing a formal guarantee of isolation for ADsafe subset. Relying on type-based invariants and applicative bisimilarity Fournet et al. [15] show full abstraction of a compiler from ML programs to JavaScript programs. Their result is similar to ours in that compiled JavaScript programs will execute in isolation, but it requires integrator code to be written in ML, which Mashic does not require.
As a response to the increasing need to get flexible functionality without resigning to security guarantees, the research community has proposed several communication abstractions [9,11,22,37]. Specifically, Wang et al. [37] create an analogy between operating systems security [13,39] and mashups security to develop communication abstractions. OMash [8] proposes a refined SOP to enable mashup communication. These abstractions usually require browser modifications and so far have not been adopted in HTML standards [20]. There are other works [1,6] pursuing the direction of formalizing web applications, but none of them formally model the SOP as a security property using observational semantics equivalences.
In order to provide some background, we illustrate with a mashup different kinds of gadget inclusions and inter-frame communication. We reuse this example throughout the rest of the sections. There are two major types of gadgets in web mashups. The first type requires an interface from the integrator to accomplish some tasks. For instance advertisement scripts, which necessarily need to gather information of the integrator page through DOM APIs to implement the advertisement strategy. Another example of the first type of gadgets are user-supplied gadgets in social network platforms such as facebook.com. The second type provides a set of interfaces to the integrator. For instance the Google maps API, that provides various interfaces to operate a map gadget, is of this kind. We focus on the second type, that is gadget scripts that provide a set of interfaces to enable the integrator to manipulate the gadget.
In the example an integrator at i.com wants to include a gadget

Code snippet of http://i.com/integrator.html.
We focus on gadget scripts that provide a set of interfaces to enable the integrator to manipulate the gadget. The integrator calls methods or functions as interfaces to change the state of the gadget. For example, the following is a code snippet (in the integrator) to manipulate the untrusted gadget via interfaces:

Code snippet of http://i.com/integrator.html.
The gadget defines a global variable
The

Code snippet of http://i.com/integrator.html.
The secret flows to an untrusted source, and the price is modified at the gadget’s will if the gadget contains the following code:

Non-benign gadget.
If the gadget is isolated using the iframe tag with a different origin, variables

Code snippet of http://i.com/integrator-msg.html.

Code snippet of http://u-i.com/gadget.html.
Instead of directly including the script, the integrator invents a new origin u-i.com to be used as an untrusted gadget container, and puts the gadget code in a frame belonging to this origin. By doing this, the JavaScript execution environment between integrator and gadget is isolated, as guaranteed by the browser’s SOP. Limited communication between frames and integrator is possible through the PostMessage API in the browser1
Inter-frame communication is also possible via e.g. navigation policies [3] but this kind of communication is now obsolete.

PostMessage example.
Compilation with Mashic will not preserve the malicious behavior of Listing 4 but will only preserve behavior that does not represent a confidentiality or integrity violation to the integrator.
The compiler relieves the programmer from rewriting code. Instead of rewriting gadget’s code, our compiler inserts a proxy and a listener library that implement a communication protocol for manipulating gadgets independently of untrusted gadgets sandboxed in frames. Instead of rewriting integrator code, our compiler implements a CPS transformation to overcome the asynchronous nature of PostMessage. After compilation of code in Listings 1 and 2 the gadget is modified in the following way:

Compiled gadget at http://u-i.com/gadget.html.
The integrator is modified in the following way:

Compiled integrator.
Notice that the gadget code used in the compiled gadget is not modified from the original one. The proxy library and the listener library provide general ways to encode gadget operations, and the programmer does not need to manually write the stub library to operate on confined gadgets. The integrator code, as we mentioned above, is transformed to CPS code
We propose a decorated semantics to partition a JavaScript heap at the granularity of object properties. In order to prove security policies in a mashup, it is essential to distinguish at each execution step properties corresponding to different principals. Note that static decorations assigned to variables, traditionally used in information flow security policies [33], are not enough to specify security in JS programs due to two reasons: the dynamic nature of JS does not always allow us to syntactically determine the set of properties modified by a program (cf. [30]), and existing native properties in the heap may either be changed by programs or its decoration may depend on the context due to the SOP.
The following example shows some dynamic features of JavaScript.

Dynamic features of JS.
First, the minimal container in JavaScript is property rather than object (or variable in other languages), so two properties of the same object could belong to different principals. Second, properties can be created dynamically, so static decoration is not possible.
Hence, we have resorted to ideas from colored brackets [18] and adapt them to a semantics modeling the SOP in the browsers. When decorations are erased, our JavaScript decorated rules are compliant with JavaScript semantics (Maffeis et al. [28]). For the sake of simplicity in the presentation, we limit this section to the inclusion of only one gadget as a frame, although the JavaScript semantics (and the Mashic compiler) is not limited in the number of gadgets included in a mashup. Thus, in this presentation, we need to distinguish three different colors. The ♠ color for the Integrator Principal, the ♣ color for the Gadget Principal, and the ♦ color to denote a neutral principal. We use □ or △ to denote any of them. For the sake of brevity, we do not include an origin parameter in the primitive PostMessage since there are only two possibilities: either the integrator communicates with the frame or vice-versa. We also simplify AddListener for the formal presentation: we assume that the only events it listens to is the event “message”.
An object o is a tuple
Heaps
Objects are stored in heaps. A heap h is a partial mapping from locations in a set
To lookup for name x from scope object ℓ in h, we use
Similarly, the prototype chain represents the hierarchy between objects. A property that is not present in the current object, will be searched in the prototype chain, via the
Integrator and gadgets global objects. We define a (simplified) initial integrator global object below (we use the form
Since by SOP the integrator and the frame do not share objects in the heap, we define similarly an initial global object
Native functions in a heap are represented by locations (e.g.
It is useful to define an initial heap. An initial heap for the integrator (resp. for the frame), denoted by

Example: Uniformly colored heap.
We say that a decorated object o is single-colored if and only if all properties of o are decorated with the same color. We say that a decorated heap h is uniformly colored if and only if for all
Projection
If h is a uniformly colored heap, and
We define

JavaScript syntax with decorations.
We present in Fig. 3 a simplified syntax of the extension of JavaScript with HTML constructs. We assume that
Before a JavaScript program in a script node is executed, or before a body of a function is evaluated, all variable declarations are added to the current scope object in the heap. To that end, we use a function VD that returns a heap and takes as parameters a heap h, a location ℓ of the current scope object, a statement s, and a color □ to bind variables declared by

Variable declaration function.
Finally we define a helper function
Instrumented global configurations feature a decoration component that denotes the owner principal of the program being executed. Decorations are propagated via semantics rules and, importantly, do not affect the normal semantics of JavaScript programs (they can be erased without further changes in the state). A global configuration is a 5-tuple A subscript x identifying the execution context of current code, I denotes that the current context is the integrator, and F denotes that the current context is the frame. We use the subscript x to denote a wildcard symbol for both I or F. A decoration □ that denotes the principal of the current program in the configuration. A heap h. A location A run-time program R currently being executed (see Section 3.6). A waiting queue Q in order to give semantics to the PostMessage mechanism.
A waiting queue is of the form
An initial configuration is of the form
We also define a configuration for the core JavaScript semantics
A decoration □ that denotes the principal of the current program in the configuration; A heap h; A location ℓ of the current scope object; A current statement s being evaluated.
Transitions between core JavaScript configurations are featured with a label
Semantics rules
We use a transition system to define the semantics of our language, via the
DOM semantics rules
We extend the syntax with run-time expressions:
In the run-time syntax, R denotes run-time programs being executed, extended by run-time frame

Decorated semantics rules (DOM).

Decorated semantics rules (Core JavaScript).

Decorated semantics rules (Core JavaScript, continued).

Decorated semantics rules (Core JavaScript, continued).
We define semantics rules for the DOM, i.e. the global transitions, in Fig. 5. Now we comment on the semantics rules.
A mashup execution initializes the heap of the configuration to the initial heap of the integrator
A △-decorated script starts by
When an execution of a statement terminates, we continue with the rest of the computation.
This is a contextual rule: if the core JavaScript configuration can advance by 1 step with label
This rule is similar to
Similar to rule
These rules are for execution of a frame. A frame fetches the content
When no program is executing, we can apply the event listeners to pending messages in the queues (this rule and rule
See explanation above.
The semantics rules of core JavaScript are defined in a context-redex style in Figs 6, 7, and 8. The evaluation contexts of the core JavaScript are defined below, where
Now we explain the transition rules in detail.
To resolve the We first allocate a new empty object in h, represented by We do not explicitly mention the heap h when there is no ambiguity.
To invoke a function, we create a new scope object
This rule is similar to
This is a contextual rule for evaluating a body of a function.
When a function invocation is finished and no value is returned, we restore the scope to ℓ and return
When a function invocation is finished and value v is returned, we restore the scope to ℓ and return v as result.
The
This is a contextual rule for evaluating a body of a function as object initialization.
When an execution of
To create a new function, we first create an empty prototype object
We use the
We use a conventional interpretation of
We first look up through the scope chain to check if x is defined in the chain. If it exist in scope object
To create a property m of
Suppose a location ℓ stored in variable x represents the object o in the heap. A program, with decoration ♣,
It is similar to
To access a property of an object, we look up through the prototype chain. The value v could possibly be an location. When the property m does not exist we return
To resolve a variable name, we look up through the scope chain.
To de-marshal an object we use
To marshal an object we use
To post a message m, we use a label m to indicate the side-effect.
To set a event listener
Since
When a statement evaluates to a value, we continue with the next one.
It is a contextual rule for evaluating sequential composition of statements (block).
If the condition expression e evaluates to
If the condition expression e evaluates to
If the condition expression e evaluates to
If the condition expression e evaluates to
For
(Decorated global object).
Recall variables
Compilation overview
In this section we describe in detail how proxy and listener libraries work. For that, we need to define opaque object handles.
Opaque object handle
According to the SOP policy, the integrator and the framed gadget cannot exchange JavaScript references to objects. Our libraries provide a way for the integrator to refer to objects that are defined inside the gadget, called opaque object handles [2].
An opaque object handle is essentially an abstract representation of a JavaScript object. In our libraries it is a unique number associated with an object in the frame.
The following code excerpt demonstrates the data type for an opaque object handle:
In practice, an opaque object handle is an object with a field
On the listener library side, we keep a list for associating handles and objects:
Since an object could possibly be an opaque object handle, it is necessary to dynamically check whether the object being operated is an opaque object handle or a local object existing in the integrator. If it is an opaque object handle, we need to proxy the operation to the sandbox; if it is a local object, we can directly operate on this object. We define an

isOpaque function.
Bootstrapping. We model the interface provided by a given gadget as a set
For instance in our running example,
The Mashic compiler inserts bootstrapping scripts on both sides, integrator and gadget. The bootstrapping script for the integrator takes a set of variables

Integrator bootstrapping.
The bootstrapping script for the gadget also generates opaque object handles and adds them to a list.

Gadget bootstrapping.
In the rest of the paper we let
Proxy and listener interface. In the rest of the paper we let
To obtain an opaque object handle from a global variable in the gadget, we use the

Code snippet of the proxy library.
The
The
There are other interfaces that are supported for operating on opaque objects handles:
Recall the mashup from Section 2. The interface to obtain an opaque object handle in the integrator is:
where “gadget” is the interface provided by the gadget and the second parameter is a callback function. Once the integrator obtains an opaque object handle, it can use other interfaces from the integrator to operate on the opaque object handle. If
The interface
In the listener library, there are interfaces to generate a response as the following function:
The function
Here we give details on how interface
The integrator invokes
The proxy sends the following message to the listener in iframe:
The proxy library associates
When the listener receives the message, it first obtains the real
Once the invocation is finished, it sends back a message: where
Upon receiving the response, the proxy library applies the continuation
JavaScript does not support Scheme-style call/cc (Call-with-Current-Continuation) for suspending and resuming an execution. Demanding the programmer to write in CPS would render the proposal impractical.
Recall the example in Section 2. In order to obtain the property
This style is similar to CPS, where one needs to explicitly specify continuations for the rest of a computation. In order to reuse the legacy code that operates on a gadget, we propose an automated transformation of legacy code in such a way that programmers do not need to rewrite their code. Legacy code in the integrator will be CPS-transformed, and inserted with dynamic checks for opaque object handles when necessary.

Transformation of statements.
We formally define the CPS transformation of an integrator code s, denoted
We transform

Transformation of expressions, Non-Message-Passing part.

Transformation of expressions, Message-Passing part.
In order to state the theorem, we define decorations for original and compiled mashups. In the original mashup we decorate the integrator as ♠ and the gadget as ♣.
(Decorated original mashup).
Let
In the compiled mashup we decorate the run-time libraries as ♦. The run-time libraries are marked as neutral color ♦ since we show with the correctness theorem that changes to the integrator’s heaps by the original and the compiled versions are indistinguishable. The runtime libraries do not appear in the original heap.
(Mashic compilation).
Let
We formally define the bootstrapping script that appears in Section 2. Given a set of variables (Bootstrapping script).
Correctness theorem
In this section we formally present the correctness theorem and its assumptions.
Preliminary definitions
Correct marshaling. We define the notion of correct marshal and unmarshal functions w.r.t. to a set of objects S. Intuitively this definition states that the process of marshaling and then unmarshaling an object preserves the structure of the object in the heap and preserves values that are not locations.
(Correct marshal/unmarshal for S).
Let ∼ be defined as
Definition 4 is useful for the correctness theorem of the compiler. It captures the weakest hypothesis possible for the correctness theorem to hold. Following this hypothesis, implementation of marshaling/unmarshaling functions may vary. In the current prototype of the Mashic compiler we implement these functions with JSON stringify and parse, which do not preserve the structure of the objects if the structure contains a cycle. Thus, these functions are considered correct only if the set S of objects to be marshaled does not contain objects with cyclic structures. We have chosen JSON stringify/parse for efficiency reasons. However, it is straightforward to write correct marshaling/unmarshaling functions for a set of objects that also contain cycles in their structures.
Benign gadget. Intuitively, a benign gadget
In order to state the definition we first define a benign gadget heap as a heap that contains gadget functions with confidentiality and integrity properties.
(Benign gadget heap).
A heap (integrity) (confidentiality) (preservation of benignity)
(Benign heap).
Recall the integrator’s code in Listing 3 in Section 2. If the gadget contains the following code, then the gadget will not produce a benign gadget heap:
Non-benign gadget heap.
The gadget defines a function in the heap which tries to read from the global variable
Program (integrity) (confidentiality)
(Benign gadget example).
Recall the example in Section 2, Listing 4. The gadget is not benign since it tries to read from the global variable
In the benign gadget definition we explicitly require that the initialization phase (adding functions to the heap) and execution of all functions (that are defined in the heap) always terminate.
It is possible to relax this definition by not requiring termination of benign gadgets (by using indistinguishability invariants for intermediate running expressions) but we consider more appropriate to see non-terminating behavior in gadgets as non-benign behavior since the gadget will never let the integrator execute. Hence if the gadget is non-terminating we do not offer any correctness guarantees (security guarantees still apply).
Notice that the termination requirement on gadgets does not imply termination of the mashup. The mashup might never terminate if gadget and integrator continuously run listener continuations and this is independent of termination of functions in gadgets (see e.g. fair termination [7]).
Correct integrator. For correctness, we impose some reasonable restrictions on the integrator’s code. Intuitively, a correct integrator does not modify directly a non-♦-colored property; and does not use objects defined by gadgets in the prototype chain. This restriction is not limiting in practice since an integrator usually operates on gadgets via the interfaces provided by it and not by directly modifying its properties. Given marshal/unmarshal functions, we also require that a correct integrator only sends to gadgets objects for which these functions are correct.
First, we give a notion of reachability of a location from a global variable in a given heap h.
(Reachability).
A location l is reachable from a variable x in h if and only if either:
Now we give the definition for correct integrator. Program If e is of the form If e is of the form For any ℓ such that If e is of the form We illustrate why a correct integrator’s object cannot have a gadget’s object as its prototype object (bullet (3)). Assume that in the heap of the original mashup, Hence, by reading the property (Correct integrator for f,
(Correct integrator prototype chain).
Indistinguishability and correctness
To define indistinguishability between the original heap and compiled heap, the structure of the scope chain in the heap must be preserved. We start by defining the notion of scope object and scope chain. We use
(Scope object).
Let h be a heap, and ℓ be a location for a scope object. We say ℓ is a scope object in h if one of the following conditions is satisfied:
(Scope chain).
Let h be a heap, and For
We use

Example: Scope indistinguishability.
We define the β-indistinguishability
Let for
The intuition of scope indistinguishability is that the structure of scope chains is preserved by the integrator transformation (even if scope chains do not have a one to one correspondence), as illustrated in Fig. 12. In the figure, scope objects are represented by round points, and the solid arrows represent the scope chain. The scope chain on the left is obtained by a normal execution of integrator code. The scope chain on the right is obtained by execution of the corresponding transformed code, where there are more CPS-administrative scope objects (gray-colored in the figure). The scope indistinguishability does not take into consideration those CPS-administrative scope objects.
Two values are indistinguishable either if they are equal or if they are both locations related by β. Even assuming a deterministic allocator, we need β to relate two heaps because objects created in the original mashup and compiled mashup will be necessarily different. In particular, the compiled heap will contain more objects.
(Value indistinguishability).
Let
Objects are related if they have the same properties with the same values. Exceptions to this are properties
(Object indistinguishability).
Let
We give an example illustrating object indistinguishability.
(Object indistinguishability).
Let
Finally, heaps are indistinguishable if all objects are indistinguishable and respective scope chains are indistinguishable.
(Heap indistinguishability).
Two pairs of heap and scope object for every if
The correctness theorem gives strong guarantees if the gadget is benign: behavior of original and compiled mashup are equivalent in terms of the integrator’s heap. If the gadget is not benign there are no correctness guarantees but only security guarantees described in the following section. We use in the hypothesis that integrator and gadget do not declare the same variables
Notice that this definition of
In the following, let
(Correctness).
Let
The proof proceeds in two stages by means of an intermediate compilation and by structural induction on programs. The behavior of the original mashup is indistinguishable from that of the intermediate compilation; and the intermediate compilation behaves indistinguishably from the Mashic compilation.
Auxiliary definitions and lemmas
In this section we give some auxiliary definitions and some useful lemmas to prove the main theorem. First we define the notion of intermediate compilation in which the gadget is not sandboxed by an iframe and the integrator is compiled to CPS-only code without using the proxy and listener interface. The intermediate compilation will be used in the proofs.
(Decorated intermediate compilation).
We define decorated intermediate compilation
The intermediate CPS transformation is identical to the Mashic CPS transformation except for the rules shown in Fig. 13, where the proxy and the listener library are not used, since the gadget is not sandboxed.

CPS transformation (intermediate).
The following lemma shows that the Mashic compilation and the intermediate compilation preserve correctness of the integrator, since they will not introduce more behavior.
If
Straightforward by structural induction on the definition of
We define the notion of strong object indistinguishability, denoted
Let
Accordingly, we update the definition of strong heap indistinguishability.
(Strong heap indistinguishability).
Two pairs of heap and scope object for every if
The following lemma shows that given two indistinguishable scope chains, the scope looking-up process will return indistinguishable heap locations for any normal variable. By normal variable we mean variables that do not start with a
(Scope look-up).
Let h
if
if
Straightforward by Definition 11 and Definition of
We formally define the shape of an opaque object handle.
Let o be an object, o is an opaque object handle with id n if and only if
We define a relation between two heaps up to a mapping from id of opaque object handles to heap locations. The intuition is that if in one heap, a property points to an opaque object handle, then in the other heap, it must point to a location corresponding to the opaque object handle by the mapping.
Let if otherwise
In this section we present the security theorem. In Mashic compiled code, the integrator has complete access to gadget resources but the gadget only has access to resources offered by the integrator in the proxy library. After Mashic compilation, the malicious gadget cannot scan properties of the integrator, as e.g. in Listing 4, because the SOP policy prevents the framed gadget from accessing the JavaScript execution environment of the integrator as shown in the
(Gadget modifies native functions).
A native function that can commonly appear in the integrator code is the
In this example, after 5 ms a pop-up window with caption “timeout!” appears.
This function, as all native JavaScript functions, is associated as a property of the global object. As many native functions the code associated to the
Suppose the untrusted gadget owned by the attacker writes a function of its own into the
Then every call to
If instead the gadget is enclosed in a frame, the same code trying to affect the
In order to state the security guarantee, we consider that all code coming from origin u is part of the gadget principal ♣. In contrast to the decorations used for correctness, we now consider the listener library and bootstrapping as gadget’s code. This should not be surprising since the gadget can modify this code and the security theorem must be valid also in this case. We decorate all code residing in the integrator with ♠. This is also different from the correctness theorem. Essentially, we are now interested in asserting that the gadget cannot change the proxy library or bootstrapping in the integrator, whereas for the correctness theorem we were interested in heap indistinguishability only for the integrator heap in the original and compiled mashups. Furthermore, we assume
(Decorated Mashic compilation (for security theorem)).
Given an integrator script
(Integrity violation).
In the example referred just above, the initial heap contains the native function setTimeout. Since the initial heap is decorated with ♠, the
(Confidentiality violation).
Recall variable
After execution of the non-benign gadget in Listing 4 with an integrator’s global object containing
We show that for any gadget code
(Security guarantee of integrator).
Let
The proof of security proceeds by induction on the length of the execution and is simpler than the one of the correctness theorem.
Implementation and case studies
The Mashic compiler is written in Bigloo3
CPS in JavaScript. Since JavaScript does not support any tail-recursive call optimization, CPS-transformed code can easily run out of call stacks. In order to deal with this, we implement a trampoline mechanism as proposed by Loitsch [25]. We define a global variable
This is shown in Listing 16.

Trampoline of tail call.
A guard loop, on the top level, detects if a trampoline object is returned, as shown in Listing 17. If a trampoline object is detected, the loop restarts the execution of the tail call.

Guard loop of trampoline execution.
Event handler. In mashups, we also find demands for registering integrator-defined functions as event handlers of gadgets’ DOM objects. For example, the Google Maps API provides an interface to set an integrator’s function as a handler of the event of clicking on the map. Every time the map is clicked, the corresponding function will be invoked, to notify the integrator of the event. By the SOP, the integrator and the gadget in a Mashic compilation cannot exchange function references. Hence we design and implement a mechanism called Opaque Function Handle to achieve the same functionality of an event handler. Similar to the opaque object handle, we associate opaque function handles with function objects on the integrator side. When an iframe-sandboxed gadget receives a function handle, it creates a wrapper function by using the function shown in Listing 18.

Wraping function handle.
The wrapped function, upon each invocation, sends a message to notify the integrator to invoke the function associated with the function handle.

Case studies of applying Mashic compiler. (a) Map directions. (b) YouTube player controls.
Selected case studies
Case studies. We have successfully applied our compiler to mashups using well-known gadget APIs, such as Google Maps API, Bing Maps API, and YouTube API. Those examples involve non-trivial interactions between the integrator and the gadget.
In Fig. 14 we show two concrete examples. The first example is a mashup using the Google Maps API to calculate driving directions between two cities. The map gadget is sandboxed by the Mashic compiler in an iframe, as indicated by a black box in the figure. The compiled integrator, as in the original integrator, permits to choose a starting point and an ending point to display a route in the map. The gadget’s response displayed by the integrator, is the distance between the two points. The latter example shows a sandboxed YouTube player, where one can control the behavior of the player through buttons in the integrator.
We report a selected list of mashups in Table 1. In the first column of the figure, the mark ‘P’ means that the integrator’s code was obtained from publicly available code in the web, whereas mark ‘O’ means that the code is ours.
For the 25 examples of Google Maps API we have studied, we have successfully compiled 23 of them. The other 2 examples are not supported by the Mashic compiler. They are
Discussion on performance and optimization. The Mashic compiler prototype does have a running overhead on a compiled mashup compared to the original mashup. (This penalty is not perceptible for the final consumer of the mashup, if the interaction with the gadget is not inside a loop, for example.) The performance penalty in the Mashic compiler without optimizations [27] mainly comes from the unoptimized CPS-transformation and message-passing. We have implemented an optimized version of the compiler based on batched futures [5], [21] that we discuss here.
Batching optimization. Message-passing is the main cause of performance penalty, especially inside a loop. For example in the marker drawing mashup we show in Table 1, a loop inserts markers into the map. For each marker, it requires two round-trips of messages. The total message-passing overhead is proportional to the number of loop iterations. Although in practice, as in the above example, it is often the case that the loop can be parallelized, parallelism is not yet available in JS. Another alternative is to “batch” these messages to reduce the total message-passing overhead to constant time.
The key idea of our optimization is to transform programs in such a way that messages are only exchanged when the result produced by the gadget is actually required for determining the control flow in the integrator code. Consider, for instance, the program below in which the gadget is assumed to implement three methods g1, g2, and g3, while the integrator is assumed to implement two functions i1 and i2:
During the execution of the program generated by the original mashic, three messages are exchanged between the integrator and the gadget, each of them triggered by a single call to one of the gadget’s methods. In contrast, the program generated by the optimised mashic only exchanges one message with the gadget, just after the evaluation of the guard of the condition, as the outputs of previous calls are required for determining the control flow in the integrator code.
The code generated by the original mashic compiler is such that every time the integrator code interacts with an opaque object handle, the interface of the proxy library responsible for creating the corresponding message and sending it to the gadget is invoked. For instance, when using an opaque object handle as a function, the interface
This interface checks whether its first argument is a mashic internal object (either a batched future or a value envelope, which is explained later in this section) in which case it registers the current continuation and dispatches all the pending requests to the gadget using a special proxy function called
Every time a batched future is created, it is registered in a special array bound to the global variable
When the gadget’s message arrives with the responses for all pending requests, the function
The proxy interfaces in the integrator’s side must be changed in order to handle the two kind of batched futures and the value envelopes. In the original mashic runtimes the role of the proxy interfaces Create the message containing the request to the gadget; Register the current continuation; Send the message to the gadget. a simple batched future, if a complex batched future, if a “real” value, if
In the optimised version of mashic, the role of these interfaces is to determine whether the output of the corresponding operations should yield a “real” value, a simple batched future, or a complex batched future, generate the appropriate result, and immediately call the current continuation using it as its argument. The execution of GET_PROPERTY(g, prop, cont) calls the continuation
Note that these proxy interfaces must also take into account value envelopes. Namely, they have to unnest the real value they contain before applying the corresponding operation. Since binary operations may be performed on both value envelopes and batched futures, we introduce an additional proxy interface
Since messages are only dispatched to the gadget when the value it returns is required for determining the control flow on the integrator’s side, the optimised version of mashic can use a partial-CPS transformation. Hence, continuations are only generated in program points where the control flow is at stake, such as the guards of conditionals and loops, function calls, and method calls. The partial-CPS transformation has to combine CPS terms with non-CPS terms. Therefore, it has to consider several cases for each type of expression. However, to avoid cluttering the presentation, we assume in the rest of this section a full CPS transformation.
In order for the batching mechanism to work, the CPS transformation performed by the mashic compiler must be slightly modified. We illustrate the difference between the original and the optimized transformations using the examples of the rules for the conditional statement, the member selector, and the binary operation given in Fig. 15.

Optimised mashic-CPS transformation.
Given a member selector expression, the original mashic CPS transformation generates a conditional expression that checks whether the inspected object is an opaque object handle, in which case
The greater the number of messages that can be batched together, the greater the impact on performance of the optimised mashic compiler. In order to measure this impact, we wrote a simple mashup using Google Maps that randomly generates map markers inside a loop and then adds them to a map as shown in Fig. 16. While the mashup compiled using the original mashic sends a message to the gadget for each marker that is randomly generated, the mashup that is compiled using the optimised version sends to the gadget a single message containing the requests for the creation of all markers. The chart given in Fig. 17 illustrates the impact on performance of the batching transformation depending on the number of generated markers. Experimental results were obtained on Firefox 30.0 on a 2.4 GHz Intel Core i5 running OS X Version 10.9.3.

Map markers.

Comparison between mashic optimised version and original version.
We have proposed the Mashic compiler as an automatic process to secure existing real world mashups. The Mashic compiler can offer a significant practical advantage to developers in order to effortlessly write secure mashups without giving up on functionality. Compiled code is formally guaranteed to satisfy precisely defined integrity and confidentiality properties of integrator’s sensitive resources.
We do not address in this paper analysis to prevent security vulnerabilities introduced by the integrator’s code. Consider the following silly code:
This integrator will
The gadget might return some string representing a malicious JavaScript program. Then the integrator will execute the malicious code with its own privilege. To avoid this kind of vulnerabilities, analysis of the integrator’s code is required. This is orthogonal to the current Mashic compilation: information flow analyses for JavaScript can be found for example in [16,19].
Mashic offers correctness guarantees only if untrusted gadgets are benign. This is a goal of the compiler and not a disadvantage: mashup behavior should not be the same if a gadget is malicious. If the gadget is malicious the programmer does not get any alert that the compiled secured mashup does not behave as the original mashup: an interesting future direction will be to provide JavaScript code analysis that will conservatively detect non-benign gadgets in order to alert the programmer.
Footnotes
Acknowledgments
We acknowledge anonymous reviewers for their useful comments on this article. This work has been partially supported by the ANR project AJACS ANR-14-CE28-0008.
