abstract = "Apache Spark is a popular framework for large-scale
data analytics. Unfortunately, Spark's performance can
be difficult to optimise, since queries freely
expressed in source code are not amenable to
traditional optimisation techniques. This article
describes Hylas, a tool for automatically optimising
Spark queries embedded in source code via the
application of semantics-preserving transformations.
The transformation method is inspired by functional
programming techniques of deforestation, which
eliminate intermediate data structures from a
computation. This contrasts with approaches defined
entirely within structured query formats such as Spark
SQL. Hylas can identify certain computationally
expensive operations and ensure that performing them
creates no superfluous data structures. This
optimisation leads to significant improvements in
execution time, with over 10,000 times improvement
observed in some cases.",
notes = "Hylas, Scala refelection 2013 'Infection Discovery
using DNS Data' challenge of the Los Alamos National
Laboratory, USA.