abstract = "A central challenge of developing and evaluating
artificial intelligence and machine learning methods
for regression and classification is access to data
that illuminates the strengths and weaknesses of
different methods. Open data plays an important role in
this process by making it easy for computational
researchers to access real data for this
purpose. Genomics has in some cases taken a leading
role in the open data effort, starting with DNA
microarrays. While real data from experimental and
observational studies is necessary for developing
computational methods, it is not sufficient. This is
because it is not possible to know what the ground
truth is in real data. Real data must therefore be accompanied by
simulated data where the balance between signal and
noise is known and can be directly evaluated.
Unfortunately, there is a lack of methods and software
for simulating data with the kind of complexity found
in real biological and biomedical systems. We present
here the Heuristic Identification of Biological
Architectures for simulating Complex Hierarchical
Interactions (HIBACHI) method and prototype software
for simulating complex biological and biomedical data.
Further, we introduce new methods for developing
simulation models that generate data that specifically
allows discrimination between different machine
learning methods.",
notes = "A heuristic method for simulating open-data of
arbitrary complexity that can be used to compare and
evaluate machine learning methods. Jason H. Moore,
Maksim Shestov, Peter Schmitt, Randal S. Olson
Institute for Biomedical Informatics, University of
Pennsylvania, D202 Richards Building, 3700 Hamilton
Walk, Philadelphia, PA
19104