ABSTRACT
Capturing the value in real-world data requires more than fitting trivial models or visually exploring the data. Rather, we must efficiently isolate driving variables, confirm or reject potential outliers, and build models that are both accurate and trustworthy. Fortunately, multi-objective genetic programming (also known as ParetoGP) allows us to achieve these objectives. ParetoGP is the foundation technology in this tutorial; however, we address the entire modeling process, including data balancing, outlier detection, and model usage and exploitation, as well as model development. In addition to covering the basic theory of ParetoGP, we explore key points using real-world industrial data modeling case studies and review best practices of industrial data modeling. Current economic conditions demand maximum efficiency in developing and exploiting high-quality models; ParetoGP has been used for applications ranging from energy trading to active design-of-experiments to plant trouble-shooting to patent litigation modeling to ...
Index Terms
Real-world data modeling