Real Time Data Mining
Pr João Gama, Laboratory of Artificial Intelligence and Decision Support, University of Porto, Porto, Portugal.
In many applications today, data are best modelled not as persistent tables but as transient data streams. In this keynote, we discuss the limitations of current machine learning and data mining algorithms. We discuss the fundamental issues in learning in dynamic environments, such as learning decision models that evolve over time, learning and forgetting, concept drift and change detection. Data streams are characterized by huge amounts of data that introduce new constraints in the design of learning algorithms: limited computational resources in terms of memory, processing time and CPU power.
In this talk, we present some illustrative algorithms designed to take these constraints into account. We identify the main issues and current challenges that emerge in learning from data streams, and present open research lines for further
Active online learning
Dr Abdelhamid Bouchachia, Department of Computing, School of Design, Engineering & Computing, Bournemouth University, UK
In recent years, learning from data streams that evolve over time has attracted ever-increasing interest within the research and industry communities. Typically a wide range of applications exploit data streams for different sorts of decision making, including monitoring, industrial processes, internet traffic, surveillance, etc. By their very nature, data streams are usually unlabelled, given the high velocity at which they are generated. Collecting labelled examples becomes very difficult, delayed, costly and sometimes prone to errors. It is therefore very important to devise mechanisms to optimize the labelling process. Active learning offers a principled and systematic way to selectively choose candidate data examples whose labels are to be queried. The overall goal of active learning is to provide, in the worst case, the same performance as that of passive learning (i.e., relying on random sampling) while using fewer labelled examples. Obviously, the learner should also be able to accommodate unlabelled and labelled data in an online manner.
In this talk we will cover recent work on active learning for data stream classification, which is known as stream-based selective sampling. In this setting, the learner makes an immediate query decision for each data example during a single scan of the data stream. Stream-based selective sampling is particularly suitable for applications that demand on-the-fly interactive labelling. It is however difficult, because the learner lacks complete knowledge of the underlying data distribution and because this distribution may change over time. We will give an overview of active learning for stationary as well as non-stationary evolving data streams. In particular, we will discuss multi-criteria active learning and methods for dealing with data drift using online active learning. We will also highlight some of the typical applications where online active learning is relevant.
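As an illustration of the single-scan query decision described above, the following is a minimal Python sketch of stream-based selective sampling. It is not taken from the talk: the uncertainty criterion (query when the predicted probability is close to 0.5), the online logistic regression learner, and all names (`OnlineLogisticRegression`, `selective_sampling`, `make_stream`, `true_label`, `margin`) are assumptions chosen for illustration. It covers only the stationary case and does not handle drift.

```python
import math
import random


class OnlineLogisticRegression:
    """Binary logistic regression updated one example at a time (SGD)."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, x, y):
        # Gradient step on the log loss for a single labelled example.
        error = self.predict_proba(x) - y
        self.weights = [w - self.learning_rate * error * xi
                        for w, xi in zip(self.weights, x)]
        self.bias -= self.learning_rate * error


def selective_sampling(stream, oracle, model, margin=0.2):
    """Scan the stream once and query a label only when the model is unsure.

    `oracle` stands in for the (costly) labelling process.
    Returns the number of labels queried.
    """
    queried = 0
    for x in stream:
        p = model.predict_proba(x)
        if abs(p - 0.5) < margin:  # immediate decision, example is not revisited
            model.update(x, oracle(x))
            queried += 1
    return queried


def make_stream(n, seed=0):
    """Hypothetical synthetic stream: n points uniform in [-1, 1]^2."""
    rng = random.Random(seed)
    return [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n)]


def true_label(x):
    """Hypothetical ground truth: a fixed linear concept."""
    return 1 if x[0] + x[1] > 0 else 0


if __name__ == "__main__":
    stream = make_stream(2000)
    model = OnlineLogisticRegression(n_features=2)
    n_queried = selective_sampling(stream, true_label, model)
    print(f"queried {n_queried} of {len(stream)} labels")
```

The `margin` parameter trades labelling cost against accuracy: a wider margin queries more labels. Handling non-stationary streams would additionally require some mechanism for detecting or adapting to change, which this sketch omits.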