European Courses in Advanced Statistics - Courses
The Ninth Course in the ECAS Programme:
Data Mining and Explorative Multivariate Data Analysis
SAN MARCO DI CASTELLABATE, ITALY
September 28 - October 4, 2003
Scientific Programme Committee:
H.H. Bock, University of Aachen, Germany
L. D'Ambra, University of Naples, Italy (Chair)
B. Fichet, University of Marseilles, France
W. Krzanowski, School of Math. Sciences, Exeter, England
D. Peña, Universidad Carlos III, Madrid, Spain
M. Vichi, University of Rome, Italy
Organising Committee:
P. Amenta, University of Lecce
L. D'Ambra, University of Naples (Chair)
R. Lombardo, Second University of Naples
P. Sarnacchiaro, University of Naples
B. Simonetti, University of Naples
Scope of the course:
In the context of the data mining process, one of the goals is to aggregate or amalgamate the information contained in large data sets into manageable (smaller) information nuggets. Data reduction methods can include sophisticated techniques such as clustering, principal component analysis, etc. However, an important general difference in focus and purpose between data mining and traditional exploratory data analysis is that data mining is more oriented towards applications and less concerned with identifying the basic nature of the underlying phenomena.
Multivariate exploratory techniques, designed specifically to identify patterns in multivariate data sets, include cluster analysis and classification trees, linear and non-linear principal component analysis, multidimensional scaling, stepwise linear and non-linear regression, principal component regression, partial least squares, etc.
The course aims to show how problems from several domains can be fruitfully tackled with methods of multivariate data analysis, which provide specialized tools for treating complex data in the data mining process.
To further the understanding of multivariate analysis, strong emphasis will be placed on applications to different types of problems that highlight the benefits of multivariate analysis.
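As an illustration of data reduction in the sense described above, the following is a minimal sketch of principal component analysis computed via the singular value decomposition, assuming NumPy is available. The function name `pca_reduce` and the synthetic data are hypothetical and chosen for illustration only; this is not necessarily how the technique is presented in the course.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the first k principal components.

    Returns the (n, k) score matrix and the fraction of total variance
    retained by those k components. (Hypothetical helper for illustration.)
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each variable
    # SVD of the centred data: rows of Vt are the principal axes
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T                  # coordinates in the reduced space
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return scores, explained

# Synthetic example: 200 observations of 5 variables driven by 2 latent factors plus noise
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
loadings = rng.normal(size=(2, 5))
X = latent @ loadings + 0.05 * rng.normal(size=(200, 5))
scores, explained = pca_reduce(X, k=2)
print(scores.shape)  # (200, 2)
print(explained)
```

Here five correlated variables are summarised by two component scores that retain almost all of the variance, which is the kind of "smaller information nugget" the paragraph refers to.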
Topics:
- Data Mining: complexity, cleaning, compression. Definition and objectives.
R.D. De Veaux (Williams College, USA)
- Data Mining: complexity, visualizing low- and high-dimensional data.
R.D. De Veaux (Williams College, USA)
- Multivariate explorative data analysis: the linear and non-linear approaches through splines
J.F. Durand (University of Montpellier, France)
- Multivariate regression analysis with strongly correlated variables. Linear partial least squares and structural equation modelling
G. Vittadini (University of Milan, Italy)
- Classification trees, random forests, MART and MARS as variations on trees
R.D. De Veaux (Williams College, USA)
- Methods for assessing reliability of outcomes and determining numbers of components
H. Kiers (University of Groningen, The Netherlands)
- Multivariate regression analysis with strongly correlated variables. Reduced rank regression. Principal covariates regression. Principal components regression
H. Kiers (University of Groningen, The Netherlands)
- Classification and clustering
M. Vichi and H. Bock (University of Rome, Italy, and University of Aachen, Germany)
- Neural networks and data mining
M. Vichi and H. Bock (University of Rome, Italy, and University of Aachen, Germany)
- Perspective and conclusion on data mining and explorative multivariate data analysis
H. Kiers (University of Groningen, The Netherlands)
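Several topics above concern regression with strongly correlated variables. As a rough illustration of one of the listed methods, principal components regression, here is a minimal sketch assuming NumPy: the response is regressed on the leading component scores of the predictors, and the coefficients are mapped back to the original variables. The function name `pcr_fit`, the choice of `k`, and the synthetic data are hypothetical; the course may treat the method differently (for example, in how the number of components is chosen).

```python
import numpy as np

def pcr_fit(X, y, k):
    """Principal components regression: regress y on the first k PC scores of X.

    Returns (coef, intercept) expressed in terms of the original predictors.
    (Hypothetical helper for illustration.)
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:k].T                        # (p, k) leading principal axes
    T = Xc @ V                          # (n, k) mutually orthogonal component scores
    # Least squares on the orthogonal scores avoids the near-singular X'X
    gamma, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    coef = V @ gamma                    # map back to the original predictors
    intercept = y_mean - x_mean @ coef
    return coef, intercept

# Synthetic example: two nearly collinear predictors plus one independent predictor
rng = np.random.default_rng(1)
z = rng.normal(size=300)
X = np.column_stack([z, z + 0.01 * rng.normal(size=300), rng.normal(size=300)])
y = 2.0 * z + 0.5 * X[:, 2] + 0.1 * rng.normal(size=300)
coef, intercept = pcr_fit(X, y, k=2)
pred = X @ coef + intercept
```

Dropping the third component discards the nearly degenerate direction in which the two collinear predictors differ, so the fitted effect of `z` is shared roughly evenly between them instead of being split into large, unstable coefficients of opposite sign.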