Latent Trait Models on sparse data in organic geochemistry
INB Lunch-Seminar
Martin Schröder
Geochemistry for petroleum exploration is a complex area of research in which tools for dimension reduction and data visualization may prove useful.
In geochemistry for petroleum exploration one has to consider the
ecological and historical factors that determine, for example, whether
coal, oil or gas could form at all, since a huge amount of biological
tissue has to accumulate under the right non-oxidising conditions.
Different forms of biological tissue also undergo different chemical
reactions, which are induced by heat, time and pressure and may in
addition be altered by external chemical compounds in the soil. In the
end a geochemist might have up to 180 variables, such as age, lithology,
amounts of total organic carbon, oxygen and hydrogen, biomarkers and
others, depending on which chemical analyses were run on the samples.
It is not uncommon for different analyses to have been used on different
samples, so there may be a very large number of missing values when
trying to analyse how the samples relate to each other. This makes
standard statistical methods hard to apply, and as a result geochemistry
for petroleum exploration relies heavily on observation and experience.
Within a master's project at Aston University, current work tries to
address the missing-value problem and to apply different Latent Trait
Models to these data. The goal is to find an effective method that gives
more insight into the data structure by visualising it in a 2D latent
space, in order to help a geochemical consulting company analyse its
data.
The seminar will give an overview of geochemistry and the work in this
field using the example of IGI Ltd, an introduction to GTM (Generative
Topographic Mapping), which is a probabilistic alternative to
Self-Organizing Maps, and an introduction to the EM algorithm for GTM
and how it can be used to address the missing-value problem.
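As a rough illustration of how missing values can enter the E-step of GTM, the following Python sketch computes the responsibilities of K latent grid points for each sample using only the observed variables. It assumes the isotropic Gaussian noise model of standard GTM with equal prior weight on every grid point, so that marginalising over a missing variable simply drops it from the squared distance. The function names, the toy data and the value of beta are illustrative assumptions, not the method presented in the seminar, and the M-step is not shown.

```python
import numpy as np

def gtm_responsibilities(X, Y, beta):
    """E-step sketch for GTM with missing values.

    X    : (N, D) data matrix, missing entries marked as np.nan
    Y    : (K, D) images of the K latent grid points in data space
    beta : inverse variance of the isotropic Gaussian noise model
    Returns an (N, K) matrix whose rows sum to 1.
    """
    observed = ~np.isnan(X)                      # (N, D) mask of observed entries
    X_filled = np.where(observed, X, 0.0)        # placeholder values, masked out below
    diff = X_filled[:, None, :] - Y[None, :, :]  # (N, K, D)
    # Squared distance restricted to the observed dimensions of each sample.
    sq_dist = (diff ** 2 * observed[:, None, :]).sum(axis=2)
    log_r = -0.5 * beta * sq_dist
    log_r -= log_r.max(axis=1, keepdims=True)    # for numerical stability
    R = np.exp(log_r)
    return R / R.sum(axis=1, keepdims=True)

def latent_means(R, Z):
    """Posterior-mean position of each sample in latent space.

    R : (N, K) responsibilities, Z : (K, 2) latent grid coordinates.
    """
    return R @ Z

# Toy example: 3 samples, 2 variables, 2 latent grid points.
X_demo = np.array([[0.0, np.nan],      # second variable not measured
                   [1.0, 1.0],
                   [np.nan, np.nan]])  # nothing measured: uniform responsibilities
Y_demo = np.array([[0.0, 0.0],
                   [1.0, 1.0]])
Z_demo = np.array([[-1.0, 0.0],
                   [1.0, 0.0]])
R_demo = gtm_responsibilities(X_demo, Y_demo, beta=1.0)
print(R_demo)
print(latent_means(R_demo, Z_demo))
```

A sample with no observed variables receives equal responsibility from every grid point and is therefore placed at the centre of the latent grid, which reflects that the data carry no information about its position.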
Time: Friday, 27 April 2007, 12:00 c.t. (12:15)
Location:
Institut für Neuro- und Bioinformatik
Seminar room (1st floor, room 17),
Ratzeburger Allee 160 (building 64, 1st floor)

