Workpackage 9: ALS Model Generation




To achieve a computational infrastructure for the multiple data modalities


To perform analysis of rodent/cellular and human molecular networks


To build and replicate an integrative ALS model for disease and disease progression


Task 1. To achieve a computational infrastructure for the multiple data modalities

To establish a submission standard and storage method for the data produced by the -omics WPs, each data modality will first be processed by the group that generates it. Each contributing workpackage has a representative who is also a member of WP9, and together these representatives form the Data Integration Panel (DIP). The representatives are responsible for preprocessing and quality control (QC) of their individual datasets. They will ensure that the data are in a well-defined standard format and will deposit the data in the XGAP infrastructure. They will also play a key role in formulating the biological hypotheses that underlie how the different datasets will be integrated. The results of this initial processing will be converted into a standard data submission format to support consistent data storage, integrated analysis, and dissemination. These formats will be based on those developed in other large-scale data production projects such as PANACEA, GEN2PHEN and SYSGENET, in which several investigators of the current proposal play a key role. This will allow us to leverage the analysis methods being developed within those projects as efficiently as possible. The WP9 leader (Groningen) will concentrate on developing the XGAP infrastructure and statistical methods for integrating the data, including QTL mapping and causal inference. All WP9 members will meet every 6 months to discuss progress and how to conduct the -omics integration.
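As an illustration of the kind of QC check that a standard submission format makes possible, the following Python sketch validates a submitted trait-by-sample matrix before deposition. The record layout (`Submission`), the allowed modality names, the shared sample registry and the 10% missingness threshold are all hypothetical assumptions chosen for illustration. They are not the XGAP data model or API.

```python
import math
from dataclasses import dataclass

# Hypothetical submission record: field names are illustrative only,
# not the XGAP data model.
@dataclass
class Submission:
    workpackage: str           # contributing WP, e.g. "WP4"
    modality: str              # e.g. "expression", "protein", "metabolite"
    trait_ids: list[str]       # row identifiers (probes, proteins, metabolites)
    sample_ids: list[str]      # column identifiers; must be in the shared sample registry
    values: list[list[float]]  # trait x sample matrix; math.nan marks a missing value

# Assumed controlled vocabulary for data modalities.
ALLOWED_MODALITIES = {"genotype", "expression", "protein", "metabolite"}

def validate(sub: Submission, registry: set[str], max_missing: float = 0.1) -> list[str]:
    """Return a list of QC problems; an empty list means the submission passes."""
    problems = []
    if sub.modality not in ALLOWED_MODALITIES:
        problems.append(f"unknown modality: {sub.modality}")
    if len(set(sub.trait_ids)) != len(sub.trait_ids):
        problems.append("duplicate trait identifiers")
    unknown = [s for s in sub.sample_ids if s not in registry]
    if unknown:
        problems.append(f"samples not in registry: {unknown}")
    if not sub.sample_ids:
        problems.append("no samples")
        return problems
    if len(sub.values) != len(sub.trait_ids):
        problems.append("row count does not match trait_ids")
    for trait, row in zip(sub.trait_ids, sub.values):
        if len(row) != len(sub.sample_ids):
            problems.append(f"{trait}: row length does not match sample_ids")
            continue
        # Fraction of missing measurements for this trait.
        missing = sum(math.isnan(v) for v in row) / len(row)
        if missing > max_missing:
            problems.append(f"{trait}: {missing:.0%} missing exceeds {max_missing:.0%}")
    return problems
```

For example, a submission whose second trait is missing in two of its three samples yields `['g2: 67% missing exceeds 10%']`. Returning a list of problems instead of stopping at the first one lets a data-generating group fix every issue in a single pass.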

At the start of the project, we will establish the data integration panel (DIP), chaired by Ritsert Jansen (Groningen), with a representative from each of the computational partners of the different -omics WPs. The DIP will be responsive to input, suggestions, and analysis priorities set by the consortium as a whole, and will guide the flow of data from the production groups in Pillar 2 through the data integration tasks described in this WP.

We will build upon our experience with the XGAP software, as already implemented at www.xgap.org and described in detail elsewhere (Swertz and Jansen, 2007). In brief, it includes functionality for data quality control, QTL/association analysis, network and causal inference analysis, and data storage and visualization. High-performance computing on large computer clusters is available to us, including the supercomputer infrastructure in Groningen and the IBM Blue Gene supercomputer.

Task 2. To perform analysis of rodent/cellular and human molecular networks

These analyses can be divided into intralevel and interlevel -omics analyses. The intralevel analyses include the detection of cis- and trans-eQTLs, pQTLs and mQTLs in ALS patients and controls. They also include the detection of co-expression networks in each of the -omics datasets, the comparison of network topology between ALS patients and controls, and the identification of major hubs that are associated with disease. This approach mirrors a recent genome-wide association (GWA) study in celiac disease in which we identified 40 loci. For 20 of these loci, the most significantly associated SNP also affected gene expression levels in peripheral blood. Subsequent co-expression analysis in an independent dataset (comprising 33,109 expression arrays) revealed that many of these genes were co-expressed and formed a highly connected core of immune-related genes (Dubois et al, in press). We aim to apply the same procedure, starting from the genetic variants identified in ALS and the -omics data that will be generated in ALS-relevant tissues. In the interlevel analyses, correlations among traits from different levels can be used to generate hypotheses about network connections; these form the starting point of Objective 3 (Task 3 below).
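A minimal sketch of the two intralevel steps follows. It assumes genotype dosages (0/1/2) and expression values are held in NumPy matrices with samples as columns. The first step is an eQTL scan that labels each hit as cis or trans. The second builds a thresholded co-expression network whose node degrees identify hubs and can be compared between patients and controls. The 1 Mb cis window, the p-value threshold, the |r| >= 0.8 edge cutoff and all function names are illustrative assumptions. A real analysis would also adjust for covariates and correct for multiple testing.

```python
import math
import numpy as np

CIS_WINDOW = 1_000_000  # assumption: SNP within 1 Mb of the gene, same chromosome = cis

def corr_pvalue(r: float, n: int) -> float:
    """Two-sided p-value for a Pearson correlation (Fisher z approximation)."""
    r = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))

def standardize(m: np.ndarray) -> np.ndarray:
    """Centre and scale each row; zero-variance rows (e.g. monomorphic SNPs) must be removed first."""
    return (m - m.mean(axis=1, keepdims=True)) / m.std(axis=1, keepdims=True)

def eqtl_scan(genotypes, expression, snp_pos, gene_pos, alpha=1e-4):
    """genotypes: SNPs x samples (0/1/2 dosages); expression: genes x samples.
    snp_pos, gene_pos: lists of (chromosome, base-pair position).
    Returns (snp, gene, r, p, "cis" or "trans") for every pair with p < alpha."""
    n = genotypes.shape[1]
    # With population-standardized rows, the dot product divided by n is Pearson's r.
    r = standardize(genotypes) @ standardize(expression).T / n  # SNPs x genes
    hits = []
    for i, j in np.ndindex(r.shape):
        p = corr_pvalue(float(r[i, j]), n)
        if p < alpha:
            (s_chr, s_bp), (g_chr, g_bp) = snp_pos[i], gene_pos[j]
            kind = "cis" if s_chr == g_chr and abs(s_bp - g_bp) <= CIS_WINDOW else "trans"
            hits.append((i, j, float(r[i, j]), p, kind))
    return hits

def degrees(expression, r_min=0.8):
    """Node degree in an unweighted co-expression network (edge if |r| >= r_min)."""
    adj = np.abs(np.corrcoef(expression)) >= r_min
    np.fill_diagonal(adj, False)  # no self-loops
    return adj.sum(axis=1)

def hubs(expression, top=5, r_min=0.8):
    """Indices of the `top` most connected genes."""
    order = np.argsort(-degrees(expression, r_min), kind="stable")
    return [int(g) for g in order[:top]]

def degree_difference(cases, controls, r_min=0.8):
    """Per-gene change in connectivity between patients (cases) and controls."""
    return degrees(cases, r_min) - degrees(controls, r_min)
```

In simulated data where gene 0 depends on a nearby SNP and gene 2 on a SNP on another chromosome, `eqtl_scan` reports the first pair as cis and the second as trans. A hard correlation cutoff keeps the network definition simple. Weighted networks are a common alternative when no natural threshold exists.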

Interlevel analyses have the advantage that many small effects can combine into a detectable signal. As shown previously, regulatory hotspot effects may pass both level-wise and system-wide significance thresholds, but they may also reach system-wide significance without passing the level-wise threshold (Fu et al, 2009). In addition, we will further reduce the inherent noise in the -omics data using techniques based on principal component analysis (PCA), as we recently described (Dubois et al, Nat Genet 2010).
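PCA-based noise reduction can be sketched as removing the first k principal components from a genes x samples expression matrix. This rests on the assumption that the components explaining most variance across samples reflect non-genetic effects such as batch or cell-type composition. The following minimal NumPy sketch implements that general idea. The function name and the choice of k are assumptions, and the procedure is not necessarily identical to the one used by Dubois et al.

```python
import numpy as np

def remove_top_pcs(expression: np.ndarray, k: int) -> np.ndarray:
    """Remove the first k principal components from a genes x samples matrix.

    Genes are centred across samples; the rank-k reconstruction formed by the
    k components explaining most variance across samples is subtracted, and the
    gene means are restored so values stay on their original scale.
    """
    means = expression.mean(axis=1, keepdims=True)
    centred = expression - means
    # Thin SVD: rows of vt are the principal axes in sample space.
    u, s, vt = np.linalg.svd(centred, full_matrices=False)
    dominant = (u[:, :k] * s[:k]) @ vt[:k, :]  # rank-k dominant signal (zeros when k == 0)
    return centred - dominant + means
```

The number of components k is left to the caller. Removing too many components can also remove genuine biological signal, including trans-acting genetic effects.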

Task 3. To build and replicate an integrative ALS model for disease and disease progression

System-wide QTL analyses allow the reconstruction of molecular pathways that are relevant to the phenotype. Once the eQTLs, pQTLs and mQTLs related to ALS and ALS progression have been identified, we will apply advanced causal inference. This includes regression-based analyses that use the correlation between the error terms of different models and the regression of residuals on the QTLs of different models (Jansen et al, 2009). In addition, we will study the emergent properties of these networks by computer simulation. Parametric modelling of the causal connections in a network allows these properties to be studied in silico, e.g. by perturbing model parameters by a given amount to investigate the robustness or fragility of the network as a whole or of individual points in it. We will use generalized linear models (Figure 5, Objective 3), which go beyond the classical identity link function and thereby allow us to model the relationships between the various -omics levels and environmental factors.
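The two regression-based ideas can be illustrated for a single QTL Q, a trait A that it affects, and a candidate downstream trait B. Under the chain Q → A → B, the residuals of A and B after regressing each on Q should be correlated, because the noise in A propagates to B. In addition, the residual of B after regressing on A should no longer depend on Q. The following simplified NumPy sketch illustrates that logic, using a Fisher z approximation for p-values. It is not the exact method of Jansen et al (2009), and the function names and significance threshold are hypothetical.

```python
import math
import numpy as np

def residuals(y, x):
    """Residuals of an ordinary least-squares fit of y on x (with intercept)."""
    design = np.column_stack([np.ones(len(x)), x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ beta

def corr_p(a, b):
    """Pearson r and two-sided p-value (Fisher z approximation)."""
    r = float(np.corrcoef(a, b)[0, 1])
    z = math.atanh(max(min(r, 0.999999), -0.999999)) * math.sqrt(len(a) - 3)
    return r, math.erfc(abs(z) / math.sqrt(2))

def causal_check(q, a, b, alpha=0.01):
    """Toy QTL-anchored test of the chain Q -> A -> B.

    1. Error-term correlation: residuals of A|Q and B|Q should be correlated
       if A causally affects B.
    2. Residual-on-QTL regression: after adjusting B for A, Q should carry no
       remaining signal if its whole effect on B runs through A.
    Returns (consistent_with_chain, p_error_corr, p_residual_on_qtl).
    """
    _, p_err = corr_p(residuals(a, q), residuals(b, q))
    _, p_resid = corr_p(residuals(b, a), q)
    consistent = p_err < alpha and p_resid >= alpha
    return consistent, p_err, p_resid
```

On simulated data in which B is generated from A, `causal_check` reports consistency with the chain. When B is generated directly from Q, it does not, because Q still explains B after adjusting for A. With real data, confounding and measurement error in A can blur these simple signatures.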