BLUP: Brain Learning Unicorn Project

Click HERE to see this readme as a website.

Can a model predict the genetic profile of an individual based on brain regions volumes?

Trying to deal with new stuff learned at the amazing Brainhack school

Author: Elise Alix Douard

Other project contributors: BHS, Hannah Kiesow @hannahmaykiesow (also working on UK Biobank), Kuldeep Kumar @meetkd007 (extracting data from UKBB servers)

Personal presentation

Welcome to this project, dear unicorn student!

Source: https://media.giphy.com/

I am Elise, a Ph.D. student in neuroscience at UdeM for almost 4 years, working on the contribution of genetics to neurodevelopmental disorders such as autism. I don’t really fit in a specific domain (genetics/cognitive neuroscience/psychology). I guess that is what we call a unicorn student? Currently, I work with genetic data (Copy Number Variants) and clinical phenotypes, and I do a lot of statistics and graphs in R. But my initial training was in cognitive neuroscience, where I started working with multimodal data (a combination of Arterial Spin Labelling MRI data and eye-tracking data).

Since I started my Ph.D., I have used neither MRI data nor Python, and I am here to take my revenge on that.


Project definition


Source: Illustration inspired by freepik.com content and adapted in Adobe Illustrator

Copy number variants (CNVs) are a family of structural variations of the chromosomes. A CNV can be either a loss or a gain of a portion of a chromosome relative to a reference genome. Some CNVs are pathogenic, meaning that they are formally associated with neurodevelopmental or psychiatric disorders such as autism spectrum disorder (ASD), schizophrenia (SZ) or intellectual disability (ID). Such pathogenic CNVs have been associated with significant alterations of brain volume (Modenato et al., 2020; Martin-Brevet et al., 2018; Maillard et al., 2015) or connectivity (Moreau et al., 2019). Notably, alterations of insula volume are common to the structural brain alterations linked to pathogenic CNVs and those linked to a neurodevelopmental disorder (e.g. ASD or SZ) (Cauda et al., 2017; Goodkind et al., 2015).


Can a model predict the genetic profile of an individual based on brain regions volumes?


This project aims to feed a machine learning model with brain volumes to predict whether an individual is a carrier of a potentially pathogenic CNV.

Source: Illustration inspired by freepik.com content and adapted in Adobe Illustrator



Raw data

For this project, 35,759 individuals from UK Biobank with genetic and derived brain volume data were available.

Data transformation

For all individuals, the 68 region volumes were adjusted for potential confounder effects.

Table 1: Description of the confounders

| Group | N | Mean age (sd) | Mean TIV (sd) | N Female | N Male | N Site 1 | N Site 2 | N Site 3 |
|---|---|---|---|---|---|---|---|---|
| Carriers | 1265 | 63.8 (7.4) | 1540824.3 (150493.6) | 671 | 594 | 781 | 320 | 164 |
| Controls | 34494 | 64.1 (7.6) | 1549091.7 (151512.6) | 18280 | 16214 | 21411 | 8607 | 4476 |
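
To make the adjustment step more concrete, here is a minimal sketch of one common way to remove confounder effects: fit a linear regression of each regional volume on the confounders and keep the residuals. This is an assumption about the method, not necessarily the exact procedure used in this project. The function name `adjust_for_confounders`, the choice of confounders (age, sex, TIV and site, taken from Table 1) and the toy data are all hypothetical.

```python
import numpy as np

def adjust_for_confounders(volumes, confounders):
    """Return the residuals of `volumes` after regressing out `confounders`.

    volumes:     (n_subjects, n_regions) array of regional volumes
    confounders: (n_subjects, n_confounders) array (e.g. age, sex, TIV, site dummies)
    """
    volumes = np.asarray(volumes, dtype=float)
    confounders = np.asarray(confounders, dtype=float)
    # Design matrix: an intercept column followed by the confounders.
    design = np.column_stack([np.ones(len(confounders)), confounders])
    # Ordinary least squares, one set of coefficients per region.
    beta, *_ = np.linalg.lstsq(design, volumes, rcond=None)
    # Residuals are the part of each volume the confounders do not explain.
    # They are left as-is (no z-scoring).
    return volumes - design @ beta

# Toy data only (hypothetical values, NOT UK Biobank data).
rng = np.random.default_rng(0)
n_subjects, n_regions = 200, 68
age = rng.normal(64, 7.5, n_subjects)
sex = rng.integers(0, 2, n_subjects)
tiv = rng.normal(1.55e6, 1.5e5, n_subjects)
site = rng.integers(0, 3, n_subjects)
site_dummies = np.eye(3)[site][:, 1:]  # first site is the reference level
confounders = np.column_stack([age, sex, tiv, site_dummies])

# Simulated volumes that depend on TIV plus noise.
volumes = 0.002 * tiv[:, None] + rng.normal(0, 100, (n_subjects, n_regions))
adjusted = adjust_for_confounders(volumes, confounders)
```

After this step the adjusted volumes are, by construction, linearly uncorrelated with each confounder in the sample used for the fit.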

Final dataset: training and test sets

A subgoal of the project was to make minimal changes to the features used in the machine learning models. The final volumes used as features were the ones adjusted for the confounder effects, without z-scoring. Another subgoal was to deal with an imbalanced dataset, which reflects the real prevalence of carriers in the general population. Instead of keeping the full imbalance, it was reduced by pseudo-randomly resampling the controls. The final dataset included the 1,265 carriers and twice as many controls.
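
The resampling described above can be sketched as follows, using only the Python standard library. The group sizes come from Table 1. The function names, the seed and the 80/20 stratified split are assumptions made for illustration, since the actual split used in the project is not stated here.

```python
import random

def undersample_controls(labels, ratio=2, seed=42):
    """Keep all carriers (label 1) and `ratio` times as many random controls (label 0)."""
    carriers = [i for i, y in enumerate(labels) if y == 1]
    controls = [i for i, y in enumerate(labels) if y == 0]
    rng = random.Random(seed)  # fixed seed makes the pseudo-random draw reproducible
    kept_controls = rng.sample(controls, ratio * len(carriers))
    return sorted(carriers + kept_controls)

def stratified_split(indices, labels, test_fraction=0.2, seed=42):
    """Split indices into train and test sets while preserving the class ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in (0, 1):
        members = [i for i in indices if labels[i] == cls]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        train.extend(members[n_test:])
    return sorted(train), sorted(test)

# Group sizes from Table 1: 1,265 carriers and 34,494 controls.
labels = [1] * 1265 + [0] * 34494
kept = undersample_controls(labels)
train_idx, test_idx = stratified_split(kept, labels)
print(len(kept), len(train_idx), len(test_idx))
```

Splitting within each class keeps the same two-controls-per-carrier ratio in both the training and the test set.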

Click on the following images to open interactive pie-charts:

Figure 1: Proportion of controls and carriers in the training and test sets used for the machine learning models.


Machine learning model used

Steps followed for the three machine learning models

Tools (used and learned)

The project relies on the following technologies:


In this GitHub repository:

Week 3 deliverable: data visualization

Link to the blog created for the assignment: https://elise-douard.github.io/EliseAD_BLUP_BlogPage/


Results of the models

Figure 2: Results after testing the models (step 5)


As you can see in Figure 2, none of the models performed better than chance at classifying carriers and controls. BUT that is not a problem because… I learned a lot doing this project!
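
What "chance" means depends on the metric when classes are imbalanced. The sketch below is only an illustration and does not reproduce the metric shown in Figure 2: it scores a baseline that always predicts "control" on a hypothetical test set with two controls per carrier. The set sizes and function names are assumptions.

```python
def accuracy(y_true, y_pred):
    """Fraction of correct predictions."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recalls."""
    recalls = []
    for cls in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == cls]
        recalls.append(sum(y_pred[i] == cls for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Hypothetical test set: 506 controls (0) and 253 carriers (1).
y_true = [0] * 506 + [1] * 253
# Baseline that ignores the brain volumes and always answers "control".
y_pred = [0] * len(y_true)

baseline_acc = accuracy(y_true, y_pred)
baseline_bal = balanced_accuracy(y_true, y_pred)
print(f"accuracy: {baseline_acc:.3f}, balanced accuracy: {baseline_bal:.3f}")
```

With this ratio, a model has to beat roughly two thirds in plain accuracy, or one half in balanced accuracy, before it can be said to do better than such a baseline.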

Progress overview

Week one:

Week two:

Week three:

Week four:

Week five:

Conclusion and acknowledgement

This BHS project allowed me to learn a lot of concepts and tools related to open science. It was also a nice introduction to machine learning models. Hopefully, I will spread the word, and I will surely include all these new tools in my practice. I am more than grateful to all the instructors, mentors and students who shared their knowledge.

Thanks for this enriching experience!

Source: https://media.giphy.com/


Cauda F. et al., “Are Schizophrenia, Autistic, and Obsessive Spectrum Disorders Dissociable on the Basis of Neuroimaging Morphological Findings?: A Voxel-Based Meta-Analysis.” Autism Research 10, no. 6 (2017): 1079–95. https://doi.org/10.1002/aur.1759.

Goodkind M. et al., “Identification of a Common Neurobiological Substrate for Mental Illness.” JAMA Psychiatry 72, no. 4 (April 2015): 305–15. https://doi.org/10.1001/jamapsychiatry.2014.2206.

Kendall K. M. et al., “Cognitive Performance and Functional Outcomes of Carriers of Pathogenic Copy Number Variants: Analysis of the UK Biobank.” The British Journal of Psychiatry 214, no. 5 (May 2019): 297–304. https://doi.org/10.1192/bjp.2018.301.

Maillard A. M. et al., “The 16p11.2 Locus Modulates Brain Structures Common to Autism, Schizophrenia and Obesity.” Molecular Psychiatry 20, no. 1 (February 2015): 140–47. https://doi.org/10.1038/mp.2014.145.

Martin-Brevet S. et al., “Quantifying the Effects of 16p11.2 Copy Number Variants on Brain Structure: A Multisite Genetic-First Study.” Biological Psychiatry 84, no. 4 (2018): 253–64. https://doi.org/10.1016/j.biopsych.2018.02.1176.

Modenato C. et al., “Neuropsychiatric Copy Number Variants Exert Shared Effects on Human Brain Structure.” MedRxiv, April 17, 2020, 2020.04.15.20056531. https://doi.org/10.1101/2020.04.15.20056531.

Moreau C. et al., “Neuropsychiatric Mutations Delineate Functional Brain Connectivity Dimensions Contributing to Autism and Schizophrenia.” BioRxiv, 2019, 862615. https://doi.org/10.1101/862615.