How Machine Learning should be applied to Neurological Disease Research

Jack Glendenning
11 min read · Jan 18, 2017


Currently, neuroscience research doesn’t have large enough datasets for Machine Learning. In order to learn as much as we can about neurological conditions, we need larger datasets.

This article in a nutshell:

Machine Learning is a data analysis tool, and neuroscience currently doesn’t have the right datasets to get the most out of it. We need millions of people to join a lifelong program of getting regular MRI scans and blood tests. Because of the sheer number of people, a fraction of those within the program will go on to develop a neurological condition or disease. From this, we would be able to create a dataset that provides opportunities for supervised and unsupervised machine learning to take place. This would drastically improve early detection and allow us to analyse the evolution of the brain for a variety of diseases… but right now, we don’t have the right datasets.

Contents

  • An introduction
  • Why Machine Learning is the best tool
  • Supervised Learning and how it can be applied
  • Unsupervised Learning and how it can be applied
  • The Ideal Dataset
  • 2017 Project Proposal

An Introduction

I’ve been learning both neuroscience and Machine Learning in my free time for a while now because I’m just a curious person. I can just see how they fit together.

This is really heavy, and personal, but I need to share it before I continue with the Machine Learning in this article: my close friend’s mum has Lyme disease. This has been enough to trigger me to actually do something, write this article and come up with a solution for how we should be tackling neurological disease research.

From this I’ve learnt that neurological disorders are both unfair and currently unpredictable (they don’t have to be unpredictable with Machine Learning). These two videos are so painful for me to watch, but they demonstrate the raw seriousness of neurological conditions and how important it is to solve them. If you’d like to donate to Rachelle, it would be greatly appreciated ❤: https://www.gofundme.com/TheColourLyme

Why Machine Learning is the best tool

In summary, the time between a disease starting to develop and it being detected needs to be minimised:

Computers can now analyse data much better than people, thanks to Machine Learning, which can be used to identify the patterns and trends amongst lots of data.

While it may appear that many people ‘suddenly’ get a neurological disease, many neurological diseases are progressive, meaning that the brain builds up to its degenerative state over time. Sure enough, in Alzheimer’s research we’ve learnt that Alzheimer’s victims develop amyloid plaques and neurofibrillary tangles in the early stages of the disease. But for all neurological conditions, instead of analysing the end results, we need to learn more about how the brain progressively develops towards these conditions. This is my take on why we need computers and Machine Learning:

The earlier we (people) go back into a victim’s brain, the harder it is for us to visually find the signs that the disease is progressing. We are limited to finding relationships across a few images at a time. Where we will struggle to see the early signs, computers are able to perform analysis on large groups of data and form relationships about these signs.

Now onto more of the Machine Learning: what it is, and what promising results it offers for neuroscience research.

Supervised Learning and How it can be Applied

Jargon (fancy words) and non-simple explanations = yuck. Too many sources overcomplicate machine learning, mainly because it has belonged to academics and those with PhDs. Explaining things in simple ways is more valuable to everyone.

To show which features of Machine Learning can be used for neuroscience, I will explain a classification problem that computers can now solve with supervised learning:

The relationships for Dog and Cat are coloured differently to demonstrate that the computer generated a unique relationship for each

What we told the computer was:

“Hey, this is a collection of images and they are dogs. This is another collection of images and they are cats”.

What the computer did was:

Create a relationship for each collection. It did this by analysing the images in the collection and seeing how they all relate to each other, using techniques computers are capable of. It then associated this relationship with the label we gave it.

This is in a genre of machine learning called supervised learning, because we tell the computer what the output of each collection of data is (if we gave it a bunch of images without putting them in a collection or telling it what they were, that would be unsupervised learning; I’ll get to that fairly soon). We can use the knowledge the computer has gained to identify an unseen image:

What happened there was that the computer ran the image against all the relationships it had and went:

“Hey, this matches the Dog relationship the most, so that image is most likely a dog”.

This is how simple Machine Learning can be. It’s a very powerful data analysis tool, but it only gets better with more data. The more images of dogs and cats the computer can train on, the more accurately it can predict an unseen image. Also, if we wanted a computer to be able to tell the difference between ‘Dogs, Cats and Frogs’, we could simply give the computer another collection of images and tell it they were frogs.
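To make that concrete, here is a minimal sketch of the ‘dog or cat’ example in Python with scikit-learn. The images are just random placeholder arrays (this article doesn’t supply real data), and a simple linear classifier stands in for the more powerful image models you’d use in practice, but the supervised pattern of “give labelled collections, then predict an unseen image” is the same.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: pretend each image is a 64x64 greyscale picture,
# flattened into a vector. Real photos of dogs and cats would go here.
dog_images = rng.random((100, 64 * 64))   # the collection we label "dog"
cat_images = rng.random((100, 64 * 64))   # the collection we label "cat"

X = np.vstack([dog_images, cat_images])
y = np.array(["dog"] * 100 + ["cat"] * 100)   # the labels we tell the computer

# Training builds a "relationship" between pixel values and each label.
model = LogisticRegression(max_iter=1000)
model.fit(X, y)

# An unseen image: the model reports which relationship it matches best.
unseen = rng.random((1, 64 * 64))
print(model.predict(unseen))         # e.g. ['dog']
print(model.predict_proba(unseen))   # how strongly it matched each label
```

Adding a third collection labelled “frog” to X and y is all it would take to teach the same model a third category.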

Classification can be used to predict whether or not someone will develop a neurological condition in their lifetime. But this is the kind of dataset that needs to be created:

What we need is for people to join a program of regularly getting MRI scans. Depending on whether that person gets a neurological condition or not within their lifetime, the early MRI images of that person can be labelled as either ‘Does get a neurological condition’ or ‘Does not get a neurological condition’. The computer can be trained in a similar way to the ‘dog or cat’ example, except the targets will be ‘does get a neurological condition’ or ‘does not get a neurological condition’.

We can then apply what the computer learnt from this dataset to an unseen scan, to see which relationship it matches.

This is one very simplistic, but real, example of how Machine Learning could be applied to such a dataset. Of course, the input collections would actually be collections of entire 3D MRI scans instead of just 2D images.
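As a hedged sketch of what that might look like with whole 3D volumes, here is a tiny 3D convolutional network in Keras. The scans and labels below are random placeholders standing in for the lifelong program’s data, and the network shape is illustrative rather than a recommended architecture.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)

# Placeholder data: 32 tiny 32x32x32 single-channel "scans".
scans = rng.random((32, 32, 32, 32, 1)).astype("float32")
# 1 = "does get a neurological condition", 0 = "does not".
labels = rng.integers(0, 2, size=32).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(32, 32, 32, 1)),
    tf.keras.layers.Conv3D(8, kernel_size=3, activation="relu"),
    tf.keras.layers.MaxPooling3D(pool_size=2),
    tf.keras.layers.Conv3D(16, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling3D(),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # probability of developing a condition
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(scans, labels, epochs=2, batch_size=8)

# Predict on an unseen scan (here, just the first placeholder volume).
print(model.predict(scans[:1]))
```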

Unsupervised Learning and How it can be Applied

Unsupervised Learning is much like supervised learning, but… instead of giving the computer a collection of input data and labelling it, we just throw data at the computer. From this input data, the computer draws out relationships on its own.

A very common application of unsupervised learning is known as clustering. Below is an example:

source: https://de.wikipedia.org/wiki/Clusteranalyse

In this example, the computer was given data points scattered around on a plot. From the data given to it, it was able to categorise the points into 3 categories (shown by the colours) and find the best fitting relationship for each category.
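Here is a minimal clustering sketch along the same lines, using scikit-learn’s KMeans: scattered 2D points are grouped into 3 clusters without the computer ever being told which point belongs where. The points are synthetic, generated purely for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic points scattered around 3 hidden centres (no labels are used).
points, _ = make_blobs(n_samples=300, centers=3, random_state=0)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_ids = kmeans.fit_predict(points)   # which of the 3 groups each point fell into

print(kmeans.cluster_centers_)   # the centre the computer found for each category
print(np.bincount(cluster_ids))  # how many points landed in each cluster
```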

There’s only so much I can convey in text, so if you have the time to watch this video, you’ll understand more about some of the unique applications of unsupervised learning.

For neuroscience research, unsupervised learning has a lot of potential. It will be able to find relationships amongst the data that we may be unable to recognise ourselves. If we gave the computer all the regular MRI scans from those categorised as “does get a neurological condition”, the computer might be able to cluster the areas and growth factors that led to that condition. The same could be done for the “does not get a neurological condition” group, to see the growth that the brain experienced.
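As a speculative sketch of how that could look: suppose each participant’s repeated scans have already been reduced to a rate of change for each brain region (a hypothetical pre-processing step, not something described in this article). Clustering those per-region change rates could group participants whose brains changed in similar ways before a condition appeared.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical features: one row per participant in the "does get a
# neurological condition" group, one column per brain region, each value a
# made-up rate of volume change estimated from their repeated MRI scans.
n_participants, n_regions = 200, 50
region_change_rates = rng.normal(size=(n_participants, n_regions))

# With no labels given, look for groups of participants whose brains
# changed in similar ways leading up to their condition.
groups = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(region_change_rates)
print(np.bincount(groups))   # how many participants fell into each pattern of change
```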

The Ideal Dataset

MRI, Blood Tests, Brain Performance Tests.

Datasets in Neuroscience now

Datasets are to Machine Learning what a fishing rod is to a fisherman. You really need the right resources to get the right results. In current neuroscience research, the datasets could be much larger and more comprehensive.

With a quick Google search, I was able to find a list on Wikipedia that contained links to publicly available datasets:

Upon browsing many of the datasets, I noticed that many of them had fewer than 5000 participants. The Alzheimer’s Disease Neuroimaging Initiative (ADNI) had a total cohort of 1000 people aged 55 and above.

In order to draw complex relationships and create the best datasets, we need many more people. These people should be aged 18 at the start of the program and have their data collected once every 6 months throughout their lives.

MRI

There is zero radiation used in MRI which makes it one of the safest imaging techniques used in medicine. It is also one of the most powerful imaging techniques available to us.

The reason I recommended collecting data at 6 month intervals is that we do not know what effect more frequent MRI scans could have on people. But I say regular scans because, when a brain condition does occur, you’ve practically got video footage of that person’s brain leading up to the event.

The only really dangerous part about MRI is the use of contrast dyes. The most widely used contrast dye is Gadolinium. Gadolinium is injected into a vein in your arm and spreads to all corners of the circulatory system, including your brain. When Gadolinium is used, the picture shows up much more clearly. Gadolinium is toxic, but when paired with chelating agents its toxicity is suppressed whilst still allowing it to clearly enhance the MRI scan.

In a lifelong program of regular MRI scans, I believe that the use of Gadolinium could be too dangerous, therefore it should not be used.

Blood Tests

There is a lot of information that can be gained from a small sample of blood.

  • Map someone’s genome
  • Take DNA and RNA samples
  • Check the balance of vitamins, minerals, iron, etc.
  • Determine blood type

The more variables that we can put into the equation for brain conditions, the easier it will be for us to identify the correct relationships that cause the diseases to occur (that’s why I’ve included the idea of blood tests).

Blood tests are simple to perform and yield a lot of data about the levels of chemicals in the body. The fluid lost in a blood test is replaced by the body within about a day, and the blood cells are replaced over the following weeks. This means that it would be possible to collect MRI and blood test data on a weekly basis for candidates in the lifelong program.

Note: I’m not saying that it needs to be done on a weekly basis, but in reality, the closer together the testing is performed, the higher the frame rate in the neurological disease evolution movie that we create for the candidates.

Brain Performance

Lumosity happens to hold one of the largest collections of cognitive performance data in neuroscience. If each of the program participants joined Lumosity, their cognitive performance could be tracked throughout their life and added to their datasets.

In addition to Lumosity, Electroencephalography (the analysis of brain waves via electrodes placed on the head) would provide valuable insights on a large scale.
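Pulling the three data sources together, here is a rough sketch of what one six-monthly record per participant might hold. The field names and structure are purely illustrative; nothing here is an existing standard or format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

import numpy as np

@dataclass
class CheckupRecord:
    """One six-monthly data collection for a single program participant (illustrative only)."""
    participant_id: str
    visit_date: date
    mri_volume: np.ndarray                                 # the full 3D MRI scan
    blood_markers: dict = field(default_factory=dict)      # e.g. {"vitamin_d": 52.0, "iron": 14.1}
    cognitive_scores: dict = field(default_factory=dict)   # e.g. scores from Lumosity-style games
    eeg_recording: Optional[np.ndarray] = None             # optional brain-activity trace

# Example record filled with placeholder values.
record = CheckupRecord(
    participant_id="P000123",
    visit_date=date(2017, 1, 18),
    mri_volume=np.zeros((128, 128, 128)),
    blood_markers={"vitamin_d": 52.0, "iron": 14.1},
    cognitive_scores={"memory": 71, "attention": 64},
)
print(record.participant_id, record.visit_date)
```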

How regularly should data be collected?

We do not know what effect regular MRIs may have on a person, so it is safest to space the MRI scans further apart. Over many years, if it turns out that MRI scans are safe to perform at short intervals, they will be able to be performed closer together.

In my honest opinion, if we can crack portable MRIs and make MRI machines publicly available, I believe that MRI scans will eventually become a quick daily habit performed as often as cleaning teeth.

There has never been a dataset of brain images, blood tests and brain activity collected at this scale for such a large number of participants.

By the time the program is finished, computer processing speeds and Machine Learning practices will have continued to improve, allowing highly detailed analysis of the collected dataset.

2017 Project Proposal

I’m going to go out and actually try to execute something like this this year. As of last year, I’ve started attending some seminars and events at the Queensland Brain Institute on the University of Queensland campus (all the seminars are free and amazing; more info here). They run a range of incredible research projects. As of this week, I’m in the process of trying to influence research teams to seek larger sources of data, so that quality Machine Learning can take place.

Our generation needs to contribute more than ever to the world around us to solve many of the issues that exist (environmental, medical, and pretty much everything).

Conclusion

The most important takeaway of this article is that we need many more, and much larger, datasets.

Some of the best regularly collected data could be brain imaging, blood analysis and brain activity monitoring. There may be other great sources of data.

It’s important to realise that computers can now recognise patterns much better than we can, and we need to take advantage of that to solve problems. The more data the computer has, the more comprehensive the relationships it can form.

We could have all of the right techniques and tools to solve these diseases, but if we don’t have the right data holding information about them, then it will be difficult for us to solve them.

If you’d like to see more articles from me like these in the future, you can support me by clicking the recommend button. Articles like these take a fair amount of effort to produce, but I can do more if there’s interest and support for them.

Written by Jack Glendenning

Computer Science student at QUT. Here on Medium to share things that give value.
