Sometimes you read science fiction about people doing homebrewed genetic research (John Brunner is one author who comes to mind), and about the good as well as the evil that results.  But think about it: what if you had access to the same tools as biologists?  Would that give you the opportunity to solve a major world problem?

What kinds of things could you research?  Cures for multiple sclerosis?  Vaccines grown using human-produced media?

Guess what? The tools are here, and they are open source.  Start your journey by downloading NodeXL from http://nodexl.codeplex.com/, and then install the Microsoft Research Biology Extension for Excel.  With everything installed, you have a powerful tool for performing genetic investigations in the comfort of your home.

Now you are ready to start doing genetic research!  And keep in mind that all of this code is open source and built on .NET, so you can use an advanced programming language to build interesting ways to process this information.

With everything loaded, you now need a genome.  E. coli is well researched and sequenced; after all, mammals certainly produce a lot of E. coli.

Now go to the National Institutes of Health (NIH) and get some data.  I randomly selected the sequence:

image

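Once you get to the GenBank FTP site, you will be able to download a version of the genome!  (Note that the folder in the link below is labeled Salmonella enterica arizonae, a close relative of E. coli, rather than E. coli itself.)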

One more click: if you are following along, click the CP000880.gbk file or use this link:

ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria/Salmonella_enterica_arizonae_serovar_62_z4_z23__/CP000880.gbk

image

Once you have downloaded the CP000880.gbk file, open Excel and click New, My Template:

image

Once the Excel workbook opens (the NodeXL add-in may slow the opening of the workbook), you can import the genome sequence into your spreadsheet: on the ribbon, click “Import From” and navigate to the folder where you downloaded the CP000880.gbk file.

image

Once you do that, you get the following spreadsheet with sequence data (only part of the spreadsheet is shown):

image
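If you would rather poke at the same file outside of Excel, here is a minimal sketch of reading the sequence straight out of a GenBank flat file.  It is written in Python rather than .NET purely for brevity, and it assumes the usual layout where the bases sit between an ORIGIN line and a closing // line.  The sample record, the function names read_origin_sequence and gc_content, and the DEMO0001 locus are all made up for illustration; they are not part of the Biology Extension.

```python
from collections import Counter

# Tiny made-up record in GenBank flat-file layout, for illustration only.
SAMPLE_GBK = """\
LOCUS       DEMO0001                  24 bp    DNA     linear   BCT 01-JAN-2000
DEFINITION  Hypothetical demo record, not real data.
ORIGIN
        1 atgcgtacgt tagcatgcaa ggcc
//
"""


def read_origin_sequence(lines):
    """Collect the bases between the ORIGIN line and the // terminator.

    If the input holds several records, their sequences are concatenated.
    """
    bases = []
    in_origin = False
    for line in lines:
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Sequence lines look like "        1 atgcgtacgt tagcatgcaa";
            # keep the letters and drop the position numbers and spaces.
            bases.extend(ch for ch in line if ch.isalpha())
    return "".join(bases).upper()


def gc_content(sequence):
    """Fraction of the sequence that is G or C (0.0 for an empty sequence)."""
    counts = Counter(sequence)
    total = len(sequence)
    return (counts["G"] + counts["C"]) / total if total else 0.0


sequence = read_origin_sequence(SAMPLE_GBK.splitlines())
print(len(sequence), dict(Counter(sequence)), round(gc_content(sequence), 3))
```

To try it on the real download, pass an open file instead of the sample text, for example by opening CP000880.gbk and handing the file object to read_origin_sequence.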

 

Now you can go crazy with bioinformatics.  Design your own E. coli sequence, add mouse sequences to plants, and see what happens.

Fortunately, these are only simulations with real data.
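To give a feel for what "designing your own sequence" can look like as a pure simulation, here is a small Python sketch that splices one string of bases into another and computes the reverse complement of the result.  The host and fragment sequences are invented, and insert_fragment and reverse_complement are hypothetical helper names, not functions from NodeXL or the Biology Extension.

```python
# Map each base to its complement; assumes plain upper-case A, C, G, T only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def insert_fragment(host, fragment, position):
    """Return a new sequence with fragment spliced into host at a 0-based position."""
    if not 0 <= position <= len(host):
        raise ValueError("position must lie within the host sequence")
    return host[:position] + fragment + host[position:]


def reverse_complement(sequence):
    """Complement every base, then reverse the strand."""
    return sequence.translate(COMPLEMENT)[::-1]


host = "ATGCGTACGTTAGC"   # made-up host sequence
fragment = "GGATCC"       # made-up fragment to splice in
edited = insert_fragment(host, fragment, 4)

print(edited)
print(reverse_complement(edited))
```

Nothing here touches a real organism; it is string manipulation on sequence data, which is the spirit of the simulations described above.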

Have fun.