<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>100 Ideas &#187; Garage Metagenomics</title>
	<atom:link href="http://has100ideas.com/idea/category/diybio/garage-metagenomics/feed" rel="self" type="application/rss+xml" />
	<link>http://has100ideas.com</link>
	<description>At least one each year</description>
	<lastBuildDate>Tue, 20 Jul 2010 22:14:55 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Building an example BioWeatherMap dataset</title>
		<link>http://has100ideas.com/idea/building-an-example-bioweathermap-dataset</link>
		<comments>http://has100ideas.com/idea/building-an-example-bioweathermap-dataset#comments</comments>
		<pubDate>Wed, 27 May 2009 04:01:35 +0000</pubDate>
		<dc:creator>Mac</dc:creator>
				<category><![CDATA[DIYbio]]></category>
		<category><![CDATA[Garage Metagenomics]]></category>
		<category><![CDATA[bioweathermap]]></category>
		<category><![CDATA[cartogram]]></category>
		<category><![CDATA[metagenomics]]></category>
		<category><![CDATA[visualization]]></category>

		<guid isPermaLink="false">http://has100ideas.com/?p=58</guid>
		<description><![CDATA[I want to make an example &#8220;BioWeatherMap&#8221; based on existing metagenomic data, so I&#8217;m looking for a dataset of 16S ribosomal DNA (rDNA) sampled from tens or hundreds of environmental locations. First I&#8217;d start by creating a simple map from the basic data (sequences + location) &#8211; something like the map below that was made [...]]]></description>
			<content:encoded><![CDATA[<p>I want to make an example &#8220;<a href="http://bioweathermap.org/">BioWeatherMap</a>&#8221; based on existing metagenomic data, so I&#8217;m looking for a dataset of 16S ribosomal DNA (rDNA) sampled from tens or hundreds of environmental locations.  First I&#8217;d start by creating a simple map from the basic data (sequences + location) &#8211; something like the map below that was made for a <a href="http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0040368">marine virome</a> project.</p>

<div id="attachment_57" class="wp-caption alignright" style="width: 310px"><a href="http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0040368"><img class="size-medium wp-image-57" title="The Marine Viromes of Four Oceanic Regions" src="http://has100ideas.com/wp-content/uploads/2009/05/journalpbio0040368g001-300x209.png" alt="The Marine Viromes of Four Oceanic Regions" width="300" height="209" /></a><p class="wp-caption-text">sampling sites from The Marine Viromes of Four Oceanic Regions - we can do a better job of mapping than this!</p></div>

<p>Then I&#8217;d like to do something a little more abstract.  The first thing that comes to mind is distorting the map so that one of the dimensions of the data, such as read density (base pairs/km<sup>2</sup>), is constant for each pixel.  This would expand the map in areas that had been sampled and shrink it in areas that were not.  I actually think read density is a pretty boring thing to graph (at least until there are tens of thousands of samples), but this technique would be a neat way to represent something like a <em>diversity metric</em> for each sample.  Visualizations of this kind are called &#8220;<a href="http://en.wikipedia.org/wiki/Cartogram">cartograms</a>.&#8221;</p>
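<p>Gastner and Newman's diffusion method works on a full two-dimensional map, but the core idea of a density-equalizing map is easiest to see in one dimension: if every region must end up with the same value per unit width, each region's new width has to be proportional to its value. The Python sketch below is my own illustration, not code from the paper; the function name <code>equalize_density</code> and the example numbers are hypothetical.</p>

```python
def equalize_density(widths, values):
    """Return new widths so that value per unit width is equal in every region.

    widths: original widths of adjacent regions along one axis (e.g. km)
    values: quantity measured in each region (e.g. base pairs sequenced)
    The total width is preserved, so the distorted axis spans the same extent.
    """
    if len(widths) != len(values):
        raise ValueError("widths and values must have the same length")
    total_width = sum(widths)
    total_value = sum(values)
    if total_value == 0:
        raise ValueError("at least one region must have a nonzero value")
    # Each region's new width is its share of the total value times the total
    # width, so value / new_width == total_value / total_width everywhere.
    return [total_width * v / total_value for v in values]
```

<p>With four equal-width strips holding 0, 50, 150 and 0 units of data, <code>equalize_density([10, 10, 10, 10], [0, 50, 150, 0])</code> gives <code>[0.0, 10.0, 30.0, 0.0]</code>: the total width is unchanged, but the unsampled strips collapse entirely. A practical version would probably add a small baseline value to every region so that unsampled areas shrink rather than vanish.</p>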

<p>(Note: these are just some basic ideas.  It might be the case that all density representations, such as heatmaps or the distortions I mentioned above, are inappropriate for representing sparse samples across a large area.  Nonetheless, the first step is to get some data and start experimenting.)</p>

<p><a href="http://www.pnas.org/content/101/20/7499.full"><img src="http://has100ideas.com/wp-content/uploads/2009/05/f5large-787x1024.jpg" alt="state newsworthiness cartogram" title="state newsworthiness cartogram" width="600" class="aligncenter size-large wp-image-86" /></a></p>

<p>In this cartogram, &#8220;the sizes of states are proportional to the frequency of their appearance in news stories.&#8221;  From <a href="http://www.pnas.org/content/101/20/7499.full">Diffusion-based method for producing density-equalizing maps</a> by Michael T. Gastner and M. E. J. Newman.</p>

<h3>a tab-delimited example of a <strong>basic</strong> metagenomic dataset for constructing a map:</h3>

<pre><code>sample_id	lat	lon	sequence_id	16s_sequence	suspected_species
000001	32.131341	98.231332	0001	agcctagcacgga...	Bacillus subtilis
000001	32.131341	98.231332	0002	agcgtaggttgac...	Acinetobacter baylyi
</code></pre>

<p>I would be happy with just 10,000 entries in a single text file in a format similar to the one above (note that lat/lon are identical for all sequences in a given sample).  More dimensions of data would be even better.  Here are some other potential columns:</p>
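<p>A file in this format is easy to load with Python's standard <code>csv</code> module. The sketch below (the function name <code>load_samples</code> is hypothetical) groups rows by <code>sample_id</code> and checks the assumption that every sequence in a sample shares the same lat/lon.</p>

```python
import csv
import io
from collections import defaultdict


def load_samples(text):
    """Group rows of the tab-delimited dataset by sample_id.

    Returns {sample_id: {"lat": float, "lon": float, "rows": [row dicts]}}.
    Raises ValueError if rows of one sample disagree on lat/lon.
    """
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    samples = defaultdict(lambda: {"lat": None, "lon": None, "rows": []})
    for row in reader:
        sample = samples[row["sample_id"]]
        lat, lon = float(row["lat"]), float(row["lon"])
        if sample["lat"] is None:
            # First row seen for this sample sets its coordinates.
            sample["lat"], sample["lon"] = lat, lon
        elif (sample["lat"], sample["lon"]) != (lat, lon):
            raise ValueError(f"sample {row['sample_id']} has inconsistent coordinates")
        sample["rows"].append(row)
    return dict(samples)
```

<p>Keeping <code>sample_id</code> as a string preserves leading zeros such as <code>000001</code>.</p>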

<ul>
<li>taxonomy (calculate a <em>diversity metric</em> for each sample from this?  What else can we do with a taxonomy?)</li>
<li>pathogenicity</li>
<li>auto- or heterotrophic</li>
<li>other metabolic information?</li>
<li>GO terms, or something like them at an organismal level</li>
<li>URL of the canonical species description in NCBI</li>
<li>?  Please make suggestions in the comments.</li>
</ul>
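<p>One candidate <em>diversity metric</em> is the Shannon index, H&#8242; = &#8722;&#8721; p<sub>i</sub> ln p<sub>i</sub>, where p<sub>i</sub> is the fraction of a sample's sequences assigned to species i. The sketch below assumes the <code>suspected_species</code> column is a good enough proxy for distinct taxa, which is a simplification; other indices (Simpson's, for example) could be swapped in. Function names are hypothetical.</p>

```python
import math
from collections import Counter


def shannon_index(species_labels):
    """Shannon diversity H' = -sum(p_i * ln p_i) over the species in one sample."""
    counts = Counter(species_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log(n / total) for n in counts.values())


def diversity_by_sample(rows):
    """rows: iterable of (sample_id, suspected_species) pairs -> {sample_id: H'}."""
    by_sample = {}
    for sample_id, species in rows:
        by_sample.setdefault(sample_id, []).append(species)
    return {sid: shannon_index(labels) for sid, labels in by_sample.items()}
```

<p>A sample whose reads all come from one species scores 0; a sample split evenly between two species scores ln 2 (about 0.693).</p>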

<p>Synthesizing such a dataset (as a large plaintext file or as a database) will require aggregating a variety of other datasets, and I&#8217;m not sure where to begin.  If I know a particular species (<em>Acinetobacter baylyi</em>, for instance), is there a single entry point in NCBI for deriving all this information?</p>
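<p>One programmatic entry point is NCBI's Entrez Programming Utilities (E-utilities): <code>esearch.fcgi</code> returns record IDs for a query, which <code>esummary.fcgi</code> or <code>efetch.fcgi</code> can then expand. The sketch below only builds the query URL and parses the JSON reply, so it runs offline; searching the <code>taxonomy</code> database and the helper names are my assumptions, not an established recipe.</p>

```python
import json
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(term, db="taxonomy"):
    """Build an E-utilities esearch URL that looks up `term` in an Entrez database."""
    params = {"db": db, "term": term, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


def ids_from_esearch(json_text):
    """Extract the list of matching record IDs from an esearch JSON response."""
    return json.loads(json_text)["esearchresult"]["idlist"]
```

<p>Fetching <code>urllib.request.urlopen(esearch_url("Acinetobacter baylyi"))</code> would return the live reply; the IDs can then be passed to esummary or efetch with the same <code>db</code>. NCBI asks scripted clients to stay under three requests per second without an API key.</p>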


<h3>existing metagenomics datasets</h3>

<p>I spent a couple of hours reading metagenomic papers and browsing around for datasets.  Here&#8217;s a quick list of interesting resources.  My naive first look didn&#8217;t turn up anything similar to the basic plaintext example above.</p>

<p><a href="http://www-ab.informatik.uni-tuebingen.de/software/megan/welcome.html#example-datasets">MEGAN &#8211; Metagenome Analysis Software &amp; sample data</a></p>

<p><a href="http://www.biomedcentral.com/1471-2105/10/S1/S12">Methods for comparative metagenomics (introducing MEGAN)</a> (paper)</p>

<p><a href="http://bmf2.colorado.edu/unifrac/tutorial.psp">UniFrac software &amp; sample data</a> (Look for the datasets they used to construct the phylogeny trees)</p>

<p><a href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&amp;pubmedid=16893466">UniFrac paper</a> (paper)</p>

<p><a href="http://www.plosbiology.org/article/info%3Adoi%2F10.1371%2Fjournal.pbio.0040368">The Marine Viromes of Four Oceanic Regions</a> (paper)</p>

<p><a href="http://scums.sdsu.edu/index.php">Data from Marine Viromes study (and more!)</a></p>

<p><a href="http://www.ncbi.nlm.nih.gov/books/bv.fcgi?rid=metagenomics.chapter.soil#soil.Data">ncbi metagenomics book soil chapter (Waseca County Farm Soil)</a></p>

<p><a href="http://www.ncbi.nlm.nih.gov/sites/entrez?holding=&amp;db=nucleotide&amp;cmd=search&amp;term=AY921654%3AAY922179%5Baccn%5D">16s rDNA identified from Waseca County Farm Soil dataset (in ncbi&#8217;s nt database)</a></p>

<p><a href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pubmed&amp;pubmedid=17355175">CAMERA: A Community Resource for Metagenomics</a> (their webview of the datafiles looks interesting; the datafiles themselves are just fasta)</p>

<p><a href="http://web.camera.calit2.net/cameraweb/gwt/org.jcvi.camera.web.gwt.download.BrowseProjectsPage/BrowseProjectsPage.oa?projectSymbol=CAM_PROJ_GeneExpression#ViewProject">CAMERA: Surface Water Marine Microbial Community Gene Expression project</a></p>

<h3>inspirational infographics:</h3>

<p><a href="http://www.visualcomplexity.com/vc/project.cfm?id=331"><img class="alignright size-medium wp-image-81" title="Travel-time maps by Chris Lightfoot, Tom Steinberg" src="http://has100ideas.com/wp-content/uploads/2009/05/331_big01jpg-300x225.jpg" alt="Travel-time maps by Chris Lightfoot, Tom Steinberg" width="300" height="225" /></a></p>

<p><a href="http://www.visualcomplexity.com/vc/project.cfm?id=331">Travel-time Maps</a> by Chris Lightfoot &amp; Tom Steinberg</p>

<p><a href="http://www.visualcomplexity.com/vc/project.cfm?id=512"><img class="alignright size-medium wp-image-82" title="[Center for Mathematical Modeling infographics set by Juan Pablo De Gregorio" src="http://has100ideas.com/wp-content/uploads/2009/05/512_big02jpg-300x225.jpg" alt="[Center for Mathematical Modeling infographics set by Juan Pablo De Gregorio" width="300" height="225" /></a></p>

<p><a href="http://www.visualcomplexity.com/vc/project.cfm?id=512">Center for Mathematical Modeling infographics</a> by Juan Pablo De Gregorio</p>
]]></content:encoded>
			<wfw:commentRss>http://has100ideas.com/idea/building-an-example-bioweathermap-dataset/feed</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Problem: sequencing a heterologous mix of 16s PCR products</title>
		<link>http://has100ideas.com/idea/sequencing-heterologous-mix-of-16s-pcr-products</link>
		<comments>http://has100ideas.com/idea/sequencing-heterologous-mix-of-16s-pcr-products#comments</comments>
		<pubDate>Mon, 25 May 2009 23:51:20 +0000</pubDate>
		<dc:creator>Mac</dc:creator>
				<category><![CDATA[Garage Metagenomics]]></category>

		<guid isPermaLink="false">http://has100ideas.com/?p=49</guid>
		<description><![CDATA[Possible solution: Use gel electrophoresis to separate the DNAs. Temperature Gradient Gel Electrophoresis (TGGE) and Denaturing Gradient Gel Electrophoresis (DGGE) DGGE of small ribosomal subunit coding genes was first described by Gerard Muyzer, while he was Post-doc at Leiden University, and has become a widely used technique in microbial ecology. PCR amplification of DNA extracted [...]]]></description>
			<content:encoded><![CDATA[<p>Possible solution: Use gel electrophoresis to separate the DNAs.</p>

<ul>
<li><a href="http://en.wikipedia.org/wiki/Temperature_gradient_gel_electrophoresis">Temperature Gradient Gel Electrophoresis (TGGE) and Denaturing Gradient Gel Electrophoresis (DGGE)</a></li>
</ul>

<blockquote>
  <p>DGGE of small ribosomal subunit coding genes was first described by Gerard Muyzer, while he was Post-doc at Leiden University, and has become a widely used technique in microbial ecology. PCR amplification of DNA extracted from mixed microbial communities with PCR primers specific for 16S rRNA gene fragments of Bacteria and Archaea, and 18S rRNA gene fragments of Eukaryotes results in mixtures of PCR products. <strong>Because these amplicons all have the same length, they cannot be separated from each other by agarose gel electrophoresis. However, sequence variations (i.e. differences in GC content and distribution) between different microbial rRNAs result in different denaturation properties of these DNA molecules.</strong> Hence, DGGE banding patterns can be used to visualize variations in microbial genetic diversity and provide a rough estimate of the richness and abundance of predominant microbial community members.</p>
</blockquote>
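<p>The quoted passage hinges on same-length amplicons differing in GC content. A quick way to see the effect is to compare GC fraction and a rough melting temperature for two fragments of equal length. The sketch uses the simple formula Tm = 64.9 + 41 &#215; (G + C &#8722; 16.4) / N, which is only a crude estimate and does not model the melting domains that actually determine where a fragment stops in a DGGE gel; function names are hypothetical.</p>

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def basic_tm(seq):
    """Rough melting temperature in deg C: Tm = 64.9 + 41 * (G + C - 16.4) / N.

    Ignores salt, denaturant concentration, and sequence context, so it only
    indicates the direction of the effect: more G+C means a higher Tm.
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    return 64.9 + 41 * (gc - 16.4) / len(seq)
```

<p>For two artificial 40-nt fragments, <code>basic_tm("AT" * 20)</code> is about 48.1 &#176;C and <code>basic_tm("GC" * 20)</code> about 89.1 &#176;C: identical length, very different denaturation behaviour.</p>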

<p>Solid-phase PCR (Polony PCR) is also intriguing: Dilute and spread the DNAs across a surface and use solid-phase PCR to amplify individual DNAs into polonies.  Pick polonies and have them sequenced.</p>

<ul>
<li><a href="http://nar.oxfordjournals.org/cgi/content/abstract/27/24/e34">In situ localized amplification and contact replication of many individual DNA molecules</a></li>
</ul>

<blockquote>
  <p>&#8220;We describe a method to clone and amplify DNA by performing the polymerase chain reaction (PCR) in a thin polyacrylamide film poured on a glass microscope slide. <strong>The polyacrylamide matrix retards the diffusion of the linear DNA molecules so that the amplification products remain localized near their respective templates. At the end of the reaction, a number of PCR colonies, or &#8216;polonies&#8217;, have formed, each one grown from a single template molecule.</strong> As many as 5 million clones can be amplified in parallel on a single slide. If an Acrydite modification is included at the 5&#8242; end of one of the primers, the amplified DNA will be covalently attached to the polyacrylamide matrix, allowing further enzymatic manipulations to be performed on all clones simultaneously. We describe techniques to make replicas of these polony slides, and high throughput sequencing protocols for this technology.&#8221;</p>
</blockquote>

<ul>
<li>More info: <a href="http://nar.oxfordjournals.org/cgi/content/full/28/20/e87">Solid phase DNA amplification: characterisation of primer attachment and amplification mechanisms</a></li>
</ul>
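<p>The polony approach depends on diluting the DNA enough that most polonies grow from a single molecule. If templates land independently at random, the number per polony-sized spot is roughly Poisson-distributed, which makes the dilution trade-off easy to estimate. This is a simplified model I am assuming, not one from the paper, and it ignores how far each polony actually spreads; the function names are hypothetical.</p>

```python
import math


def occupied_fraction(mean_templates_per_spot):
    """Fraction of polony-sized spots that receive at least one template: 1 - e^(-lam)."""
    return -math.expm1(-mean_templates_per_spot)


def clonal_fraction(mean_templates_per_spot):
    """Fraction of occupied spots that started from exactly one template molecule.

    Poisson model: P(k) = lam^k e^(-lam) / k!, so the clonal share of occupied
    spots is P(1) / (1 - P(0)) = lam e^(-lam) / (1 - e^(-lam)).
    """
    lam = mean_templates_per_spot
    if lam == 0:
        return 1.0  # limit as lam -> 0: every (rare) occupied spot is clonal
    return lam * math.exp(-lam) / -math.expm1(-lam)
```

<p>At a mean of 0.1 templates per spot, about 95% of occupied spots are clonal but only about 10% of spots are occupied; at a mean of 1, about 63% are occupied but only about 58% of those are clonal. Picking polonies for sequencing favours the more dilute end.</p>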

<p>Lastly, I could just give up on sequencing a heterogeneous mix of PCR products and instead design primers specific to a particular rDNA or other gene of interest.  Instead of exploring the space of all rDNAs present in the sample&#8217;s metagenome, I would be testing for the presence of a particular DNA.  Much less exciting, if you ask me.</p>
]]></content:encoded>
			<wfw:commentRss>http://has100ideas.com/idea/sequencing-heterologous-mix-of-16s-pcr-products/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Garage Metagenomics</title>
		<link>http://has100ideas.com/idea/garage-metagenomics</link>
		<comments>http://has100ideas.com/idea/garage-metagenomics#comments</comments>
		<pubDate>Tue, 19 May 2009 21:22:46 +0000</pubDate>
		<dc:creator>Mac</dc:creator>
				<category><![CDATA[DIYbio]]></category>
		<category><![CDATA[Garage Metagenomics]]></category>
		<category><![CDATA[bioweathermap]]></category>
		<category><![CDATA[DNA barcoding]]></category>
		<category><![CDATA[metagenomics]]></category>
		<category><![CDATA[sequencing]]></category>

		<guid isPermaLink="false">http://has100ideas.com/?p=33</guid>
		<description><![CDATA[I'm really excited about the BioWeatherMap project and can't wait for the first FlashLab event in Boston.  In the meantime, I want to figure out how to do a smaller-scale version of the same thing - direct sequencing of a short, species-specific genomic DNA sequence for microbial species identification - in a cheap, garage-friendly way.]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://bioweathermap.org/">BioWeatherMap project</a>, led by <a href="http://diybio.org">DIYbio</a> member <a href="http://thepersonalgenome.com/">Jason Bobe</a>, is developing a ~$5 kit that will let users identify thousands of the microbes present in or on a given sample or surface.  The kit helps the user prepare an arbitrary sample for direct DNA sequencing.</p>

<p>In the pilot program, hundreds of participants will sign up at <a href="http://bioweathermap.org/">bioweathermap.org</a>, gather at a &#8220;FlashLab&#8221; event, receive a sampling kit (probably something like a Q-tip and a test tube), and disperse throughout the local region to sample a particular type of object, such as crosswalk buttons.  The participants will then return their samples to the BioWeatherMap group for analysis.</p>

<p>Once several hundred samples have been collected and prepared, the BioWeatherMap group will purchase a single run on a high-throughput DNA sequencer for all the samples at once, with each sample receiving about 1500 &#8220;reads&#8221;.  The preparation step isolates and amplifies a small, species-specific region of DNA from the genomes of the organisms in each sample.  For each sample, then, up to 1500 unique species could be identified (or the same species could be identified 1500 times).  By pooling every sample into one run, the BioWeatherMap group can deliver those 1500 reads at a much lower cost per sample than sequencing each sample separately.</p>
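<p>The post doesn't say how one pooled run is split back into individual samples. A common approach (an assumption here, not a detail of the BioWeatherMap plans) is to add a short sample-specific index sequence to each sample's amplicons during PCR and sort the reads by that prefix afterwards; 300 samples at 1500 reads each, for example, would need about 450,000 reads in total. The sketch below uses exact prefix matching with hypothetical index sequences and sample names.</p>

```python
from collections import defaultdict


def demultiplex(reads, indexes):
    """Assign reads to samples by an exact-match index at the start of each read.

    reads: iterable of read sequences
    indexes: {index_sequence: sample_id}; all indexes must share one length
    Returns ({sample_id: [reads with the index trimmed off]}, [unassigned reads]).
    """
    lengths = {len(i) for i in indexes}
    if len(lengths) != 1:
        raise ValueError("this sketch expects all indexes to share one length")
    k = lengths.pop()
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        sample = indexes.get(read[:k])
        if sample is None:
            unassigned.append(read)
        else:
            assigned[sample].append(read[k:])  # strip the index before analysis
    return dict(assigned), unassigned
```

<p>Real pipelines usually tolerate a mismatch or two in the index; exact matching keeps the sketch short.</p>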

<p>The BioWeatherMap project is the first example of a new kind of distributed, participatory, bite-size science.  It will demonstrate to participants that research doesn&#8217;t require a Ph.D.  The data they generate will be fascinating &#8211; literal BioWeatherMaps: maps of time-series of microbial population flows overlaid on a Google Earth view of the city.</p>

<p>The first FlashLab event should be happening sometime this summer (2009).</p>

<p>In the meantime, I want to figure out how to do a smaller-scale version of the same thing &#8211; direct sequencing of a short, species-specific genomic DNA sequence for microbial species identification &#8211; in a cheap, garage-friendly way.  I want to do Garage <a href="http://en.wikipedia.org/wiki/Metagenomics">Metagenomics</a>.  (<a href="http://en.wikipedia.org/wiki/DNA_barcoding">DNA barcoding</a> is the same concept applied to eukaryotic organisms.)</p>

<p>I&#8217;ve started looking through the <a href="http://bit.ly/15UQwQ">metagenomics literature</a> (the link goes to a full-text library) for simple protocols that could be adapted to a basic garage lab.  I&#8217;m planning on outsourcing the actual sequencing ($50-$100?), but doing the rest of the sample preparation myself: isolating and purifying genomic DNA and doing PCR to amplify the species-specific DNA barcode, probably a 16S or 18S ribosomal subunit gene.</p>
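<p>Before ordering primers, it helps to check in silico which stretch of a known 16S sequence a primer pair would amplify. The sketch below searches a template for a forward primer and the reverse complement of a reverse primer, honouring IUPAC ambiguity codes. Using the widely published universal bacterial pair 27F/1492R is my assumption, not the post's choice; the sequences are given as commonly published and should be double-checked before ordering. Function names are hypothetical, and real PCR tolerates some mismatches that this exact matcher does not.</p>

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")

F27 = "AGAGTTTGATCMTGGCTCAG"    # 27F, 5'->3'
R1492 = "GGTTACCTTGTTACGACTT"   # 1492R, 5'->3'


def revcomp(seq):
    """Reverse complement, including IUPAC ambiguity codes."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def matches(primer, site):
    """True if every template base is allowed by the (possibly degenerate) primer base."""
    return len(primer) == len(site) and all(b in IUPAC[p] for p, b in zip(primer, site))


def in_silico_pcr(template, fwd, rev):
    """Return the shortest product from the first forward site that has a
    downstream reverse site (reverse complement of rev), or None."""
    template = template.upper()
    rev_site = revcomp(rev)
    for i in range(len(template) - len(fwd) + 1):
        if matches(fwd, template[i:i + len(fwd)]):
            for j in range(i + len(fwd), len(template) - len(rev_site) + 1):
                if matches(rev_site, template[j:j + len(rev_site)]):
                    return template[i:j + len(rev_site)]
    return None
```

<p>Run against a full-length 16S reference from NCBI, <code>in_silico_pcr(reference, F27, R1492)</code> would return the roughly 1.5 kb region the pair spans, or <code>None</code> if either primer fails to match exactly.</p>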

<p>Check back soon for more info on my progress and please leave a comment if you would like to help out or do some Garage Metagenomics of your own.</p>
]]></content:encoded>
			<wfw:commentRss>http://has100ideas.com/idea/garage-metagenomics/feed</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>
