gabeiscoding's comments

gabeiscoding | 6 months ago | on: Show HN: E-Paper Family 2 Day Calendar

I sank a lot of time into trying to get perfect 1-bit font rendering running under Docker (Debian Linux), but could not get it to match the precision of the Mac's default Arial font rendering.

I have a side-by-side rasterized image showing the difference here:

https://github.com/gaberudy/epaper-calendar/blob/main/docs/f...

Linux insists on doing some FreeType font thickening, giving the output a random thick-line look. If anyone knows more tricks to disable this or influence the anti-aliased font rendering behavior, let me know!
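For what it's worth, the thickening is often fontconfig's synthetic emboldening plus anti-aliased hinting, and it can usually be turned off with a per-user fontconfig override. A sketch of what I mean (assuming fontconfig is what's steering FreeType on the Debian image; the file path is the conventional one, adjust for your setup):

```xml
<!-- ~/.config/fontconfig/fonts.conf (or point FONTCONFIG_FILE at it) -->
<?xml version="1.0"?>
<!DOCTYPE fontconfig SYSTEM "fonts.dtd">
<fontconfig>
  <match target="font">
    <!-- no synthetic bolding, no anti-aliasing: crisp edges for 1-bit output -->
    <edit name="embolden" mode="assign"><bool>false</bool></edit>
    <edit name="antialias" mode="assign"><bool>false</bool></edit>
    <!-- full hinting snaps stems to the pixel grid -->
    <edit name="hinting" mode="assign"><bool>true</bool></edit>
    <edit name="hintstyle" mode="assign"><const>hintfull</const></edit>
  </match>
</fontconfig>
```

No guarantee this reproduces the Mac look (CoreText and FreeType rasterize differently at a fundamental level), but it should at least kill the emboldening.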

gabeiscoding | 3 years ago | on: Oxide and the Chamber of Mysteries [video]

I felt a loss when the On the Metal podcast series wrapped up. I figured these Oxide guys now have too much real work to do, and they won't be able to keep up this podcast, even if it is the best tech podcast I've listened to. Bryan is an unbeatable host. If you love retro-computing or just the history of our industry in general, you will walk away learning something from anything he talks about. His quick wit and deep tech knowledge keep you totally engaged, and give you the feeling of living vicariously in the Silicon Valley that was about getting your hands dirty and building great tech products.

But now I can look forward to their leap over to “social audio”. First Twitter spaces (where I would sometimes chime in live) and now the Discord-hosted On the Metal.

Most podcasts are sports commentary. These guys are full-contact in the game. I love it. Keep it up Bryan!

gabeiscoding | 6 years ago | on: Real-world dynamic programming: seam carving

Cool to see this popping up again. It always impresses people who haven't seen it before, and it's a cool algorithm to work through.

The original paper was discussed on Slashdot, and back at that time I was inspired to build a little GUI around an open source implementation of the algorithm to exercise my Qt skills.

It allows you to shrink, expand, and "mask out" regions you don't want touched, etc.

Still available on Google Code archive:

https://code.google.com/archive/p/seam-carving-gui/
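For anyone who hasn't worked through it, the dynamic-programming core is short. Here's a toy sketch (my own illustration, using made-up integer energies rather than the gradient-based energy function from the paper) that finds the cheapest vertical seam in an energy grid:

```python
# Minimal sketch of the seam-carving DP step: find the vertical seam
# (one pixel per row, adjacent rows differ by at most one column)
# with the lowest total energy.

def min_vertical_seam(energy):
    """Return (cost, column indices top->bottom) of the cheapest seam."""
    rows, cols = len(energy), len(energy[0])
    # cost[r][c] = cheapest seam ending at pixel (r, c)
    cost = [row[:] for row in energy]
    for r in range(1, rows):
        for c in range(cols):
            # a seam may arrive from directly above or a diagonal neighbor
            best = cost[r - 1][c]
            if c > 0:
                best = min(best, cost[r - 1][c - 1])
            if c < cols - 1:
                best = min(best, cost[r - 1][c + 1])
            cost[r][c] += best
    # backtrack from the cheapest bottom-row cell
    c = min(range(cols), key=lambda j: cost[rows - 1][j])
    seam = [c]
    for r in range(rows - 1, 0, -1):
        candidates = [j for j in (c - 1, c, c + 1) if 0 <= j < cols]
        c = min(candidates, key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return cost[rows - 1][seam[-1]], seam
```

Removing that seam and repeating is the whole "shrink" operation; the mask feature just biases the energies up or down.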

gabeiscoding | 7 years ago | on: Advanced techniques to implement fast hash tables

The author of this post wrote klib/khash, and uses it in his very popular CPU-intensive bioinformatics programs such as bwa (short-read alignment against the human reference genome).

I've been learning Rust recently. As a learning exercise, I compared the robin-hood hashing of Rust's std::collections::HashMap to the klib/khash he mentions in this article, and then tried various hash functions to try to match his performance:

https://github.com/gaberudy/hash_test

No dice, his hash table is smaller and faster.

My next step is to try implementing his data structure and hashing functions directly in Rust and see if I can get near-C performance...
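For anyone unfamiliar with the robin-hood variant: on a collision, the entry that has probed farther from its home slot gets to keep the slot, which flattens out probe-length variance. A toy sketch of the idea (my own illustration in Python, not the actual khash or Rust std implementation, and with no resizing):

```python
# Toy robin-hood open addressing: on a collision, whichever entry is
# farther from its home slot keeps the slot ("steal from the rich").

class RobinHoodMap:
    def __init__(self, capacity=16):
        self.slots = [None] * capacity  # each slot: (key, value, distance)

    def _home(self, key):
        return hash(key) % len(self.slots)

    def insert(self, key, value):
        idx = self._home(key)
        entry = (key, value, 0)
        while True:
            slot = self.slots[idx]
            if slot is None:
                self.slots[idx] = entry
                return
            if slot[0] == entry[0]:
                self.slots[idx] = entry  # overwrite existing key
                return
            if slot[2] < entry[2]:
                # resident is "richer" (closer to home): swap, keep probing
                self.slots[idx], entry = entry, slot
            idx = (idx + 1) % len(self.slots)
            entry = (entry[0], entry[1], entry[2] + 1)

    def get(self, key):
        idx, dist = self._home(key), 0
        while True:
            slot = self.slots[idx]
            # can stop once we pass any entry closer to home than we are
            if slot is None or slot[2] < dist:
                return None
            if slot[0] == key:
                return slot[1]
            idx = (idx + 1) % len(self.slots)
            dist += 1
```

khash, by contrast, uses plain quadratic probing with a flags bitmap for empty/deleted slots, which is part of why it stays so small.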

gabeiscoding | 8 years ago | on: Astronaut’s DNA No Longer Matches His Identical Twin’s After Year Spent in Space

A much better source for those with an interest in the science is Chris Mason's slides from his recent talk[1] at a genetics conference (AGBT) about this, which he shared on Twitter[2].

He's a great speaker and a cool guy, and he tackles some of the most interesting (at least to hear about) science in genomics.

[1] https://www.dropbox.com/s/sfg6rdmgxjwdpil/Mason_NEB_talk_AGB...

[2] https://twitter.com/mason_lab/status/964151387687972864

gabeiscoding | 8 years ago | on: Content-aware image resize library

I wrote a GUI for another seam carving library back in 2009[1], and although the project is in archive mode, you can still access the source as well as the Windows/Mac binaries. Just tested it and it still works!

Not as fancy as Photoshop, I'm sure, but it does have the ability to paint a mask of regions to keep or remove, to aid the algorithm and get the desired result. Multi-threaded, too!

[1] https://code.google.com/archive/p/seam-carving-gui/

gabeiscoding | 10 years ago | on: Google Genomics: store, process, explore and share genomic data

Looks great, but I can't comment more as I haven't used it.

It looks to be solving the same problems as DNAnexus, Seven Bridges, BaseSpace, etc.: a way to wrap open source tools in more user-friendly ways.

But it's orchestrating the production of a smaller set of data that still needs the next step: human interpretation, report writing, family-aware algorithms, and the most complex annotations (the problem space Golden Helix is in).

In other words, it handles the automatable bits, which are not the hard part I mentioned in my blog post.

gabeiscoding | 10 years ago | on: Google Genomics: store, process, explore and share genomic data

While I think it's great to have Google putting their weight behind standardization efforts like the Global Alliance for Genomics and Health (GA4GH), I really don't get the need to replace VCF and BAM files with API calls.

Ultimately, the "hard part" of genomics is not big data that requires Spanner and BigTable to get anything done. I actually wrote a blog post about this just this week:

http://blog.goldenhelix.com/grudy/genomic-data-is-big-data-b...

Both BAM and VCF files can be hosted on a plain HTTP file server and meaningfully queried through their BAI/TBI indexes. Visualization tools like our GenomeBrowse or the Broad's IGV can already read S3-hosted genomic files directly, without an API layer, and very efficiently (gzip-compressed blocks of binary data). So I see the translation of the exact same data into an API-only-accessible storage system, where I can't download the VCF and do quick, iterative analysis on it, as more of a downside than a plus.

Disclaimer: I build variant interpretation software for NGS data at Golden Helix. Our customers are often small clinical labs whose data size and volume are not driving them to the cloud.

gabeiscoding | 10 years ago | on: When sequencing makes genotyping obsolete (soon)

They are correct in calling out the de-facto monopoly rents Illumina is extracting from the market, but sadly I don't share their wildly optimistic view that technological disruption is imminent and will restart the price plummet of whole genome sequencing.

Nanopores are nowhere near the throughput and accuracy of Illumina's sequencing-by-synthesis tech, and if there is a pathway to challenging Illumina's position, it will be extremely complex, iterative, and _long_.

Meanwhile, Illumina is amassing a billion-dollar war chest and adding its own complex and iterative improvements to its platform (two-color detection, longer and longer reads, higher cluster density), maintaining its market lead.

As much as the analogy to microprocessor manufacturing and Moore's law is alluring, the messy stuff of biology and single molecule chemical manipulation and sensor detection is unlikely to obediently follow the same innovation curve.

gabeiscoding | 13 years ago | on: Analyzing my DNA

Nope. It was a pilot, and they are not sure about doing more exomes.

From my last chat with Brian Naughton (their lead informatics guy) about this, it sounds like they are planning on doing more sequencing in the future. But it could be whole genome, and it may be geared more toward research (you're selected based on your phenotype) than open to any customer.

gabeiscoding | 13 years ago | on: A farewell to bioinformatics (2012)

Ah, the efficiency argument.

The trick is, academics often have excess manpower capacity in the form of grad students and post-docs. Even though personnel is usually one of the highest expenses on any given grant, they often don't look at ways to improve the efficiency of their research man-hours.

That's not a blanket rule, as we have definitely had success with the value proposition of research efficiency. But in general, a lot of the things businesses adopt to improve project time (like Theory of Constraints project management, or mindset/skillset/toolset matching of personnel, etc.) are of no interest to academic researchers.

gabeiscoding | 13 years ago | on: A farewell to bioinformatics (2012)

The key is that 23andMe was not using bleeding-edge nightly builds but official "upgrade-recommended" releases.

GATK currently has no concept of a "stable" branch of their repo (Appistry is going to provide quarterly releases in the future, which is great).

The flag I am raising is that a "stable" release is needed before GATK gets integrated into a clinical pipeline. Because the Broad's reputation is so high, it is important to raise this flag; otherwise researchers and even clinical bioinformaticians assume choosing the latest release of GATK for their black-box variant caller is as safe as an IT manager choosing IBM.

gabeiscoding | 13 years ago | on: A farewell to bioinformatics (2012)

On your first point, my post detailed that 23andMe confirmed it was a GATK bug that introduced the bogus variants and the bug was fixed in the next minor release of the software. There are comments on the post from members of 23andMe and the GATK team that go into more details as well.

On your second point: 23andMe had every incentive to pay attention to their output, and it is fair to say it's their responsibility for letting this slip through. But it's worth noting, in the context of the OP's rant, that 23andMe probably paid much more attention to their tools than most academics, who often treat alignment and variant calling as a black box they trust works as advertised.

So what I actually argue in the post (and should have stated more clearly in my summary here) is that GATK is incentivized, as an academic research tool, to advance its feature set quickly, at the cost of bugs being introduced (and hopefully squashed) along the way.

This "dev" state of a tool is inappropriate for a clinical pipeline, and the GATK team's answer to that is a "stable" branch of GATK that will be supported by their commercial software partner. Good stuff.

Finally, I actually have no conflict of interest here, as Golden Helix does not sell commercial secondary-analysis tools (like CLC bio does). I wrote this from the perspective of someone who is a 23andMe consumer, as well as someone informed enough to give recommendations about upstream tools to our users (and I might add, I would still recommend and use GATK for research use, with the caution to potentially forgo the latest release for a more stable one).

You know, though, the conflict-of-interest dismissal is something I run into more often than I would expect. I'm not sure if some commercial software vendor has acted in bad faith in our industry to deserve the cynicism, or if it's inherited by default from the "academic" vs. "industry" ethos.

gabeiscoding | 13 years ago | on: A farewell to bioinformatics (2012)

I live in this field, as a computer scientist learning the biology, and trying to make a living with a bootstrapped company.

I wrote a post about why GATK (one of the most popular bioinformatics tools in next-generation sequencing) should not be put into a clinical pipeline:

http://blog.goldenhelix.com/?p=1534

In terms of your ideal software strategy, I can speak to that as well, as I am actually attempting almost exactly what you're suggesting. My team all have master's degrees in CS & Stats, with a focus on kick-ass CG visualization and UX.

We released a free genome browser (visualization of NGS data and public annotations) that reflects this:

http://www.goldenhelix.com/GenomeBrowse/

But you're right, selling software in this field is a very weird thing. It's almost B2B, but academics are not businesses, and their alternative is always to throw more post-doc manpower at the problem or slog it out with open source tools (which many do).

That said, we've been building our business (in Montana) over the last 10 years through the GWAS era selling statistical software and are looking optimistically into the era of sequencing having a huge impact on health care.
