siganakis's comments

siganakis | 4 years ago | on: When they warn of rare disorders, these prenatal tests are usually wrong

My wife and I went through this a couple of years ago, with a 10 week NIPT calling a rare trisomy (chr 9), which is always fatal within a few weeks of birth.

It was absolute hell. The key problem here is the waiting and uncertainty. You have the NIPT at 10 weeks, but you can’t have the amniocentesis until several weeks later. When that came back fine, there were questions about whether it was a “mosaic”, meaning only a small proportion of cells are affected. We were only really in the clear after the 20 week ultrasound.

That’s a lot of weeks to be consumed by wondering about whether to terminate the pregnancy, or wait it out for more information. I have a masters in bioinformatics (in genomics!) and my knowledge of stats and the science was next to useless in the face of these decisions.

I know of couples who simply couldn’t deal with this uncertainty and chose to terminate on the basis of this test alone.

Fortunately for us our child was fine and is a perfectly healthy 18 month old now, but I wouldn’t do the rare trisomy test again.

siganakis | 4 years ago | on: What the Heck is a Data Mesh?

Yes, you are understanding it correctly. The idea is that you give the "requesters" access to the data, then enable them to do their thing with it (with training / support / shadowing) and publish their results as "data-products" so that others can leverage it too in their own "data products".

The "data mesh" is essentially the collection of these independent "data-products".

We already see management problems with self-service analytics tools like PowerBI, Tableau and Looker. It's too easy for people to create dashboards / reports that are subtly wrong and which cause confusion. There is a balance between empowering people to build data products and centralised control. Too much empowerment of people who don't understand the right way to do something leads to a horrible mess of contradictory data. Not enough, and people can't effectively do their job. Governance and process are the key to finding that balance and enforcing it.

The issue with the data-mesh is that there isn't really any great tooling to support the management or development of data products, or a data-mesh generally. I am sure this will change over time as vendors start building hype around it.

siganakis | 4 years ago | on: What the Heck is a Data Mesh?

From my experience, the core driver behind the data mesh architecture is organisational, not technological. Organisations are demanding more from their data, be it for rapid product development or self-service analytics. Often this involves large numbers of sources (e.g. external sources), rather than just larger volumes of the same thing.

If marketing, finance and sales are dependent on a centralised data team for every new thing, the data team quickly becomes the bottleneck, stifling innovation and frustrating teams. Incorporating the principles of a data mesh enables those teams to manage their own data, according to well-defined governance standards that enable interoperability.

The reality is that different teams are already managing their own data (via Excel spreadsheets, web-apps, etc). If we can apply a bit more rigour to how these datasets are managed (e.g. so they can be shared, integrated, secured, etc), then the whole organisation benefits.

siganakis | 6 years ago | on: Cloud AI Platform Pipelines

Cloud AI Platform Pipelines appear to use Kubeflow Pipelines on the backend, which is open source [1] and runs on Kubernetes. The Kubeflow team has invested a lot of time on making it simple to deploy across a variety of public clouds [2], [3].

If Google were to kill it, you could easily run it on any other hosted Kubernetes service.

I haven't used Cloud AI Platform Pipelines, but I have spent a lot of time working with Kubeflow Pipelines and it's pretty great!

[1] https://github.com/kubeflow/pipelines

[2] https://www.kubeflow.org/docs/aws/ (Deploy to AWS)

[3] https://www.kubeflow.org/docs/azure/ (Deploy to Azure)

siganakis | 10 years ago | on: Kerf: a columnar tick database for Linux, OS X, BSD, iOS, Android

I am really interested in this as it seems like a more accessible version of KDB - especially in its support for SQL.

I'd like to look into it further, but I can't find any information about licensing. Given that there is no source code in the repo, it appears that this isn't an open source project.

siganakis | 11 years ago | on: SpringRole – Everyone is a Recruiter

Am I right in guessing that a "Passive user" is a user whom you collect information on without them opting in? E.g. By scraping public profiles or collecting connections from registered users?

Is it possible to view / check / correct any information about myself if I am a "passive user" on your platform?

Many EU countries and Australia have privacy laws covering these basic rights.

siganakis | 11 years ago | on: Ask HN: Who is using the .NET stack for their startup?

Yeah, we use .Net 4.5.3, without EF. Connections still fail all the time especially under high demand.

We have our own retry logic, which also logs the issue so we are aware of how frequently errors occur while a command / transaction is being executed.
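The pattern is roughly the following (a minimal sketch in Python rather than our actual .NET code; the function names are made up for illustration):

```python
import logging
import time

# Hypothetical sketch of retry-with-logging around a transient connection
# failure. Our real implementation is .NET against SQL Azure; this just
# shows the shape: log every failure, back off, give up after N attempts.
def run_with_retry(command, attempts=3, base_delay=0.5):
    for attempt in range(1, attempts + 1):
        try:
            return command()
        except ConnectionError as e:
            # Logging here is what tells us how frequently errors occur
            # while a command / transaction is being executed.
            logging.warning("attempt %d failed: %s", attempt, e)
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))  # exponential backoff

calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient")
    return "ok"

print(run_with_retry(flaky, base_delay=0.01))  # ok
```

The important part is that the retry wrapper owns the logging, so every transient failure is counted even when the overall command eventually succeeds.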

This is using SQL Azure with the "Business" tier, so it will be interesting to see how the new (much more highly priced) Standard and Premium tiers go.

siganakis | 11 years ago | on: Ask HN: Who is using the .NET stack for their startup?

We (msgooroo.com) use the .Net stack on Azure and have found it to be quite good. We have even been experimenting with the new vNext / OWIN stack which appears even better and will give us the flexibility to run on Linux.

Azure is a bit hit and miss. It's brilliant for getting something up and running quickly (using websites / SQL Server), but is a little flaky at scale.

Key problems include connection issues with SQL Server, connection issues with their hosted Redis service, and the pricing of SQL Server when using advanced features like geo-replication.

All in all though, it's a pretty good development experience once you get your head around the fact that in the cloud, services fail and there is nothing you can do about it except plan for it.

Oh, and the BizSpark program they have gives you $100 worth of free hosting on Azure, which is always nice.

siganakis | 11 years ago | on: Ask HN: What DB to use for huge time series?

Remember that KDB is based on K, which stems from APL, which relies on symbols rather than words for its functions.

Coming from that background, C and especially C# must seem extremely verbose.

For example (from Wikipedia):

In K, finding the prime numbers from 1 to R is done with [0]:

    (!R)@&{&/x!/:2_!x}'!R
And in APL [1]:

    (~R∊R∘.×R)/R←1↓ιR

It's truly awesome stuff.
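For comparison, here is the same idea spelled out in Python, which makes the verbosity trade-off concrete (a direct trial-division transcription, keeping each n whose residues against 2..n-1 are all non-zero):

```python
# Python transcription of the idea behind the K/APL one-liners:
# keep each n below R that no smaller candidate divisor divides evenly.
def primes_below(R):
    return [n for n in range(2, R) if all(n % d for d in range(2, n))]

print(primes_below(20))  # [2, 3, 5, 7, 11, 13, 17, 19]
```

Several readable lines versus a handful of symbols; which is "better" is exactly the taste question K and APL people answer differently.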

[0]: http://en.wikipedia.org/wiki/K_(programming_language)

[1]: http://en.wikipedia.org/wiki/APL_(programming_language)

siganakis | 11 years ago | on: The laws of shitty dashboards

I've found that replacing a fancy dashboard of key stats with a simple daily email with 5-10 key numbers is far more valuable.

The dirty little secret of the business intelligence / dashboard industry is that no one logs into them.

A daily email helps with this problem, as people tend to read emails, even if it's only a glance.
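The whole thing can be trivially simple. A sketch of the email body, assuming a metrics dict pulled from wherever your numbers live (the metric names here are made up):

```python
# Hypothetical daily-summary body: 5-10 key numbers, nothing more.
def daily_summary(metrics):
    lines = ["Daily numbers:"]
    for name, value in metrics.items():
        lines.append(f"  {name}: {value:,}")
    return "\n".join(lines)

body = daily_summary({"signups": 42, "revenue": 1830, "active users": 912})
print(body)
```

Wire that up to a scheduler and your mail system of choice and you have replaced the dashboard nobody logs into.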

siganakis | 11 years ago | on: Building New SQL (2013) [pdf]

Yeah, I think composability is one of the biggest things missing from SQL.

The issue is that composability often means actually moving data around in the database, which has terrible performance. That is, you can compose a query out of multiple queries that dump partial data sets into temp tables.

Views get you part of the way there, but they are designed to be long-lived and are visible to all database users until they are dropped. This means it's dangerous to change them or clean them up, as it's not always clear who is using them.

Ephemeral or temporary views that are session/connection scoped, or even loadable as modules, would be useful to me.
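Some engines already offer a version of this: SQLite, for example, has TEMP views that are visible only to the creating connection and are dropped automatically when it closes. A quick sketch, using Python's sqlite3 just to drive it:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE t (x INTEGER)")
con.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])

# A TEMP view lives only for this connection/session, so it can be
# composed into larger queries and discarded without polluting the
# schema or worrying about other users depending on it.
con.execute("CREATE TEMP VIEW big AS SELECT x FROM t WHERE x > 1")
print(con.execute("SELECT COUNT(*) FROM big").fetchone()[0])  # 2
```

Something like this, but as a first-class, composable feature across the mainstream engines, is what I'm after.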

siganakis | 11 years ago | on: Building New SQL (2013) [pdf]

I think that the biggest problem with SQL is more around the actual syntax of the language and how verbose it feels to write complicated queries.

I would prefer a syntax layer that can be compiled / transformed back to SQL, but that does basic things like having a query start with the tables, then joins, then groupings, then the final projection.

Also a less cumbersome way to use the "WITH" statement to form named sub-queries.

Perhaps something like:

    SELECT 
        COUNT(*) AS columns,
        column_type,
        table_name
    FROM (
        SELECT  c.id, 
                c.type AS column_type,
                t.name AS table_name
        FROM tables t
        INNER JOIN columns c
        ON t.id = c.table_id
        WHERE t.system = false
    ) a
    GROUP BY column_type, table_name
    HAVING COUNT(*) > 1
    ORDER BY columns DESC
Being re-written as:

    # Use ":="  to replace WITH for named ephermal views
    # Replace "WHERE" with "?", "SELECT" with "|>" at the end
    non_system := tables 
        ? system=false 
        |> name:table_name, id:table_id

    # Replace INNER JOIN with "*="
    non_system_columns := non_system.table_id *= columns.table_id
        |> c.id, c.type:column_type

    # GROUP BY columns are automatically generated by non-aggregated columns
    column_types := non_system_columns
        |> COUNT(*):columns DESC, column_type, table_name

So the final query may look like:

    non_system := tables ? system=false 
        |> name:table_name, id:table_id
    non_system_columns := non_system.table_id *= columns.table_id
        |> c.id, c.type:column_type
    
    non_system_columns 
        |> COUNT(*):columns DESC, column_type, table_name

Any thoughts on this?

siganakis | 11 years ago | on: Show HN: Domino, a PaaS for data science

From my reading of the getting started guide, it looks like it treats your "working" directory as a git repository, with each run basically doing a commit & push.

This means that your data (in files) needs to be in the working directory, and is versioned alongside your code. Sounds pretty cool, but I am not sure how it would scale for large / constantly changing data sets.

siganakis | 11 years ago | on: Are Pixels Productivity? A study 24 years in the making

I have a pair of AMD Radeon 5700 Series cards (apparently released in 2009!), which seem more than up to the challenge. One thing to consider is that with 2 graphics cards, you need at least a 550W power supply (I had a 500W power supply that died trying its best).

Unfortunately my carpentry skills are lacking, and the Dell Single Monitor Arm [1] is NOT suitable for mounting 2 monitors on top of each other (not enough height).

[1]: http://accessories.us.dell.com/sna/productdetail.aspx?c=us&l...

siganakis | 11 years ago | on: Are Pixels Productivity? A study 24 years in the making

You are quite right, it is a ridiculous metric that doesn't stand up to any scientific rigour.

The article was just supposed to be a bit of fun about my personal experience with different monitor configurations. I thought that this might be interesting to the HN crowd since so many of us spend so much time in front of screens.
