
Segment Sources – Load Salesforce, Zendesk, Stripe into Redshift and Postgres

178 points | TheHydroImpulse | 10 years ago | segment.com

37 comments

[+] samcheng|10 years ago|reply
ETL-as-a-Service is a great idea, particularly one that is visualization/analytics-tool-agnostic!

However, there are so many data sources, and they all require different integrations with their different APIs or export mechanisms. A service isn't really useful unless it can import the lion's share of services that a given company uses...

[+] pkrein|10 years ago|reply
You’re right. There are a lot of sources out there. It’s a ton of work for companies to build out their own pipelines and learn every new API. We want to save them from that burden so that they can focus on the analysis. We’ll be adding many more connections in the coming weeks and months, and also opening up the platform for cloud services to add themselves. Stay tuned!
[+] rsobers|10 years ago|reply
Eh, you'd be surprised how much value a company can get just by marrying a few data sources (e.g., marketing automation + google analytics + CRM).

I'm doing this right now: manually piping data into PostgreSQL via Heroku and using Chartio to query and visualize.
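A toy version of that payoff, with invented column names and sqlite3 standing in for the Postgres instance, just to show the kind of join you can't do inside either tool alone:

```python
import sqlite3

# In-memory DB stands in for the Postgres-on-Heroku warehouse.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE crm_accounts (email TEXT PRIMARY KEY, plan TEXT);
    CREATE TABLE ga_sessions (email TEXT, pageviews INTEGER);
""")
con.executemany("INSERT INTO crm_accounts VALUES (?, ?)",
                [("a@x.com", "pro"), ("b@x.com", "free")])
con.executemany("INSERT INTO ga_sessions VALUES (?, ?)",
                [("a@x.com", 40), ("a@x.com", 12), ("b@x.com", 3)])

# Behavior broken down by CRM plan: the "marrying sources" win.
rows = con.execute("""
    SELECT c.plan, SUM(g.pageviews) AS total_pageviews
    FROM crm_accounts c JOIN ga_sessions g USING (email)
    GROUP BY c.plan ORDER BY c.plan
""").fetchall()
print(rows)  # [('free', 3), ('pro', 52)]
```

Swap the connection string for your Postgres DSN and the idea is identical.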

[+] dwmintz|10 years ago|reply
I don't really agree. I mean, yeah, comprehensiveness is great, and it sounds like Segment is working towards it. But every integration they build is one less custom integration that your data engineers have to build.
[+] far33d|10 years ago|reply
We have been very happy with sources - it doesn't cover EVERY service we use (yet) but taking even just one or two out of in-house ETL is a huge benefit.
[+] dtjones|10 years ago|reply
Agreed. I don't use any of those services. Seems the product's integrations limit the customer pool quite a bit.
[+] dan_ahmadi|10 years ago|reply
I wonder if this makes BI companies freak out a little bit -- because pushing this data into redshift and adding a visualization layer on top takes care of most smaller scale BI needs...
[+] greggyb|10 years ago|reply
From a BI consultancy perspective, not in the slightest. The time we spend on ETL is not spent on the difficulty of piping data around from place to place. The difficulty is in modelling data appropriately to support ad-hoc analyses. The E and L portions don't pose much real difficulty (hassle, frustration, sometimes time, sure, but they're not inherently hard).

The T, transformation, is huge in many ways. Think of it this way: the data model is the primary UI for an analyst or any power user. It also dictates query performance.

Adding a visualization layer on top of Salesforce's schema, for example, is not too helpful, regardless of where that data is living. You can answer trivial questions without too much difficulty, but the difficulty ramps up quickly.

The data access patterns, types of logic necessary, and end-user demands are hugely different between an OLTP and OLAP workload.

There's also potentially huge complexity in conforming dimensions across disparate source systems' data.

Master data management is another huge component that hits a lot of the ETL pipeline.

These concerns are all on top of hooking up the right ends of the hose to one another.
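To make the T concrete, here's a toy example (invented schemas, sqlite3 standing in for a real warehouse) of conforming a customer dimension across two source systems. Even picking the business key to conform on (email here) is a modelling decision no pipe makes for you:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Raw per-source tables, roughly as E+L would land them.
    CREATE TABLE sfdc_accounts (sfdc_id TEXT, email TEXT, segment TEXT);
    CREATE TABLE stripe_customers (cus_id TEXT, email TEXT, mrr_cents INTEGER);
""")
con.executemany("INSERT INTO sfdc_accounts VALUES (?, ?, ?)",
                [("001A", "ops@acme.com", "Enterprise"),
                 ("001B", "dev@tiny.io", "SMB")])
con.executemany("INSERT INTO stripe_customers VALUES (?, ?, ?)",
                [("cus_1", "ops@acme.com", 499900),
                 ("cus_2", "dev@tiny.io", 4900)])

# The T: one conformed customer dimension, keyed on a shared
# business key, with units normalized (cents -> dollars).
con.executescript("""
    CREATE TABLE dim_customer AS
    SELECT s.email,
           s.segment,
           p.mrr_cents / 100.0 AS mrr_dollars
    FROM sfdc_accounts s
    JOIN stripe_customers p ON p.email = s.email;
""")
rows = con.execute(
    "SELECT segment, SUM(mrr_dollars) FROM dim_customer "
    "GROUP BY segment ORDER BY segment"
).fetchall()
print(rows)  # [('Enterprise', 4999.0), ('SMB', 49.0)]
```

Multiply that by every source system, every slowly-changing attribute, and every dedupe rule, and that's where the consulting hours go.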

I don't mean to disparage the product or company and hope I don't come across as if I am. I just want to point out that they address only a small component of a large process, which in turn is only a segment of the BI lifecycle.

[+] pkrein|10 years ago|reply
Actually, we don't have any interest in being a visualization tool, and are super focused on building customer data infrastructure of the future.

This product release is in close partnership with our BI partners (Looker, Mode, Wagon, Periscope, BIME and Chartio). One of the biggest problems our mutual customers face is getting data into their warehouse so that they can use the BI tool in the first place. This launch significantly expands the possible audience for them.

Even better, all of our BI partners built out-of-the-box reporting and dashboards based on Segment's schemas for these new third party sources. So our mutual customers can get set up even faster.

[+] dwmintz|10 years ago|reply
As somebody who works on a data exploration platform (Looker <- disclosure), I'd say that far from making us freak out, it makes us super happy (which is why we're so excited to partner with Segment).

I can't tell you how many potential customers are crazily excited about the idea of centralizing the data that all their apps produce into one central warehouse and putting Looker on top of that, but are stymied by the middle step of actually getting the data OUT of the vendors' APIs and IN to their own warehouse. Being able to point them to an off-the-shelf solution for that problem is a big win for us.

[+] TheLogothete|10 years ago|reply
The BI space is absurd. BI viz is the photo apps of the b2b world. Everyone and their grandma thinks they can make one. Lots of VC money gonna get burned.
[+] vyrotek|10 years ago|reply
Is this just running the Salesforce SOQL statement for you and storing the aggregated result in a Segment table?

Or does Segment provide a way to completely clone entire SF tables such as Opportunities & Cases and then create the aggregate queries later in Segment?

[+] sperand_io|10 years ago|reply
Great question! When you enable a source, we begin running a job on an interval for you that pulls the data from the source, applies some light normalizations and transformations, and sends the data to our Object API (which is in charge of upserting the data into Segment and flushing it to your warehouse).

In Salesforce's case, we issue bulk queries to pull the complete collection on the first run, then modify the queries thereafter to request only data that's changed since the last run.

We don't do any aggregation of the data. We load it into a data warehouse (redshift or postgres) in its complete, raw form so that you can use SQL to aggregate/join to your heart's content. Here's an example: https://help.segment.com/hc/en-us/articles/208215583-Salesfo...
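Roughly, the pattern looks like this (a simplified sketch, not our actual code: sqlite3 stands in for the warehouse, the `fetch` callable stands in for the Bulk API call, and SystemModstamp is Salesforce's change-tracking timestamp field):

```python
import sqlite3

def build_soql(obj, fields, since=None):
    """Full pull on the first run; incremental thereafter,
    using SystemModstamp as the change cursor."""
    soql = f"SELECT {', '.join(fields)} FROM {obj}"
    if since:
        soql += f" WHERE SystemModstamp > {since}"
    return soql

def sync(con, fetch, since=None):
    """fetch(soql) returns (Id, Name, SystemModstamp) rows.
    Upsert keyed on Id, then advance the cursor."""
    rows = fetch(build_soql("Opportunity",
                            ["Id", "Name", "SystemModstamp"], since))
    con.executemany(
        "INSERT INTO opportunities VALUES (?, ?, ?) "
        "ON CONFLICT(Id) DO UPDATE SET Name=excluded.Name, "
        "SystemModstamp=excluded.SystemModstamp",
        rows)
    return max((r[2] for r in rows), default=since)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE opportunities "
            "(Id TEXT PRIMARY KEY, Name TEXT, SystemModstamp TEXT)")

# First run: full pull. Second run: only the changed row comes back.
cursor = sync(con, lambda q: [("006A", "Acme deal", "2016-04-05T00:00:00Z")])
cursor = sync(con, lambda q: [("006A", "Acme deal (renamed)",
                               "2016-04-06T00:00:00Z")], since=cursor)
names = con.execute("SELECT Name FROM opportunities").fetchall()
print(names)  # [('Acme deal (renamed)',)]
```

The real pipeline adds batching, retries, and schema handling, but that's the gist of "only data that's changed since the last run."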

[+] pinaceae|10 years ago|reply
just be careful with SFDC, a bunch of harsh limits, the killer being 5k Bulk API requests per rolling 24h.

if you think that's a lot you haven't seen big orgs with a shit ton of integrations and custom stuff on top of them.

[+] josep2|10 years ago|reply
Segment is always killing it.
[+] grinich|10 years ago|reply
They're totally doing great! (But we should use less violent language to describe it...)