
Segment Sources – Load Salesforce, Zendesk, Stripe into Redshift and Postgres

178 points | TheHydroImpulse | 10 years ago | segment.com

37 comments

[+] samcheng|10 years ago|reply
ETL-as-a-Service is a great idea, particularly one that is visualization/analytics-tool-agnostic!

However, there are so many data sources, and they all require different integrations with their different APIs or export mechanisms. A service isn't really useful unless it can import the lion's share of services that a given company uses...

[+] pkrein|10 years ago|reply
You’re right. There are a lot of sources out there. It’s a ton of work for companies to build out their own pipelines and learn every new API. We want to save them from that burden so that they can focus on the analysis. We’ll be adding many more connections in the coming weeks and months, and also opening up the platform for cloud services to add themselves. Stay tuned!
[+] rsobers|10 years ago|reply
Eh, you'd be surprised how much value a company can get just by marrying a few data sources (e.g., marketing automation + google analytics + CRM).

I'm doing this right now: manually piping data into PostgreSQL via Heroku and using Chartio to query and visualize.
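A toy version of that payoff, with invented column names and sqlite3 standing in for the Postgres instance, just to show the kind of join you can't do inside either tool alone:

```python
import sqlite3

# In-memory DB stands in for the Postgres-on-Heroku warehouse.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE crm_accounts (email TEXT PRIMARY KEY, plan TEXT);
    CREATE TABLE ga_sessions (email TEXT, pageviews INTEGER);
""")
con.executemany("INSERT INTO crm_accounts VALUES (?, ?)",
                [("a@x.com", "pro"), ("b@x.com", "free")])
con.executemany("INSERT INTO ga_sessions VALUES (?, ?)",
                [("a@x.com", 40), ("a@x.com", 12), ("b@x.com", 3)])

# Behavior broken down by CRM plan: the "marrying sources" win.
rows = con.execute("""
    SELECT c.plan, SUM(g.pageviews) AS total_pageviews
    FROM crm_accounts c JOIN ga_sessions g USING (email)
    GROUP BY c.plan ORDER BY c.plan
""").fetchall()
print(rows)  # [('free', 3), ('pro', 52)]
```

Swap the connection string for your Postgres DSN and the idea is identical.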

[+] dwmintz|10 years ago|reply
I don't really agree. I mean, yeah, comprehensiveness is great, and it sounds like Segment is working towards it. But every integration they build is one less custom integration that your data engineers have to build.
[+] far33d|10 years ago|reply
We have been very happy with sources - it doesn't cover EVERY service we use (yet) but taking even just one or two out of in-house ETL is a huge benefit.
[+] dtjones|10 years ago|reply
Agreed. I don't use any of those services. Seems the product's integrations limit the customer pool quite a bit.
[+] dan_ahmadi|10 years ago|reply
I wonder if this makes BI companies freak out a little bit -- because pushing this data into redshift and adding a visualization layer on top takes care of most smaller scale BI needs...
[+] greggyb|10 years ago|reply
From a BI consultancy perspective, not in the slightest. The time we spend on ETL is not spent on the difficulty of piping data around from place to place. The difficulty is in modelling data appropriately to support ad-hoc analyses. The E and L portions don't pose much real difficulty (hassle, frustration, sometimes time, sure, but they're not inherently hard).

The T, transformation, is huge in many ways. Think of it this way: the data model is the primary UI for an analyst or any power user. It also dictates query performance.

Adding a visualization layer on top of Salesforce's schema, for example, is not too helpful, regardless of where that data is living. You can answer trivial questions without too much difficulty, but the difficulty ramps up quickly.

The data access patterns, types of logic necessary, and end-user demands are hugely different between an OLTP and OLAP workload.

There's also potentially huge complexity in conforming dimensions across disparate source systems' data.

Master data management is another huge component that hits a lot of the ETL pipeline.

These concerns are all on top of hooking up the right ends of the hose to one another.
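To make the T concrete, here's a toy example (invented schemas, sqlite3 standing in for a real warehouse) of conforming a customer dimension across two source systems. Even picking the business key to conform on (email here) is a modelling decision no pipe makes for you:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Raw per-source tables, roughly as E+L would land them.
    CREATE TABLE sfdc_accounts (sfdc_id TEXT, email TEXT, segment TEXT);
    CREATE TABLE stripe_customers (cus_id TEXT, email TEXT, mrr_cents INTEGER);
""")
con.executemany("INSERT INTO sfdc_accounts VALUES (?, ?, ?)",
                [("001A", "ops@acme.com", "Enterprise"),
                 ("001B", "dev@tiny.io", "SMB")])
con.executemany("INSERT INTO stripe_customers VALUES (?, ?, ?)",
                [("cus_1", "ops@acme.com", 499900),
                 ("cus_2", "dev@tiny.io", 4900)])

# The T: one conformed customer dimension, keyed on a shared
# business key, with units normalized (cents -> dollars).
con.executescript("""
    CREATE TABLE dim_customer AS
    SELECT s.email,
           s.segment,
           p.mrr_cents / 100.0 AS mrr_dollars
    FROM sfdc_accounts s
    JOIN stripe_customers p ON p.email = s.email;
""")
rows = con.execute(
    "SELECT segment, SUM(mrr_dollars) FROM dim_customer "
    "GROUP BY segment ORDER BY segment"
).fetchall()
print(rows)  # [('Enterprise', 4999.0), ('SMB', 49.0)]
```

Multiply that by every source system, every slowly-changing attribute, and every dedupe rule, and that's where the consulting hours go.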

I don't mean to disparage the product or company and hope I don't come across as if I am. I just want to point out that they address only a small component of a large process, which in turn is only a segment of the BI lifecycle.

[+] pkrein|10 years ago|reply
Actually, we don't have any interest in being a visualization tool, and are super focused on building customer data infrastructure of the future.

This product release is in close partnership with our BI partners (Looker, Mode, Wagon, Periscope, BIME and Chartio). One of the biggest problems our mutual customers face is getting data into their warehouse so that they can use the BI tool in the first place. This launch significantly expands the possible audience for them.

Even better, all of our BI partners built out-of-the-box reporting and dashboards based on Segment's schemas for these new third party sources. So our mutual customers can get set up even faster.

[+] dwmintz|10 years ago|reply
As somebody who works on a data exploration platform (Looker <- disclosure), I'd say that far from making us freak out, it makes us super happy (which is why we're so excited to partner with Segment).

I can't tell you how many potential customers are crazily excited about the idea of centralizing the data that all their apps produce into one central warehouse and putting Looker on top of that, but are stymied by the middle step of actually getting the data OUT of the vendors' APIs and IN to their own warehouse. Being able to point them to an off-the-shelf solution for that problem is a big win for us.

[+] TheLogothete|10 years ago|reply
The BI space is absurd. BI viz is the photo apps of the b2b world. Everyone and their grandma thinks they can make one. Lots of VC money gonna get burned.
[+] vyrotek|10 years ago|reply
Is this just running the Salesforce SOQL statement for you and storing the aggregated result in a Segment table?

Or does Segment provide a way to completely clone entire SF tables such as Opportunities & Cases and then create the aggregate queries later in Segment?

[+] sperand_io|10 years ago|reply
Great question! When you enable a source, we begin running a job on an interval for you that pulls the data from the source, applies some light normalizations and transformations, and sends the data to our Object API (which is in charge of upserting the data into Segment and flushing it to your warehouse).

In Salesforce's case, we issue bulk queries to pull the complete collection on the first run, then modify the queries thereafter to request only data that's changed since the last run.

We don't do any aggregation of the data. We load it into a data warehouse (redshift or postgres) in its complete, raw form so that you can use SQL to aggregate/join to your heart's content. Here's an example: https://help.segment.com/hc/en-us/articles/208215583-Salesfo...
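Roughly, the pattern looks like this (a simplified sketch, not our actual code: sqlite3 stands in for the warehouse, the `fetch` callable stands in for the Bulk API call, and SystemModstamp is Salesforce's change-tracking timestamp field):

```python
import sqlite3

def build_soql(obj, fields, since=None):
    """Full pull on the first run; incremental thereafter,
    using SystemModstamp as the change cursor."""
    soql = f"SELECT {', '.join(fields)} FROM {obj}"
    if since:
        soql += f" WHERE SystemModstamp > {since}"
    return soql

def sync(con, fetch, since=None):
    """fetch(soql) returns (Id, Name, SystemModstamp) rows.
    Upsert keyed on Id, then advance the cursor."""
    rows = fetch(build_soql("Opportunity",
                            ["Id", "Name", "SystemModstamp"], since))
    con.executemany(
        "INSERT INTO opportunities VALUES (?, ?, ?) "
        "ON CONFLICT(Id) DO UPDATE SET Name=excluded.Name, "
        "SystemModstamp=excluded.SystemModstamp",
        rows)
    return max((r[2] for r in rows), default=since)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE opportunities "
            "(Id TEXT PRIMARY KEY, Name TEXT, SystemModstamp TEXT)")

# First run: full pull. Second run: only the changed row comes back.
cursor = sync(con, lambda q: [("006A", "Acme deal", "2016-04-05T00:00:00Z")])
cursor = sync(con, lambda q: [("006A", "Acme deal (renamed)",
                               "2016-04-06T00:00:00Z")], since=cursor)
names = con.execute("SELECT Name FROM opportunities").fetchall()
print(names)  # [('Acme deal (renamed)',)]
```

The real pipeline adds batching, retries, and schema handling, but that's the gist of "only data that's changed since the last run."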

[+] pinaceae|10 years ago|reply
just be careful with SFDC, a bunch of harsh limits, the killer being 5k Bulk API requests per rolling 24h.

if you think that's a lot you haven't seen big orgs with a shit ton of integrations and custom stuff on top of them.

[+] josep2|10 years ago|reply
Segment is always killing it.
[+] grinich|10 years ago|reply
They're totally doing great! (But we should use less violent language to describe it...)