ETL-as-a-Service is a great idea, particularly one that is visualization/analytics-tool-agnostic!
However, there are so many data sources, and they all require different integrations with their different APIs or export mechanisms. A service isn't really useful unless it can import the lion's share of services that a given company uses...
You’re right. There are a lot of sources out there. It’s a ton of work for companies to build out their own pipelines and learn every new API. We want to save them from that burden so that they can focus on the analysis. We’ll be adding many more connections in the coming weeks and months, and also opening up the platform for cloud services to add themselves. Stay tuned!
I don't really agree. I mean, yeah, comprehensiveness is great, and it sounds like Segment is working towards it. But every integration they build is one less custom integration that your data engineers have to build.
My company (Fivetran, YC W2013) offers the same service and supports a lot more sources (https://fivetran.com/integrate), including relational databases.
We have been very happy with Sources. It doesn't cover EVERY service we use (yet), but taking even one or two integrations out of in-house ETL is a huge benefit.
I wonder if this makes BI companies freak out a little bit -- because pushing this data into Redshift and adding a visualization layer on top takes care of most smaller-scale BI needs...
From a BI consultancy perspective, not in the slightest. The time we spend on ETL isn't spent on the difficulty of piping data from place to place. The difficulty is in modelling data appropriately to support ad-hoc analyses. The E and L portions don't pose much difficulty (hassle, frustration, sometimes time, sure, but they're not inherently difficult).
The T, transformation, is huge in many ways. Think of it this way: the data model is the primary UI for an analyst or any power user. It also dictates query performance.
Adding a visualization layer on top of, e.g., Salesforce's schema is not very helpful, regardless of where that data lives. You can answer trivial questions without much difficulty, but the difficulty ramps up quickly.
The data access patterns, the types of logic necessary, and the end-user demands are hugely different between OLTP and OLAP workloads.
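To make the OLTP-vs-OLAP point concrete, here's a minimal sketch of the kind of denormalizing transform the T step involves. All table and field names here are hypothetical, loosely shaped like Salesforce objects; they are not any vendor's actual schema.

```python
# OLTP shape: normalized, one row per entity, stitched together by IDs.
# This is great for an application, awkward for an analyst.
accounts = {"a1": {"name": "Acme", "industry": "Retail"}}
owners = {"u1": {"name": "Pat"}}
opportunities = [
    {"id": "o1", "account_id": "a1", "owner_id": "u1",
     "stage": "Closed Won", "amount": 5000},
]

# OLAP shape: a denormalized fact row an analyst can group and filter
# directly, without re-deriving the join logic for every new question.
def to_fact_row(opp):
    acct = accounts[opp["account_id"]]
    return {
        "opportunity_id": opp["id"],
        "account_name": acct["name"],
        "industry": acct["industry"],
        "owner_name": owners[opp["owner_id"]]["name"],
        "stage": opp["stage"],
        "amount": opp["amount"],
    }

fact_opportunities = [to_fact_row(o) for o in opportunities]
```

The transform itself is three joins; the hard part the parent comment is pointing at is deciding which facts and dimensions to build so that the model answers questions analysts haven't asked yet.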
There's also potentially huge complexity in conforming dimensions across disparate source systems' data.
Master data management is another huge component that hits a lot of the ETL pipeline.
These concerns are all on top of hooking up the right ends of the hose to one another.
I don't mean to disparage the product or company and hope I don't come across as if I am. I just want to point out that they address only a small component of a large process, which in turn is only a segment of the BI lifecycle.
Actually, we don't have any interest in being a visualization tool; we're squarely focused on building the customer data infrastructure of the future.
This product release is in close partnership with our BI partners (Looker, Mode, Wagon, Periscope, BIME and Chartio). One of the biggest problems our mutual customers face is getting data into their warehouse so that they can use the BI tool in the first place. This launch significantly expands the possible audience for them.
Even better, all of our BI partners built out-of-the-box reporting and dashboards based on Segment's schemas for these new third party sources. So our mutual customers can get set up even faster.
As somebody who works on a data exploration platform (Looker <- disclosure), I'd say that far from making us freak out, it makes us super happy (which is why we're so excited to partner with Segment).
I can't tell you how many potential customers are crazily excited about the idea of centralizing the data that all their apps produce into one central warehouse and putting Looker on top of that, but are stymied by the middle step of actually getting the data OUT of the vendors' APIs and IN to their own warehouse. Being able to point them to an off-the-shelf solution for that problem is a big win for us.
The BI space is absurd. BI viz is the photo apps of the b2b world. Everyone and their grandma thinks they can make one. Lots of VC money gonna get burned.
Is this just running the Salesforce SOQL statement for you and storing the aggregated result in a Segment table?
Or does Segment provide a way to completely clone entire SF tables such as Opportunities & Cases and then create the aggregate queries later in segment?
Great question! When you enable a source, we begin running a job on an interval for you that pulls the data from the source, applies some light normalizations and transformations, and sends the data to our Object API (which is in charge of upserting the data into Segment and flushing it to your warehouse).
In Salesforce's case, we issue bulk queries to pull the complete collection on the first run, then modify the queries thereafter to request only data that's changed since the last run.
We don't do any aggregation of the data. We load it into a data warehouse (Redshift or Postgres) in its complete, raw form so that you can use SQL to aggregate and join to your heart's content. Here's an example: https://help.segment.com/hc/en-us/articles/208215583-Salesfo...
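The full-pull-then-incremental pattern described above can be sketched roughly like this. Everything here (the function names, the in-memory "warehouse") is a hypothetical illustration of the general technique, not Segment's actual internals.

```python
from datetime import datetime, timezone

# In-memory stand-ins for the warehouse and the source API.
warehouse = {}  # record id -> record

def upsert(record):
    # Insert-or-update keyed by id, so re-pulling a record is harmless.
    warehouse[record["id"]] = record

def sync(fetch_modified_since, state):
    # First run: no 'last_run' in state, so the source returns everything.
    # Later runs: ask only for records changed since the previous run.
    cursor = datetime.now(timezone.utc)
    for record in fetch_modified_since(state.get("last_run")):
        upsert(record)
    state["last_run"] = cursor
    return state

# Toy source: returns all records on a full pull, otherwise only
# records whose 'modified' timestamp is after the checkpoint.
def make_source(records):
    def fetch(since):
        if since is None:
            return records
        return [r for r in records if r["modified"] > since]
    return fetch
```

Note that the checkpoint is captured before the pull, not after; that way records modified while the job is running get picked up again on the next run instead of being silently skipped.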
Doing this right now: manually piping data into PostgreSQL via Heroku and using Chartio to visualize and query.
If you think that's a lot, you haven't seen big orgs with a shit ton of integrations and custom stuff on top of them.
https://segment.com/contact/requests/warehouse