Launch HN: Secoda (YC S21) – Searchable Company Data
97 points| Etai | 4 years ago
As a company grows, so does its data. Tables, metrics, queries, and dashboards often become isolated and are difficult to find. Even with great practices, organizations still struggle to get value out of their data - up to 73% of all enterprise data goes unused. One of the big contributors to this problem is that organizations create data silos by not documenting and centralizing their data knowledge in a single place where every employee can access information about data.
Andrew and I experienced this problem firsthand at the last company we worked at. Andrew led the Product team and I led the Operations team, and we found that it was extremely difficult to find, understand and use data without looping in someone on the data team to help. The problem was that we had only one employee on the data team, supporting over 100 employees who were asking how to find and use company data, which meant that it took around 2 weeks to get an answer to any data request.
Other data management tools focus on listing all data resources, regardless of their relevance or accuracy - you generally just get a list of what's available, but not in a form that's very meaningful. We adopted some of these tools in our last jobs but found that they created an overwhelming index of too many tables, dashboards and queries that weren’t relevant to most employees. This meant that even after adopting a tool to solve the problem, most employees still couldn’t use them to find, understand and use data.
Our approach to solving this problem is to build Secoda as a tool that helps data teams curate metadata for less technical employees. Instead of listing every resource, data teams can use our tool to curate and document data for specific departments or roles. As a result, employees who are less familiar with data will not be overloaded by information that is irrelevant or too technical. Our goal is basically to be like Google search for in-company data. You enter what you need and you get back the relevant information. We integrate into databases, data warehouses, BI, and transformation tools and offer both an on-prem and cloud-hosted deployment.
Over the last six months, our team has been working closely with our early adopters to improve the product. Today, we’re excited to share the launch of our self-service product with the HN community. You can now sign up to Secoda, connect your database or data warehouse and start using Secoda without a sales call. We offer a free 14-day trial (no credit card required). After the free trial, we charge per editor, per month. If you’d like, you can also take a look at this video of us setting up our Secoda workspace: https://www.loom.com/share/f41b317441554a36930b9cfe4c91a45f.
We're also hiring for a number of roles, which you can find here: https://www.workatastartup.com/companies/secoda.
We’d love to hear about your experiences with data discovery and any ideas/feedback/questions you might have about what we’re building!
[+] [-] gregdoesit|4 years ago|reply
A few questions:
1. How do you go about permissions? This was a major question at Uber (where permissions were put in place early enough). Especially with GDPR and other regulations, you cannot let just anyone access any data.
2. What about PII? Some data needs to be stored, but cannot be viewed except by very, very few people and with a strong audit trail. This is a more specialized case of #1.
3. How do you see the tool "spread" the most within companies? I would assume that easy sharing is how people learn about this, then try it themselves... but would love to hear what you actually see.
[+] [-] Etai|4 years ago|reply
We have pretty advanced RBAC in Secoda. You can make anyone a viewer, guest, admin or editor in the workspace. Viewers and Editors are only able to see the information. Secondly, we allow you to create "groups" for different functions in the organization (i.e., marketing, sales etc.). You can choose to share any resource with a specific user or group. This works similarly to the RBAC that Notion uses, which means that only the right people are seeing the right information in Secoda. Lastly, we allow data teams to create "collections" of information, which can be shared with specific groups or specific users. Without sounding biased, I think this is where Secoda excels as a product.
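To make the role-and-group model above concrete, here is a minimal sketch of how sharing a resource with specific users or groups could be checked. This is not Secoda's implementation: the class names, the `can_view` function, and the rule that admins can see everything are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    # The four workspace roles named in the comment above.
    VIEWER = "viewer"
    GUEST = "guest"
    EDITOR = "editor"
    ADMIN = "admin"


@dataclass
class User:
    name: str
    role: Role
    groups: set[str] = field(default_factory=set)  # e.g. {"marketing"}


@dataclass
class Resource:
    name: str
    shared_users: set[str] = field(default_factory=set)
    shared_groups: set[str] = field(default_factory=set)


def can_view(user: User, resource: Resource) -> bool:
    """Hypothetical rule: admins see everything (an assumption);
    everyone else needs a direct share or a share with one of their groups."""
    if user.role is Role.ADMIN:
        return True
    if user.name in resource.shared_users:
        return True
    return bool(user.groups & resource.shared_groups)


revenue = Resource("revenue_dashboard", shared_groups={"marketing"})
alice = User("alice", Role.VIEWER, {"marketing"})
bob = User("bob", Role.VIEWER, {"sales"})
```

With these example objects, `can_view(alice, revenue)` is true because the dashboard is shared with her group, while `can_view(bob, revenue)` is false.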
2. What about PII? Some data needs to be stored, but cannot be viewed except by very, very few people and with a strong audit trail. This is a more specialized case of #1.
We can auto-tag PII at the table and column level. Any PII data won't be viewable without permission from the admin.
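As a rough illustration of column-level PII tagging, the sketch below flags columns whose names match a few common patterns. The thread does not say how Secoda detects PII; matching on column names is an assumption, and the pattern list and the `tag_pii` function are hypothetical.

```python
import re

# Hypothetical name patterns; a real detector would likely also inspect values.
PII_PATTERNS = [
    r"email",
    r"phone",
    r"ssn",
    r"(first|last|full)_?name",
    r"address",
    r"birth",
]
_PII_RE = re.compile("|".join(PII_PATTERNS), re.IGNORECASE)


def tag_pii(schema: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return {table: [columns flagged as PII]} for tables with at least one hit."""
    tagged: dict[str, list[str]] = {}
    for table, columns in schema.items():
        hits = [col for col in columns if _PII_RE.search(col)]
        if hits:
            tagged[table] = hits
    return tagged


schema = {
    "users": ["id", "email", "last_name", "created_at"],
    "orders": ["id", "total", "placed_at"],
}
flagged = tag_pii(schema)
```

Here `flagged` contains only the `users` table, with `email` and `last_name` marked. A table counts as tagged as soon as any of its columns is flagged, which gives both the table-level and column-level view.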
3. How do you see the tool "spread" the most within companies? I would assume that easy sharing is how people learn about this, then try it themselves... but would love to hear what you actually see.
Usually the Slack integration is the best way to spread Secoda. With our Slack integration, any employee can search for information by typing /secoda in Slack. You can also push information from Slack to Secoda and vice versa. This exposes Secoda to new employees in the place where they already work.
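For readers curious what a /secoda-style search command involves, here is a minimal sketch of a handler for a Slack slash command. Slack sends slash commands as a form-encoded POST body with `command` and `text` fields and accepts a JSON reply with `response_type` and `text`. Everything else here is an assumption: the in-memory `CATALOG`, the substring matching, and the `handle_slash_command` name are hypothetical and say nothing about how Secoda's integration actually works.

```python
from urllib.parse import parse_qs

# Hypothetical stand-in for a searchable data catalog.
CATALOG = {
    "orders": "One row per customer order.",
    "monthly_revenue": "Dashboard of revenue by month.",
}


def handle_slash_command(body: str) -> dict:
    """Parse a form-encoded slash-command body and return a Slack-style reply."""
    form = parse_qs(body)
    command = form.get("command", [""])[0]
    query = form.get("text", [""])[0].strip().lower()

    if command != "/secoda" or not query:
        return {"response_type": "ephemeral", "text": "Usage: /secoda <search terms>"}

    # Naive substring search over names and descriptions.
    matches = [
        f"{name}: {desc}"
        for name, desc in CATALOG.items()
        if query in name.lower() or query in desc.lower()
    ]
    text = "\n".join(matches) if matches else f"No results for '{query}'."
    return {"response_type": "ephemeral", "text": text}


reply = handle_slash_command("command=%2Fsecoda&text=revenue")
```

An ephemeral response is shown only to the person who ran the command, which keeps search results from cluttering the channel.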
[+] [-] slotrans|4 years ago|reply
The reason the distinction matters is that if it's curation-based, the onus is still on the data team to document all relevant assets, which they could already do, and have already demonstrated they don't want to.
Now, it could still be a good metadata catalog! Most of what's out there is bad. But if that's what you're shooting for, pitching it as "search" will be confusing.
[+] [-] andrewmcewen|4 years ago|reply
[+] [-] ramish94|4 years ago|reply
What is your strategy to scale out & maintain integrations? Speaking from experience, it's not something that is easy to scale out unless you have a dedicated team whose job is to build them out, or you have some third-party provider like CData providing OOTB connectors for your product.
(On a side note, this looks fantastic. Are you hiring any product folks, by any chance? I have significant experience tackling this same problem.)
[+] [-] Etai|4 years ago|reply
We're also considering open-sourcing that part of the product, but haven't made a firm decision on that yet. Would love to chat if you're open to it. We're definitely looking for people in product. Feel free to send me an email at [email protected]
[+] [-] bnj|4 years ago|reply
We’ve sketched out an initial solution which looks a lot like Secoda, except focused on csv files, the concept being to check csv data sets into the library, add metadata, and then define how to bridge it into the central data store.
I’ll dig further into the website, it looks like you’ve done a lot of good work avoiding repeatedly addressing the same questions!
[+] [-] Etai|4 years ago|reply
[+] [-] ianbutler|4 years ago|reply
So how do you compare to a data catalog like DataHub? https://datahubproject.io/
From the video you looked very similar to them as a metadata consumer, and they provide extensive API integrations so you can add basically any set of metadata you want, including Slack, Jira, etc. They're also offering a hosted version.
Their metadata is indexed into a tunable ES cluster, so you can fiddle with relevance etc. to your heart's content.
What's your big differentiator?
[+] [-] andrewmcewen|4 years ago|reply
1. If you're using the DataHub open source solution, it requires a data engineer to get the platform up and running and to maintain it, which can be fairly expensive depending on the salary of the data engineer. Secoda has 15+ no-code integrations that can be set up in 5 minutes and is a fully managed solution. We are releasing a metadata API that will be available before the end of the year, in case an organization is using a product that we do not currently integrate with.
2. Acryl (the managed version of DataHub) is mainly focused on the data catalog, which they do a great job of. However, they don't provide the questions, dictionary, and visualization components that we provide in addition to the catalog. These additional components add more context around data knowledge and are focused on helping non-technical users understand company data, whereas the data catalog is focused more on helping technical data users understand company data.
3. Also, if you're using Acryl, you'll have to get in touch with their team to get a demo of the product. For Secoda, you can sign up at https://app.secoda.co and try out a free trial of the product without having to talk with our team. We do offer demos if people are interested though.
[+] [-] applgo443|4 years ago|reply
How is it different from Glean? https://www.glean.com
[+] [-] ccleve|4 years ago|reply
What's your tech stack?
Did you create the integrations from scratch, or use something like Zapier?
[+] [-] Etai|4 years ago|reply
[+] [-] rememberlenny|4 years ago|reply
[+] [-] dmolot|4 years ago|reply
[+] [-] Etai|4 years ago|reply
[+] [-] coderintherye|4 years ago|reply
[+] [-] dang|4 years ago|reply
[+] [-] nchudleigh|4 years ago|reply
Excited to see it roll out in the org and build a solid data knowledge base.
[+] [-] djbusby|4 years ago|reply
[+] [-] Etai|4 years ago|reply
[+] [-] csnerd|4 years ago|reply
[+] [-] mona_rakibe|4 years ago|reply
[+] [-] Etai|4 years ago|reply
[+] [-] mritchie712|4 years ago|reply
[+] [-] andrewmcewen|4 years ago|reply
1. It is likely that you'll need to set up a 1:1 meeting with a ThoughtSpot expert to help get your company up and running on the software. Our goal with Secoda is to be a self-service platform that is designed for any company, large or small, to get their data knowledge base set up in 5 minutes.
2. ThoughtSpot's price point is typically in the six-figure range, which is much higher than Secoda's price point, which starts at $29/editor on the platform.
3. ThoughtSpot's core focus is providing answers to data questions through visualizations. Secoda takes a more comprehensive approach to documenting data knowledge. In addition to having visualizations that help answer questions, we also provide a shared data dictionary for defining metrics, as well as a catalog that can store tables, dashboards, jobs, and many other data resources.