
Select * from cloud

790 points | kiyanwang | 3 years ago | steampipe.io | reply

282 comments

[+] swalsh|3 years ago|reply
Since starting my career, the tech stack I use has changed numerous times. Technologies have come, gone, evolved. But except for a brief period in 2010 where we thought we could kill it, SQL is the one technology that has stayed around. It might be the only TECHNICAL knowledge I learned in the first year of my career that is still relevant. You're probably not making the wrong decision if you decide to build on top of SQL.
[+] strbean|3 years ago|reply
> except for a brief period in 2010 where we thought we could kill it

Unfortunately, we never actually tried to kill SQL. We tried to kill relational databases, with the death of SQL as a side effect. This proved to be a stupid idea - non-relational data stores have value of course, but so do relational databases.

SQL, as you point out, is 48 years old. It shows. The ergonomics are awful. It uses natural-language-inspired syntax that has fallen completely out of favor (with good reason imho). The ordering of fields and tables precludes the use of autocomplete. It uses inconsistent syntax for identical constructions, e.g.

    INSERT ... (col1, col2, ...) VALUES (val1, val2, ...)
vs.

    UPDATE ... col1=val1, col2=val2, ...
And worst of all, it comes with such a woefully inadequate standard that half of the universally supported features are expressed in a completely different fashion for each vendor.

It is very common to argue that its longevity is due to its excellence as a language. I think its longevity is due to its broad target audience, which includes plenty of people in less technical roles who are high on the org chart. Tech workers are happy to jump on new technology that improves ergonomics (or shiny new toys that are strictly more interesting because they are new). Executives whose primary way of interacting with the tech stack is querying a read-only replica are much less open to learning a new language.

[+] tiffanyh|3 years ago|reply
Declarative vs Imperative Language.

A huge benefit of SQL is that it's a declarative language (you describe what the output should be); whereas with imperative languages like C/Python/Ruby you have to describe how to generate the output with specific instructions and procedures.
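To make the contrast concrete, here is a minimal sketch (not from the thread; the `orders` table and its columns are invented for illustration) of the same aggregation written both ways, using Python's stdlib `sqlite3`:

```python
# Illustrative sketch: the same aggregation written imperatively in Python
# and declaratively in SQL. Table and column names are made up.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 7.5)],
)

# Imperative: spell out *how* to build the result, step by step.
totals = {}
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals[customer] = totals.get(customer, 0.0) + amount

# Declarative: describe *what* the result should be; the engine plans the how.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()

print(totals)  # {'alice': 17.5, 'bob': 5.0}
print(rows)    # [('alice', 17.5), ('bob', 5.0)]
```

The declarative form also leaves the engine free to pick indexes, join orders, and parallelism that the hand-rolled loop would have to encode by hand.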

[+] munificent|3 years ago|reply
I'm sure this will ruffle some feathers, but the technical skill that has been the most persistently valuable across the past 25 years of my career is object-oriented programming and design. It's not the only paradigm I use, but being able to organize large programs in terms of OOP has been incredibly valuable and hugely productive for me across several languages and domains.
[+] rubyist5eva|3 years ago|reply
I feel my abilities as a developer really took off when I took the time to really learn SQL and take advantage of the features that my database offers. For example, learning to use window functions felt like that galaxy brain meme where you become one with the universe.

Being able to take 50 lines of ruby code (manual (anti-)joins and aggregates, result partitioning, etc...) and replace it with a couple lines of sql that is much faster and less buggy is a life changing experience.
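A hedged sketch of that kind of rewrite (the `sales` schema is invented; window functions need SQLite >= 3.25, bundled with recent Pythons): one `SUM(...) OVER (...)` replaces a manual partition-then-accumulate loop.

```python
# Invented example: a running total per region, which would otherwise take a
# manual sort/partition/accumulate loop in application code.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, day INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [("east", 1, 100.0), ("east", 2, 50.0), ("west", 1, 80.0), ("west", 2, 20.0)],
)

rows = conn.execute(
    """
    SELECT region, day,
           SUM(amount) OVER (PARTITION BY region ORDER BY day) AS running_total
    FROM sales
    ORDER BY region, day
    """
).fetchall()

print(rows)
# [('east', 1, 100.0), ('east', 2, 150.0), ('west', 1, 80.0), ('west', 2, 100.0)]
```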

The only other time I had such a dramatic shift in the way that I look at building applications is when I learned Scala (functional programming).

[+] irrational|3 years ago|reply
I’ve been using HTML since the mid 1990s. That is probably the oldest technology that I still use on a regular basis. SQL would be a close second (or maybe third - I probably started learning sql around 2001 and I can’t remember when JavaScript came out. Surely before that.)
[+] that_guy_iain|3 years ago|reply
Honestly, mine has mostly stayed the same. I've played with various techs but I mostly work with the same underlying tech. Some items get added but very rarely does something get replaced or removed. The tech gets improved and new features get added.

The last time I can remember something getting replaced is Vagrant for Docker. And honestly, I think that's a mistake for many companies. Before that it was SVN for Git.

The only way I can see tech coming and going is if you're always an early adopter and need to replace things that failed. If you build on stable tech, that stuff stays around for decades even if it was terrible in the first place. A good example is PHP. It was terrible at first but it's still around and improving year on year.

[+] quickthrower2|3 years ago|reply
The problem I see here is building SQL on top of the cloud, which is like building SQL on top of a database where the tables are always changing - not just adding columns but the meaning of the columns and rows, or even the existence of the table.

I think this is a good idea, but the cloud doesn't seem to care too much about backwards compatibility. If you use MySQL/PostgreSQL/SQL Server, you can be fairly sure in 20 years time your SQL will still work :-).

The need for this library is more an indictment of "the cloud" than anything. I am mostly an Azure user so AWS/Google might be better, but man, it changes its UI and CLI interfaces a lot - way too much.

[+] papito|3 years ago|reply
SQL is a technology that engineers should know in and out, and while in the early 2000s you could not move an inch without knowing it, I find that the new generation of devs really considers database calls a freebie - hidden behind abstractions and ORMs. It's just a line of code, right?

This leads to highly non-performant software since, unless we have bent the laws of physics in the last 15 years, your data access layer is the most expensive one.

But we are crushing LeetCode, yeah?

[+] rsync|3 years ago|reply
"Since starting my career, the tech stack I use has changed numerous times. Technologies have come, gone, evolved. But except for a brief period in 2010 where we thought we could kill it, SQL is the one technology that has stayed around."

Ethernet comes to mind when I think of technologies that just can't be killed ...

[+] matwood|3 years ago|reply
> It might be the only TECHNICAL knowledge I learned in the first year of my career that is still relevant.

Agree. IIRC, I wrote my first SQL in 1996 and wrote some today. While not perfect, it's amazing at what it does. I suggest reading Joe Celko to really up your SQL skills.

[+] api|3 years ago|reply
That's not random. SQL is basically math. The syntax is a little old school and arcane (but not terrible) but the principles behind it are set theory and logic.
[+] sleiben|3 years ago|reply
Reminds me of Git.
[+] nathanwallace|3 years ago|reply
Excited to see Steampipe shared here - thanks kiyanwang! I'm a lead on the project, so sharing some quick info below and happy to answer any questions.

Steampipe is open source [1] and uses Postgres foreign data wrappers under the hood [2]. We have 84+ plugins to SQL query AWS, GitHub, Slack, HN, etc [3]. Mods (written in HCL) provide dashboards as code and automated security & compliance benchmarks [3]. We'd love your help & feedback!

1 - https://github.com/turbot/steampipe
2 - https://steampipe.io/docs/develop/overview
3 - https://hub.steampipe.io/

[+] davesque|3 years ago|reply
Hey Nathan. Can you comment on some of the known performance pitfalls of steampipe? I'm not super familiar with the Postgres foreign data wrappers API. I assume steampipe inherits many of its technical limitations from this API.

Having done some work in this space, I'm aware that it's no small thing to compile a high-level SQL statement that describes some analytics task to be executed on a dataset into a low-level program that will efficiently perform that task. This is especially true in the context of big data and distributed analytics. Also true if you're trying to blend different data sources that themselves might not have efficient query engines.

Would love to use this tool, but just curious about some of the details of its implementation.

[+] klysm|3 years ago|reply
Love to see Postgres FDW used. It's really powerful and imo not utilized as much as it could be.
[+] cube2222|3 years ago|reply
Hey! Steampipe looks great and I think the architecture you chose is very smart. Some feedback from myself:

I've tried setting up Steampipe with Metabase to do some dashboarding. However, I've found that it mostly exposes "configuration" parameters, so to speak. I couldn't find dynamic info, like S3 bucket size or autoscaling group instance count.

Have I done something backwards or not noticed a table, or is that a design decision of some sort? That was half a year ago, so things might've changed since then, too.

[+] petercooper|3 years ago|reply
It's a very cool project!

It might just be a coincidence, but an hour before this HN post, I discovered it way back in our queue of things to review for https://golangweekly.com/ and featured it in today's issue. Hopefully kiyanwang is one of our readers :-D

[+] aaronharnly|3 years ago|reply
Congratulations! This looks incredibly powerful and I'm excited to check it out.

Although this is pitched primarily as a "live" query tool, it feels like we could get the most value out of combining this with our existing warehouse, ELT, and BI toolchain.

Do you see people trying to do this, and any advice on approaches? For example, do folks perform joins in the BI layer? (Notably, Looker doesn't do that well.) Or do people just do bulk queries to basically treat your plugins as a Fivetran/Stitch competitor?

[+] ezekg|3 years ago|reply
This is awesome. Makes me want to write a plugin for my SaaS, just because it looks fun.
[+] breck|3 years ago|reply
This is amazing. Can't believe I hadn't seen it before. Nice job
[+] VectorLock|3 years ago|reply
Are people usually setting up centralized shared instances of Steampipe, or is a "run on everyone's laptop" deployment preferred?
[+] ComodoHacker|3 years ago|reply
What about joins, are they supported? Can't see them in the examples.

The real power of SQL is locked until you can join different data sources.

[+] Lorin|3 years ago|reply
FYI: the hiring page requires a Notion login.
[+] jedberg|3 years ago|reply
Steampipe killed my startup. I couldn't be happier about it.

I started a company six years ago to do exactly this -- make a SQL interface for the AWS API. Steampipe out-executed us and made a way better product, in part because their technology choices were much smarter than ours. I shut down the startup as soon as I saw steampipe.

I wish them all the best, it's a great product!

[+] patrec|3 years ago|reply
I'm curious: what were the bad technological choices you made and the good ones they made?
[+] chousuke|3 years ago|reply
Steampipe is unbelievably powerful.

Writing custom scripts for even the simplest of queries comes nowhere near the convenience of using PostgreSQL to get what you want. If you wanted to find and report all instances in an AWS account across all regions, with steampipe it's just:

   SELECT * FROM aws_ec2_instance;
Even the simplest implementation with a Python script won't be nearly as nice, and once you start combining tables and doing more complicated queries, there's just no competition.
[+] Hikikomori|3 years ago|reply
Tried steampipe out to replace my simple python script. It wasn't able to handle our 1k+ accounts and multiple regions; initialization took over an hour before I was even able to run a query.

Meanwhile my python script can run any API calls in all our accounts and regions and finish in a few minutes. Maybe 100 lines of python using boto3 and multiprocessing that outputs to json or yaml.
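For reference, the fan-out pattern such a script relies on can be sketched like this. The `fetch_instances` function is a stub standing in for real boto3 calls (which would need credentials and pagination); everything here is invented for illustration.

```python
# Hypothetical sketch of an account x region fan-out, with a stubbed fetch.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def fetch_instances(account, region):
    # A real script would call boto3's describe_instances with pagination;
    # here we just return a tagged placeholder record per (account, region).
    return [{"account": account, "region": region, "id": f"i-{account}-{region}"}]

def collect(accounts, regions):
    results = []
    # Threads are fine here: the real workload is network-bound API calls.
    with ThreadPoolExecutor(max_workers=32) as pool:
        for batch in pool.map(lambda ar: fetch_instances(*ar), product(accounts, regions)):
            results.extend(batch)
    return results

instances = collect(["111", "222"], ["us-east-1", "eu-west-1"])
print(len(instances))  # 4
```

The point of the pattern is that wall-clock time is bounded by the slowest single call rather than the sum of all calls.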

[+] phillu|3 years ago|reply
I don't agree that this simple query wouldn't be as nice with boto3. But combining different services is definitely a nice feature, if you have the use case for it.
[+] yevpats|3 years ago|reply
Good to see steampipe here :)

I'm the author of CloudQuery (https://github.com/cloudquery) and I believe we started at the same time though took some different approaches.

PG FDW - is def an exciting PostgreSQL tech and is great for things like on-demand querying and filtering.

We took a more standard ELT approach where we have a protocol between source plugins and destination plugins so we can support multiple databases, data-lakes, storage layers, kinda similar to Fivetran and airbyte but with focus on infrastructure tools.

Also, as a more standard ELT approach, our policies use standard SQL directly and dashboards re-use existing BI tools rather than implementing dashboarding in-house.

I can see the power of all-in-one and FDW and def will be following this project!

[+] armchairhacker|3 years ago|reply
We have REST endpoints generated from SQL databases, and now we have SQL queries which connect to REST endpoints.
[+] ezekg|3 years ago|reply
How does this handle pagination, and how does it behave towards rate limiters? For example, my API (keygen.sh) has a page size limit of 100, and often there can be hundreds or even thousands of pages for certain resources. How would Steampipe handle `select count(*) from keygen_licenses` while obeying the rate limiter? What if a heavy query is run twice?
[+] judell|3 years ago|reply
Pagination may be handled by the API's golang SDK; if not, it's the responsibility of the plugin author.

The plugin SDK provides a default retry/backoff mechanism, and the plugin author can enhance that.

If a heavy query runs twice it'll load from cache the second time, if within the (user-configurable) cache TTL.
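The three behaviors described here can be sketched as follows. This is not Steampipe's actual plugin SDK - `FakeAPI`, `fetch_all`, and `TTLCache` are invented names standing in for pagination, retry/backoff, and the query cache:

```python
# Illustrative only: pagination + retry-with-backoff + a TTL cache.
import time

class FakeAPI:
    """Paginated source that fails once to exercise the retry path."""
    def __init__(self, items, page_size=2, fail_first=1):
        self.items, self.page_size, self.fails_left = items, page_size, fail_first

    def page(self, offset):
        if self.fails_left > 0:
            self.fails_left -= 1
            raise RuntimeError("rate limited")
        return self.items[offset:offset + self.page_size]

def fetch_all(api, max_retries=3, base_delay=0.01):
    out, offset = [], 0
    while True:
        for attempt in range(max_retries):
            try:
                page = api.page(offset)
                break
            except RuntimeError:
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        else:
            raise RuntimeError("retries exhausted")
        if not page:
            return out
        out.extend(page)
        offset += len(page)

class TTLCache:
    def __init__(self, ttl_seconds):
        self.ttl, self.store = ttl_seconds, {}

    def get_or_fetch(self, key, fetch):
        hit = self.store.get(key)
        if hit and time.monotonic() - hit[1] < self.ttl:
            return hit[0]  # a repeated heavy query is served from cache
        value = fetch()
        self.store[key] = (value, time.monotonic())
        return value

cache = TTLCache(ttl_seconds=300)
api = FakeAPI(list(range(5)))
first = cache.get_or_fetch("select *", lambda: fetch_all(api))
second = cache.get_or_fetch("select *", lambda: fetch_all(api))  # cache hit
print(first == second == [0, 1, 2, 3, 4])  # True
```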

[+] michaelsalim|3 years ago|reply
This makes me want to create a Grafana dashboard for everything. Instead of relying on custom plugins (or pulling & storing the data yourself), now it's just another postgres data source. I hope this becomes mainstream!
[+] haolez|3 years ago|reply
Looks fun. I assume it's just a visibility tool and there is no intention of offering INSERT, UPDATE and DELETE, right?
[+] nathanwallace|3 years ago|reply
Yes, we've focused on SELECT for read only - which provides a lot of value as a "universal API" to live query these many sources. Steampipe is built on Postgres Foreign Data Wrappers [1] under the hood, so could be extended to full CRUD in time. Our current belief is that it's better to follow CQRS [2] in this case though - using SQL for the queries and more native tools for the commands.

1 - https://www.postgresql.org/docs/current/ddl-foreign-data.htm...
2 - https://martinfowler.com/bliki/CQRS.html

[+] ashayh|3 years ago|reply
Highly recommend Steampipe. No other open source tool was able to deal with the large number of Route53 records I had to deal with.
[+] xwowsersx|3 years ago|reply
I'm in the process of determining which RDS instance type to switch to, and I've had to have a browser open with a million tabs, copying and pasting information and pricing data from AWS. Could I use Steampipe to help me with this?
[+] debarshri|3 years ago|reply
There is another contender in the infra-SQL realm called CloudQuery [1]. Steampipe has been around for a very long time. Mind you, Steampipe has an AGPL license.

[1]https://www.cloudquery.io

[+] Canada|3 years ago|reply
I really like how it just worked when I installed it and the AWS plugin. The second one I tried was tailscale... I get different results for the same query, which is weird and broken, and an error "device can only be managed by original tailnet (403) (SQLSTATE HV000)"

Still, getting access to all this stuff with a SQL interface is very compelling so I'll try to get the plugins for all the stuff I have working properly.

[+] diceduckmonk|3 years ago|reply
We built a UI-centric cross-cloud search engine, too. <edit: link redacted as other commenter perceived this to be an ad>

The first VC we pitched to felt like this was too niche of a problem. They wanted us to come back with a different, grander pitch, so we'll see. In the current fundraising climate, it's been difficult to gather data points on whether we are on the right track (this post makes us feel like we're not crazy). We reached out to investors outside America, where we're based, and we're quickly realizing the VCs aren't as tech-savvy as we expected. After going through the YC application process this cycle, we have a much greater appreciation for YC. They understand tech and startups, both. For one, we're instructed to define a toy / small problem, "don't talk, do" as opposed to pretty slides and ideas.

Best of luck to Steampipe and whoever else is working on this problem.

[+] chrsstrm|3 years ago|reply
So this looks interesting and I wanted to test it on a use-case I have currently. I want to query both a csv and a zipped csv living online but it doesn't appear that is possible. The csv plugin implies you can only work with local files, nothing over http(s). This kind of seems like a big oversight for a product like this, no?
[+] iteratorx|3 years ago|reply
I have recently tried to set up Steampipe to query all our cloud costs, but I noticed that costs are only available for AWS. Azure, GCP, DigitalOcean costs are not available in their respective plugins.

Has anyone found a way to get the costs in those plugins? Is it a limitation of the plugins or of the underlying cloud APIs?

[+] nathanwallace|3 years ago|reply
Please open GitHub issues (or PRs) for any tables you'd like to see added! If there is a cost API then we'd definitely like to have a table to query it. In the meantime, you may find the "Thrifty" mods useful - they have many prebuilt queries to look for cost-saving opportunities in AWS, Azure, GCP and DigitalOcean - https://hub.steampipe.io/mods?q=thrifty