It's fun to wake up to find a project that I started at the top of HN! These days, I'm no longer very involved with the day-to-day of the project.
Now that it's no longer a young project, here are some musings about Huginn and responses to people's comments in this thread, in no particular order.
I've found that Huginn excels as a scheduled web scraper with lightweight filtering. That's what I use it for. On the other hand, while you can write custom code in it, Huginn is pretty poor at implementing any sort of complex logic, and is even worse at bidirectional syncing between systems, which is something people often want it to do, but for which it wasn't designed.
If IFTTT or Zapier meet your needs, awesome! No need to run and monitor your own service. I personally choose to run Huginn on my own hardware in large part so that I'm comfortable giving it website cookies and passwords.
Some examples of what I use Huginn for these days:
- Watching Twitter in realtime for high standard deviation spikes in certain keywords. Some, like "san francisco emergency" or "san francisco tsunami warning", send me a push notification; others, like "huginn open source", go to a digest email (and I imagine that one will trigger because of this thread).
- Watching Twitter for rare terms and sending me a digest of all tweets that match them. Also sending me all tweets from a few Twitter users who post rarely, but that I don't want to miss.
- Scraping a number of rarely updated blogs that don't have email newsletters and emailing me when they change. Some use RSS, most are just simple HTML scraping.
- Pulling deals from the frontpage and forums of slickdeals and craigslist and filtering them for certain keywords.
- Sending an early morning email if it's going to rain today.
- Watching ebay for some rare items.
- Sending my wife and me an email on Saturday morning with local yardsales from craigslist.
- Watching the HN and producthunt front pages for certain keywords.
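The "high standard deviation spikes" idea in the first bullet boils down to a z-score check over recent keyword counts. A hypothetical sketch of that logic (an illustration, not Huginn's actual implementation):

```python
# Flag the latest keyword count when it sits several standard
# deviations above the mean of the recent history.
from statistics import mean, stdev

def is_spike(counts, threshold=3.0):
    """counts: mention counts per interval, oldest first; last entry is 'now'."""
    history, latest = counts[:-1], counts[-1]
    sd = stdev(history)
    if sd == 0:
        return False  # flat history; avoid dividing by zero
    return (latest - mean(history)) / sd > threshold

print(is_spike([4, 5, 6, 5, 4, 6, 5, 40]))  # True: 40 is far above the mean
print(is_spike([4, 5, 6, 5, 4, 6, 5, 6]))   # False: within normal variation
```

The threshold of 3 standard deviations is an arbitrary choice here; a real alert would tune it per keyword.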
Basically, anytime I find myself checking a website more than a few times, I spend 20 minutes making a Huginn Agent to do it for me.
I think one reason Huginn has worked well for me is that I don't try to make it do too much. I use it for scraping data and gentle filtering, and that's about it. It's been super helpful for alerting me to interesting content for The Orbital Index, my current project, a weekly space-industry newsletter. (Last issue: http://orbitalindex.com/archive/2019-12-10-Issue-42/)
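The scrape-and-filter pattern described above typically starts with Huginn's Website Agent. A minimal options sketch might look roughly like this (the URL and CSS selectors are placeholders for whatever site you're watching):

```json
{
  "expected_update_period_in_days": 7,
  "url": "https://example.com/blog",
  "type": "html",
  "mode": "on_change",
  "extract": {
    "title": { "css": "article h2", "value": "string(.)" },
    "url":   { "css": "article h2 a", "value": "@href" }
  }
}
```

Downstream agents (a Trigger Agent for filtering, then an Email or push-notification agent) consume the events this emits.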
I'm excited to check this out, but I wanted to congratulate you on a truly excellent project name. Having spent many, many hours in naming struggles, I truly appreciate the perfection.
And for those unfamiliar, the (also amazingly named) historian Snorri Sturluson explains: "Two ravens sit on his (Odin’s) shoulders and whisper all the news which they see and hear into his ear; they are called Huginn and Muninn. He sends them out in the morning to fly around the whole world, and by breakfast they are back again. Thus, he finds out many new things and this is why he is called ‘raven-god’ (hrafnaguð)." [1]
Little-known fact: this project is widely used by journalists who can't code (at The New York Times, among others) for a variety of tasks, e.g. monitoring web pages like Trump's policy positions, scraping press releases, or filtering out very specific news alerts.
Great to see you at the top of HN today tectonic! I'm at Pivotal in Sydney Australia now. Thanks again for the impromptu interview all of those years ago!
Not disputing any of the other alternatives, but you can create custom integrations on Zapier. It's mostly intended for publishing them publicly, but you don't have to.
I have been running huginn on my home server for a while. I've mainly used it to filter RSS feeds and then generate new feeds with the filtered items. Another use case for me is ingesting webcomic RSS feeds (or scraping a page) and posting the comics to a private Telegram channel. I also once had an agent scraping a page and notifying me if something changed (a real estate listing).
I have tried a couple of alternatives, e.g. node-red, but none really worked the way I wanted them to for these cases. huginn is incredibly flexible and (at least for me) the mental model of its workflow makes a lot of sense.
Sadly, more and more pages want you to go through their app/site and make it a bit difficult to work with, e.g. getting content from an Instagram account.
One thing I have not figured out about huginn, and which all of these automation tools seem to lack, is loops. E.g. I have a page an agent scrapes, from which I want to output the src of an image tag, but I also want to check whether a certain condition on the page matches (e.g. a "next page" button exists) and, if so, first output the found src but then also re-invoke the agent with a new input element. It would then scrape the next page, and so on, until it no longer finds the button.
Zapier has exponentially more integrations than this or anything else, but it is surprisingly difficult to use, even more so since they updated their UI. Editing Zaps is pure torture because refreshing is so difficult, and exception handling is pure pain. In most instances the breadth of API calls is so narrow, and so rarely updated by vendors, that you end up switching to a custom integration. I've also noticed vendors rapidly expanding their native integrations, sidestepping the need for a request broker.
The "retail" integration space remains underserved and if one of the enterprise players decided to go down-market with a better UI and deeper integrations - they'd mop the floor clean in 18 months.
> The "retail" integration space remains underserved and if one of the enterprise players decided to go down-market with a better UI and deeper integrations - they'd mop the floor clean in 18 months.
I think it's a tough market to be the "winner" of. Novices are going to want a stupid-simple GUI ("wizard mode", as someone else in the thread mentioned). Power users are going to want to toss in some code at some point in a workflow to do fancy ETL you don't support out of the box. When you hit a certain level of complexity, an edge case or integration an automation product doesn't support, or perhaps an amount of spend that starts to look painful annually, it's likely you'll consider pulling all of your workflows out and having a software engineer build something bespoke for your business line.
> The "retail" integration space remains underserved
This is accurate, there's a significant opportunity in this space but it won't stay that way for long.
There's a well-known entrepreneur I know with significant exits and capital entering this space that's going after Zapier's market and I'm certain he's not alone.
"Once a day, ask 5 people for a funny cat photo; send the results to 5 more people to be rated; send the top-rated photo to 5 people for a funny caption; send to 5 final people to rate for funniest caption; finally, post the best captioned photo on my blog."
I'm still laughing :) wth! (My fear is that this might actually be sustainable with ads.)
The Airflow landing page that you linked to lists many integrations, but only a small subset of them appear in the integrations section of the docs it links to. I guess the docs are in need of some more work.
This, like related projects such as n8n and node-red, is a very cool project. I always wonder what people use it for in real life, though. It seems like a lot of trouble (setup, learning curve, maintenance) for an action that usually takes a couple of seconds, like checking the weather or opening Twitter.
For those using expensive/advanced connectors like Zapier, Tray.io, etc., I find that https://n8n.io serves as a far more welcoming open-source alternative that is worth looking at.
Thanks a lot, rvz, for throwing it into the mix. I am the creator of n8n, so I just wanted to mention that it is not "OSI-approved open-source", because the Commons Clause is attached. More information about that is in the FAQ: https://docs.n8n.io/#/faq?id=license
One use case for Zapier (from a developer/company standpoint) is to allow customers to connect their existing services to actions inside your own app. For instance, if a customer updates a CRM record, you can have a custom zap update a record in your own SaaS platform.
To pull that off with huginn, is it as simple as connecting it up to Singer.io? Or would that require a big marketplace of huginn agents for popular integrations?
Also check out Node Red [0]; it's fairly popular in the automation space. It's rather sparse by default, but after adding some community nodes (or making some yourself) it's pretty useful.
I gave this a go today and managed to install huginn on my Synology NAS by simply searching for the Docker container. I then set up 3 agents to scrape a Shopify webstore JSON endpoint that I'm always checking for inventory, have huginn parse the JSON, and send me an SMS via Twilio if inventory changes. Took about 2 hours; wasn't too bad. The huginn Twilio docs seemed dated.
Used Python's SimpleHTTPServer and ngrok to replicate a JSON URL and play with the triggers, to test it all before pointing it at a real website.
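The "SMS if inventory changes" step above is essentially a diff between two JSON snapshots of the endpoint. A minimal sketch of that comparison (the field names here are made up for illustration, not Shopify's actual schema):

```python
# Compare two JSON snapshots of a hypothetical inventory endpoint and
# report items whose stock state changed since the last poll.
import json

def inventory_changes(old_json, new_json):
    """Return (id, new_in_stock) pairs for products whose state changed."""
    old = {p["id"]: p["in_stock"] for p in json.loads(old_json)["products"]}
    changes = []
    for p in json.loads(new_json)["products"]:
        if old.get(p["id"]) != p["in_stock"]:
            changes.append((p["id"], p["in_stock"]))
    return changes

old = '{"products": [{"id": 1, "in_stock": false}]}'
new = '{"products": [{"id": 1, "in_stock": true}]}'
print(inventory_changes(old, new))  # [(1, True)]
```

In Huginn itself, a Website Agent in `on_change` mode plus a Trigger Agent covers the same ground without custom code.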
I've seen the word "rake", so I guess... is this written in Ruby? If so, how's the performance?
My home server is pretty minimal and lightweight: a Raspberry Pi. Do you think it will run it fine? (I'm gonna want to try this anyway, didn't know about it until now and it looks amazing!)
I’m glad Huginn is on the front page. It’s an awesome project that has been in use for a long time now. I was testing Huginn to see how skimpy I could be and still run it on a free tier. I was able to run it on OpenShift's free tier a while ago, when their allowance was generous, but it looks like that’s hard now. I'll try running it on a GCP instance and see if it works.
Is there any way to implement agents in Python rather than JavaScript/Ruby? Looks interesting, but I don’t want to invest energy in building fluency in these other scripting languages.
> Create Amazon Mechanical Turk workflows as the inputs, or outputs, of agents (the Amazon Turk Agent is called the "HumanTaskAgent"). For example: "Once a day, ask 5 people for a funny cat photo; send the results to 5 more people to be rated; send the top-rated photo to 5 people for a funny caption; send to 5 final people to rate for funniest caption; finally, post the best captioned photo on my blog."
[1] https://norse-mythology.org/gods-and-creatures/others/hugin-...
PatrolX | 6 years ago
This, exactly and all of the above.
It really is a fantastic project and kudos to you for starting it.
vincvinc | 6 years ago
see Huginn for Newsrooms: http://albertsun.github.io/huginn-newsroom-scenarios/
It's been at least as useful as Yahoo! Pipes, and endlessly more reliable. Thanks a lot!
jimsug | 6 years ago
This has been immensely useful to me, and yes, my main uses have been primarily web scraping and then piping it into various channels.
Been running it on a cheapish VM for a couple of years; very reliable, and it lets you monitor things more frequently and dependably than services like IFTTT.
tomcooks | 6 years ago
You might just have fixed a couple of problems with your tool, thanks!
PatrolX | 6 years ago
- Self-hosted
- Unlimited and FREE vs $699 / month
- Create your own agents / integrations
- Process data through shell scripts or JavaScript
- Better filtering
- Liquid templating
- Completely private
I save thousands of dollars by using Huginn.
It's incredibly powerful, and quite frankly I don't trust Zapier with my data or with visibility into what I'm doing, because there are commercial implications.
Run it in the cloud on AWS or on an old box at home.
It's very reliable.
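As a taste of the Liquid templating mentioned in the list above, downstream Huginn agents can interpolate fields from incoming events into their options. A hypothetical notification template (the field names are illustrative) might look like:

```liquid
New match: {{ title }} at {{ price | default: "unknown price" }}
{{ url }}
```

Standard Liquid filters (`default`, `truncate`, etc.) apply, which is what makes the built-in filtering and formatting flexible without custom code.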
falcor84 | 6 years ago
https://platform.zapier.com/partners/lifecycle-planning
PopeDotNinja | 6 years ago
Does that take into account the overhead of operating Huginn?
PatrolX | 6 years ago
I do some very complex stuff using the "Shell Command Agent". You might want to look into using that if you haven't already.
You can also create your own agent gem https://github.com/huginn/huginn_agent
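The pagination loop asked about upthread can indeed be approximated outside Huginn, e.g. as a standalone script that a Shell Command Agent invokes. A self-contained sketch (the `rel="next"` link convention is an assumption about the target page, and the in-memory "site" stands in for real HTTP fetches):

```python
# Follow "next page" links until none remain, collecting every <img src>
# along the way -- the loop that's hard to express as a chain of agents.
from html.parser import HTMLParser

class PageParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.image_srcs = []
        self.next_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and "src" in attrs:
            self.image_srcs.append(attrs["src"])
        if tag == "a" and attrs.get("rel") == "next":
            self.next_url = attrs.get("href")

def scrape_all(fetch, start_url):
    """fetch(url) -> HTML string. Loops until no 'next' link is found."""
    srcs, url = [], start_url
    while url:
        parser = PageParser()
        parser.feed(fetch(url))
        srcs.extend(parser.image_srcs)
        url = parser.next_url  # None once the button is gone -> loop ends
    return srcs

# In-memory stand-in for the paginated site:
pages = {
    "/p1": '<img src="a.png"><a rel="next" href="/p2">next</a>',
    "/p2": '<img src="b.png">',
}
print(scrape_all(pages.get, "/p1"))  # ['a.png', 'b.png']
```

The script can then print one JSON object per image for Huginn to turn back into events.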
apeddle | 6 years ago
I enjoyed waking up every morning to an often strange political message posted by my "bot".
@CrowdWisdomBot if anyone is curious :)
Doing it with cat photos is far more clever.
minimaxir | 6 years ago
Airflow has been a skill that many companies ask for (especially in data engineering), but surprisingly it doesn't have many articles written about it.
esquire_900 | 6 years ago
Does anybody have useful workflows going on?
bitshift | 6 years ago
1. https://github.com/automaticmode/active_workflow
2. https://github.com/automaticmode/active_workflow/blob/master...
anderspitman | 6 years ago
https://patchbay.pub/
[0] https://nodered.org/
devm0de | 6 years ago
Nice to add a new tool to the belt, thanks!
dang | 6 years ago
2014: https://news.ycombinator.com/item?id=7585605
2013: https://news.ycombinator.com/item?id=5377651
elwell | 6 years ago
Curious how this would turn out.