Huginn: Create agents that monitor and act on your behalf

1303 points| daolf | 6 years ago |github.com | reply

143 comments

[+] tectonic|6 years ago|reply
It's fun to wake up to find a project that I started on the top of HN! These days, I'm no longer very involved with the day-to-day of the project.

Now that it's no longer a young project, here are some musings about Huginn and responses to people's comments in this thread, in no particular order.

I've found that Huginn excels as a scheduled web scraper with lightweight filtering. That's what I use it for. On the other hand, while you can write custom code in it, Huginn is pretty poor at implementing any sort of complex logic, and is even worse at bidirectional syncing between systems, which is something people often want it to do, but for which it wasn't designed.

If IFTTT or Zapier meet your needs, awesome! No need to run and monitor your own service. I personally choose to run Huginn on my own hardware in large part so that I'm comfortable giving it website cookies and passwords.

Some examples of what I use Huginn for these days:

- Watching Twitter in real time for high-standard-deviation spikes in certain keywords. Terms such as "san francisco emergency" or "san francisco tsunami warning" send me a push notification, while "huginn open source" goes to a digest email (and I imagine it will trigger because of this thread).

- Watching Twitter for rare terms and sending me a digest of all tweets that match them. Also sending me all tweets from a few Twitter users who post rarely, but that I don't want to miss.

- Scraping a number of rarely updated blogs that don't have email newsletters and emailing me when they change. Some use RSS, most are just simple HTML scraping.

- Pulling deals from the front page and forums of Slickdeals and Craigslist and filtering them for certain keywords.

- Sending an early morning email if it's going to rain today.

- Watching eBay for some rare items.

- Sending my wife and me an email on Saturday morning with local yard sales from Craigslist.

- Watching the HN and Product Hunt front pages for certain keywords.
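As a rough illustration of the standard-deviation spike idea in the first bullet above, here is a minimal Python sketch. It is not Huginn's code: the function name `is_spike`, the default threshold of 3, and the sample counts are all assumptions made up for this example.

```python
from statistics import mean, stdev


def is_spike(history, current, threshold=3.0):
    """Return True if `current` is more than `threshold` sample standard
    deviations above the mean of `history` (hypothetical helper)."""
    if len(history) < 2:
        return False  # too little data to estimate a spread
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: treat any increase as a spike.
        return current > mu
    return (current - mu) / sigma > threshold


# Made-up hourly counts of tweets matching one keyword.
counts = [4, 6, 5, 7, 5, 6, 4, 5]
print(is_spike(counts, 6))   # an ordinary hour
print(is_spike(counts, 40))  # a sudden burst
```

A higher threshold means fewer but more trustworthy alerts, which suits something that ends in a push notification.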

Basically, anytime I find myself checking a website more than a few times, I spend 20 minutes making a Huginn Agent to do it for me.

I think one reason Huginn has worked well for me is that I don't try to make it do too much. I use it for scraping data and gentle filtering, and that's about it. It's been super helpful for alerting me to interesting content for The Orbital Index, my current project, a weekly space-industry newsletter. (Last issue: http://orbitalindex.com/archive/2019-12-10-Issue-42/)

[+] wpietri|6 years ago|reply
I'm excited to check this out, but I wanted to congratulate you on a truly excellent project name. Having spent many, many hours in naming struggles, I truly appreciate the perfection.

And for those unfamiliar, the (also amazingly named) historian Snorri Sturluson explains: "Two ravens sit on his (Odin’s) shoulders and whisper all the news which they see and hear into his ear; they are called Huginn and Muninn. He sends them out in the morning to fly around the whole world, and by breakfast they are back again. Thus, he finds out many new things and this is why he is called ‘raven-god’ (hrafnaguð)." [1]

[1] https://norse-mythology.org/gods-and-creatures/others/hugin-...

[+] PatrolX|6 years ago|reply
>I personally choose to run Huginn on my own hardware in large part so that I'm comfortable giving it website cookies and passwords.

This, exactly and all of the above.

It really is a fantastic project and kudos to you for starting it.

[+] vincvinc|6 years ago|reply
Little-known fact: this project is widely used by journalists who can't code (at The New York Times, among others) for a variety of tasks, e.g. monitoring web pages such as Trump's policy position, scraping press releases, or filtering out very specific news alerts.

see Huginn for Newsrooms: http://albertsun.github.io/huginn-newsroom-scenarios/

It's been at least as useful as Yahoo! Pipes, and endlessly more reliable. Thanks a lot!

[+] joegaebel|6 years ago|reply
Great to see you at the top of HN today tectonic! I'm at Pivotal in Sydney Australia now. Thanks again for the impromptu interview all of those years ago!
[+] Wistar|6 years ago|reply
You, Sir, are why I read HN. It's as simple as that.
[+] jimsug|6 years ago|reply
Firstly, thanks!

This has been immensely useful to me, and yes, my main uses have been primarily web scraping and then piping it into various channels.

Been running it on a cheapish VM for a couple years, very reliable and lets you monitor things more frequently and reliably than services like IFTTT.

[+] tomcooks|6 years ago|reply
Nothing is better than this kind of explanation for convincing someone to try something; genuine passion seeps through your words.

You might just have fixed a couple of problems with your tool, thanks!

[+] thisisbrians|6 years ago|reply
What funny timing! I'm a fellow space-nerd and just signed up for your newsletter a couple of weeks ago — it's fabulously done! Bravo.
[+] carrozo|6 years ago|reply
Love your newsletter! Brilliant writing and I think I click on more of its links than most of the many I receive.
[+] solstice|6 years ago|reply
Thank you for making Orbital Index! Super interesting and well written!
[+] babyyoda|6 years ago|reply
Awesome project! Super impressive and will check out later
[+] PatrolX|6 years ago|reply
Not just an alternative though.

- Self-hosted

- Unlimited and FREE vs $699 / month

- Create your own agents / integrations

- Process data through shell scripts or JavaScript

- Better filtering

- Liquid templating

- Completely private

I save thousands of dollars by using Huginn.

It's incredibly powerful and quite frankly I don't trust Zapier with my data and visibility of what I'm doing because there are commercial implications.

Run it in the cloud on AWS or on an old box at home.

It's very reliable.

[+] PopeDotNinja|6 years ago|reply
> I save thousands of dollars by using Huginn.

Does that take into account the overhead of operating Huginn?

[+] traspler|6 years ago|reply
I have been running huginn on my home server for a while. I've mainly used it to filter RSS feeds and then generate new feeds with the filtered items. Another use case for me is ingesting webcomic RSS feeds (or scraping a page) and posting the comics to a private Telegram channel. Once I also had an agent scraping a page and notifying me if something changed (a real estate listing).
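The "filter an RSS feed and emit a new feed" idea can be sketched in plain Python with only the standard library. This is not how Huginn does it internally; `filter_rss`, the keyword list, and the sample feed are made up for illustration, and the sketch assumes a well-formed RSS 2.0 document with a `<channel>` element.

```python
import xml.etree.ElementTree as ET


def filter_rss(rss_xml, keywords):
    """Return RSS XML keeping only items whose title mentions a keyword."""
    root = ET.fromstring(rss_xml)
    channel = root.find("channel")  # assumes RSS 2.0 layout
    wanted = [k.lower() for k in keywords]
    # Iterate over a copy, since we remove items while looping.
    for item in list(channel.findall("item")):
        title = (item.findtext("title") or "").lower()
        if not any(k in title for k in wanted):
            channel.remove(item)
    return ET.tostring(root, encoding="unicode")


# Made-up sample feed.
SAMPLE = """<rss version="2.0"><channel><title>Example feed</title>
<item><title>A post about Huginn</title><link>https://example.com/1</link></item>
<item><title>Unrelated post</title><link>https://example.com/2</link></item>
</channel></rss>"""

filtered = filter_rss(SAMPLE, ["huginn"])
print(filtered)
```

The output is itself a valid feed, so it can be served to any ordinary feed reader.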

I have tried a couple of alternatives, e.g. node-red, but none really worked the way I wanted them to for these cases. huginn is incredibly flexible and (at least for me) the mental model of its workflow makes a lot of sense. Sadly, more and more pages want you to go through their app/site, which makes them a bit difficult to work with, e.g. getting content from an Instagram account.

One thing I have not figured out about huginn and which all of these automation tools seem to lack are loops. E.g., I have a page that an agent scrapes, from which I want to output the src of an image tag, but I also want to check whether a certain condition on the page matches (e.g. a "next page" button exists). If it does, the agent should first output the found src and then re-invoke itself with a new input element, so it would scrape the next page and so on until it no longer finds the button.
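Outside Huginn, for example in a standalone script, the loop described above is short. The sketch below is only an illustration under assumptions: the "next page" button is taken to be an `<a rel="next">` link, `fetch` is a hypothetical callable that returns the HTML for a URL, and `max_pages` is an arbitrary safety cap.

```python
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects <img src> values and the href of an <a rel="next"> link."""

    def __init__(self):
        super().__init__()
        self.image_srcs = []
        self.next_url = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img" and attrs.get("src"):
            self.image_srcs.append(attrs["src"])
        elif tag == "a" and attrs.get("rel") == "next":
            self.next_url = attrs.get("href")


def scrape_all(start_url, fetch, max_pages=50):
    """Follow "next" links from start_url, returning every image src found."""
    url, seen, srcs = start_url, set(), []
    # Stop when there is no next link, on a repeated URL, or at the page cap.
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)
        parser = PageParser()
        parser.feed(fetch(url))
        srcs.extend(parser.image_srcs)
        url = parser.next_url
    return srcs


# In-memory stand-in for a website, so the sketch runs without a network.
PAGES = {
    "/p1": '<img src="a.png"><a rel="next" href="/p2">next page</a>',
    "/p2": '<img src="b.png"><a rel="next" href="/p3">next page</a>',
    "/p3": '<img src="c.png">',
}
print(scrape_all("/p1", PAGES.__getitem__))
```

Passing `fetch` in as a parameter keeps the loop testable; a real version would swap in an HTTP client and resolve relative URLs.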

[+] PatrolX|6 years ago|reply
> One thing I have not figured out about huginn and which all of these automation tools seem to lack are loops.

I do some very complex stuff using the "Shell Command Agent". You might want to look into using that if you haven't already.

You can also create your own agent gem https://github.com/huginn/huginn_agent

[+] IanCal|6 years ago|reply
You could perhaps use a tool like scrapy in combination with huginn. Scrapy has very simple patterns for doing that kind of logic.
[+] robk|6 years ago|reply
Yeah, I just use the JavaScript agent for very complex stuff like this.
[+] howmayiannoyyou|6 years ago|reply
Zapier has exponentially more integrations than this or anything else, but is surprisingly difficult to use, and more so since they updated their UI. Editing Zaps is pure torture because refresh is so difficult. Exception handling is pure pain. In most instances the breadth of API calls is so narrow, and so rarely updated by vendors that you end up switching to a custom integration. I've also noticed vendors rapidly expanding their native integrations, sidestepping the need for a request broker.

The "retail" integration space remains underserved and if one of the enterprise players decided to go down-market with a better UI and deeper integrations - they'd mop the floor clean in 18 months.

[+] toomuchtodo|6 years ago|reply
> The "retail" integration space remains underserved and if one of the enterprise players decided to go down-market with a better UI and deeper integrations - they'd mop the floor clean in 18 months.

I think it's a tough market to be "winner" of. Novices are going to want a stupid simple GUI ("wizard mode", as someone else in thread mentioned). Power users are going to want to be able to toss in some code at some point in a workflow to do some fancy ETL you don't support out of the box. When you hit a certain level of complexity, an edge case or integration an automation product doesn't support, or perhaps even an amount of spend that you start looking at annually as painful, it's likely you consider pulling all of your workflows out and have a software engineer build something bespoke for your business line.

[+] PatrolX|6 years ago|reply
> The "retail" integration space remains underserved

This is accurate, there's a significant opportunity in this space but it won't stay that way for long.

There's a well-known entrepreneur I know with significant exits and capital entering this space that's going after Zapier's market and I'm certain he's not alone.

[+] rogerkirkness|6 years ago|reply
We tend to agree re: retail integration, assuming you mean among retailers as opposed to message brokering.
[+] david_draco|6 years ago|reply
"Once a day, ask 5 people for a funny cat photo; send the results to 5 more people to be rated; send the top-rated photo to 5 people for a funny caption; send to 5 final people to rate for funniest caption; finally, post the best captioned photo on my blog."

I'm still laughing :) wth! (My fear is that this might actually be sustainable with ads.)

[+] omarhaneef|6 years ago|reply
This is basically the plan of The New Yorker, but with (probably) 1000s of people.
[+] apeddle|6 years ago|reply
For about a week a few years back, I ran a twitter bot that did essentially this for tweets powered by mechanical turk.

I enjoyed waking up every morning to an often strange political message posted by my "bot".

@CrowdWisdomBot if anyone is curious :)

Doing it with cat photos is far more clever.

[+] antpls|6 years ago|reply
I agree, that was an eye-opener example
[+] minimaxir|6 years ago|reply
For a more business-friendly-but-still-FOSS approach to task automation between services, see also Apache Airflow: https://airflow.apache.org

Airflow has been a skill that many companies ask for (especially data engineering), but surprisingly doesn't have many articles written about it.

[+] codetrotter|6 years ago|reply
The Airflow landing page that you linked to lists many integrations, but when you click on those, only a small subset of them is listed in the integrations section of the docs that it links to. I guess the docs are in need of some more work.
[+] esquire_900|6 years ago|reply
This, as well as related projects like n8n & node-red, is a very cool project. I always wonder what people use it for in real life though. It seems a lot of trouble (setting up, learning curve, maintaining) for an action that usually takes a couple of seconds, like checking the weather or opening twitter.

Does anybody have useful workflows going on?

[+] rvz|6 years ago|reply
For those using expensive/advanced connectors like Zapier, Tray.io, etc., I find that https://n8n.io serves as a far more welcoming open-source alternative that is worth looking at.
[+] janober|6 years ago|reply
Thanks a lot rvz for throwing it in the mix. I am the creator of n8n so just wanted to mention that it is not "OSI approved open-source" as the commons clause got attached. More information about that in the FAQ https://docs.n8n.io/#/faq?id=license
[+] agentdrtran|6 years ago|reply
This is nowhere near as robust as Zapier and has significant limitations.
[+] PatrolX|6 years ago|reply
Nowhere near as flexible and powerful as Huginn.
[+] curo|6 years ago|reply
Something I'm not entirely clear on:

One use case for Zapier (from a developer/company standpoint) is to allow customers to connect their existing services to actions inside your own app. For instance, if a customer updates a CRM record, you can have a custom zap update a record in your own SaaS platform.

To pull that off with huginn, is that as simple as connecting this up to Singer.io? Or would that require a big marketplace of huginn agents for popular integrations?

[+] penagwin|6 years ago|reply
Also check out Node-RED [0]; it's fairly popular in the automation space. It's rather sparse by default, but after adding in some community nodes (or making some yourself) it's pretty useful.

[0] https://nodered.org/

[+] devm0de|6 years ago|reply
I gave this a go today and managed to install huginn on my Synology NAS by simply searching for the docker container. I then set up 3 agents to scrape a Shopify webstore JSON endpoint that I’m always checking for inventory, have huginn parse the JSON, and send me an SMS via Twilio if inventory changes. Took about 2 hours, wasn’t too bad. The Huginn Twilio docs seemed dated.

Used Python's SimpleHTTPServer and ngrok to replicate a JSON URL and play with the triggers to test it all before pointing it at a real website.
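The core of an inventory watcher like this is comparing two JSON snapshots. Here is a minimal Python sketch of that comparison step only. The JSON shape (a `products` list with `title` and `available` fields) is a made-up assumption, not the real Shopify format, and `inventory_changes` is a hypothetical helper rather than anything in Huginn.

```python
import json


def inventory_changes(old_json, new_json):
    """Return (title, old_available, new_available) for each product whose
    availability differs between two snapshots (assumed JSON shape)."""
    old = {p["title"]: p["available"] for p in json.loads(old_json)["products"]}
    new = {p["title"]: p["available"] for p in json.loads(new_json)["products"]}
    changes = []
    for title, available in new.items():
        # Products missing from the old snapshot show up with old value None.
        if old.get(title) != available:
            changes.append((title, old.get(title), available))
    return changes


# Made-up snapshots standing in for two polls of the endpoint.
before = '{"products": [{"title": "Widget", "available": false}, {"title": "Gadget", "available": true}]}'
after = '{"products": [{"title": "Widget", "available": true}, {"title": "Gadget", "available": true}]}'
print(inventory_changes(before, after))
```

Serving hand-edited copies of such snapshots from a local HTTP server, as described above, lets you trigger every branch before pointing the agents at the real site.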

Nice to add a new tool to the belt, thanks!

[+] j1elo|6 years ago|reply
I've seen the word "rake", so I guess... is this written in Ruby? If so, how's the performance?

My home server is pretty minimal and lightweight: a Raspberry Pi. Do you think it will run it fine? (I'm gonna want to try this anyway, didn't know about it until now and it looks amazing!)

[+] heavyset_go|6 years ago|reply
A fresh Huginn container idles at ~300MB of memory without any configuration on my part. A fresh node-red container idles at 30MB.
[+] theshadowmonkey|6 years ago|reply
I’m glad Huginn is on the front page. It’s an awesome project that has been in use for a long time now. I was testing Huginn to see how skimpy I can be and still run it on a free tier. I was able to run it on the OpenShift free tier a while ago when their allowance was generous, but it looks like that's hard now. I will try running it on a GCP instance and see if it works.
[+] audiometry|6 years ago|reply
Is there any way to implement agents in Python rather than JavaScript/Ruby? Looks interesting but I don’t want to invest energy in building fluency in these other scripting languages.
[+] elwell|6 years ago|reply
> Create Amazon Mechanical Turk workflows as the inputs, or outputs, of agents (the Amazon Turk Agent is called the "HumanTaskAgent"). For example: "Once a day, ask 5 people for a funny cat photo; send the results to 5 more people to be rated; send the top-rated photo to 5 people for a funny caption; send to 5 final people to rate for funniest caption; finally, post the best captioned photo on my blog."

Curious how this would turn out.