rdorgueil's comments

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

Today, the default is multithreading, but that's an implementation detail. Bonobo does not actually support coroutines (as in asyncio coroutines), so it would be a lie to market it that way. The plan, though, is to allow coroutines/futures in the future for specific cases (like long-running/blocking operations where keeping output order tied to input order is of no importance). Still, there is a lot on the roadmap before this becomes a priority.
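To make the ordering trade-off concrete, here is a minimal sketch using only the standard library's `concurrent.futures`. It is not Bonobo's API or its planned design; `slow_call`, `run_unordered` and `run_ordered` are hypothetical names, and `time.sleep` stands in for a blocking operation such as an API call.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_call(delay):
    # Hypothetical stand-in for a blocking operation (API call, slow query).
    time.sleep(delay)
    return delay


def run_unordered(delays):
    """Return results in completion order, which may differ from input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(delays))) as pool:
        futures = [pool.submit(slow_call, d) for d in delays]
        return [f.result() for f in as_completed(futures)]


def run_ordered(delays):
    """Return results in input order, even if later items finish first."""
    with ThreadPoolExecutor(max_workers=max(1, len(delays))) as pool:
        return list(pool.map(slow_call, delays))
```

When output order does not matter, the unordered variant hands each result downstream as soon as it is available, which is the kind of case described above.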

I note that I still have a lot of work to do explaining in simple terms what Bonobo actually is, without falling into the trap of an "overgeneral description".

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

As soon as I can, I'll add comparison pages to the documentation, trying to keep them as objective as possible. I can't seriously answer this question in depth here, but it is planned, so that experts in other systems can also jump in and complement/correct my understanding of each system. I have used a bunch of them, but I'm by no means an expert user of each, so making it collaborative sounds like a better idea than just giving my point of view.

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

Bonobo runs each function in the pipeline in parallel and makes the FIFO queue plumbing and thread pool management completely transparent.

The TL;DR would then be: "Write some generators or functions, link them in a graph, and call them in order on each line of data as soon as the previous transformation node's output is ready." For example, if you have a database cursor that yields each line of a query as its output, Bonobo starts running the next step(s) in the graph as soon as the first result is ready (and it keeps yielding from the database without waiting for the graph to finish the current row). I did not find this easy to do with the libraries I tried.
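As a rough illustration of that idea (not Bonobo's actual implementation or API), here is a minimal sketch built on the standard library's `threading` and `queue` modules. Each stage runs in its own thread and the stages are linked by FIFO queues, so a row moves to the next step as soon as it is produced. The names `run_pipeline`, `extract`, `add_label` and `to_text` are hypothetical, and `extract` merely stands in for a database cursor.

```python
import queue
import threading

_END = object()  # sentinel marking the end of the stream


def run_pipeline(source, *transforms):
    """Run the source and each transform in its own thread, linked by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(transforms) + 1)]

    def produce():
        # Push rows downstream as soon as the source yields them.
        for row in source():
            queues[0].put(row)
        queues[0].put(_END)

    def work(func, inbox, outbox):
        # Apply func to each incoming row and forward the result.
        while True:
            row = inbox.get()
            if row is _END:
                outbox.put(_END)
                return
            outbox.put(func(row))

    threads = [threading.Thread(target=produce)]
    for i, func in enumerate(transforms):
        threads.append(
            threading.Thread(target=work, args=(func, queues[i], queues[i + 1]))
        )
    for thread in threads:
        thread.start()

    # Drain the last queue until the sentinel arrives.
    results = []
    while True:
        row = queues[-1].get()
        if row is _END:
            break
        results.append(row)
    for thread in threads:
        thread.join()
    return results


def extract():
    # Stand-in for a database cursor yielding one row at a time.
    for i in range(3):
        yield {"id": i}


def add_label(row):
    return {**row, "label": "row-%d" % row["id"]}


def to_text(row):
    return "%(id)d:%(label)s" % row


print(run_pipeline(extract, add_label, to_text))
```

In this sketch, `add_label` can already be working on the first row while `extract` is still yielding the following ones. Error handling is left out: an exception inside a stage would leave the other threads waiting.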

The docs are clearly incomplete, to say the least, and need an example with a big dataset, one with long individual operations and one with a non-linear graph, so it's more obvious that, of course, it's not made to convert strings to uppercase twice in a row.

Stay tuned. I'm very happy HN brought it to the homepage; I did not really think it could happen at this stage, and I understand you. But that's a good thing for the project to move forward.

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

Yes, Hacker News and Twitter brutally told me I should take animal kingdom classes asap ...

This being said, if any of you have a good picture of bonobos that I can use instead of the current one, I'd be really glad to replace it! It needs to be released under a free license, though.

Thanks HN

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

Me (as an individual), and a few great people that helped me along the way. Not commercially endorsed, or supported.

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

It didn't sound harsh at all. I'm really laughing a lot right now about how ignorant I am about apes and monkeys. ^^

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

No, I don't have real-life public code available. I'm gonna see what I can extract from an old commercial project for publication, but I can't guarantee anything.

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

You're very right, as I'm using both pandas and bonobo for different reasons.

Mostly, when I want a quasi-mathematical look over a dataset, pandas is my tool of choice. For all those data pipeline things that reasonably fit on one computer, I do use bonobo.

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

With the ancestor of bonobo, I was processing 5M lines of data in around 1 hour, including extraction, joins, API calls and a few loads. That should give a first idea of the target size.

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

Currently realizing that we only have one word in French for both apes and monkeys ...

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

Short answer: parallel execution.

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

It's indeed intended for «small data», as opposed to «big data». I know, that does not say much, but I basically wanted to handle small flows of data without having to install the "big guns".

I'm preparing explanation pages for a lot of the questions I got, including comparisons, volumes of data, where it is good and where it is not ...

All that will be well ready before 1.0, but for now, we're at 0.2 ...

Thanks for all the hackerlove, though!

rdorgueil | 9 years ago | on: Bonobo – A data processing toolkit for Python 3.5+

Noted, sorry for that. I'll get more info about bonobos.