sivoais | 4 years ago

Hi, PDL core dev here. Feel free to ask me anything about it.

The last release wasn't in February; it was just last week! <https://metacpan.org/release/ETJ/PDL-2.050>.

I agree with many of the commenters here that Python has a lot of great libraries and is a major player for scientific computing these days. I also code in Python from time to time, but I prefer the OO modelling and language flexibility features of Perl.

Speaking for myself and not the other PDL devs, I don't think this is an issue for Perl-using scientists, as Perl can actually call Python code quite easily using Inline::Python. In the future I will be working on better interoperability between the two, specifically for NumPy/Pandas. This is also the path being taken by Julia and R.


enriquto|4 years ago

Looks great! I used perl a lot when I started programming and it is lovely to see it alive and kicking with scientific computing!

As a "heavy" user of scientific computing, I must say that the name "data language" is a bit disheartening... It evokes useless "data frames", not cool "sparse matrices", which is what I actually need. Does PDL support large sparse matrices? I grepped around the tutorial and the book and the word "sparse" is nowhere to be found. Yet it is an essential data structure in scientific computation. Are there any plans to, e.g., provide an interface to standard libraries like SuiteSparse?

sivoais|4 years ago

I plan to improve that, but will need to figure out the design (perhaps with something from Eigen). There is <https://metacpan.org/pod/PDL::CCS>, but it is not a real full PDL ndarray and is actually a wrapper around the PDL API.

1996|4 years ago

Very interesting, thank you!

Do you have a tutorial and some examples? If not, could you write one?

I sometimes deploy perl code at large scale for financial computing where only performance matters: with XS the overhead is low while you still gain language flexibility.

Even in 2021, this is usually faster than alternatives by orders of magnitude.

PDL could be a good addition to our toolset for specific workloads.

sivoais|4 years ago

Here is a link to the PDL book <http://pdl.perl.org/content/pdl-book-toc.html>.

I can share some examples of using PDL:

- Demos of basic usage <https://metacpan.org/release/ETJ/PDL-2.050/source/Demos/Gene...>

- Image analysis <https://nbviewer.ipython.org/github/zmughal/zmughal-iperl-no...> (I am also the author of IPerl, so if you have questions about it, let me know. My top priority with IPerl right now is to make it easy to install.)

- Physics calculations <https://github.com/wlmb/Photonic>

- Access to GSL functions for integration and statistics (with comparisons to SciPy and R): <https://gist.github.com/zmughal/fd79961a166d653a7316aef2f010...>. Note how PDL can take an array of values as input (which gets promoted into a PDL of type double) and then returns a PDL of type double of the same size. The values of that original array are processed entirely in C once they get converted to a PDL.

- Example of using Gnuplot <https://github.com/PDLPorters/PDL-Graphics-Gnuplot/blob/mast...>.
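A rough Python analogue of the promotion described in the GSL example above, using the standard-library `array` module as a stand-in for a typed ndarray. This is not how PDL is implemented, and `to_double_buffer` and `normal_cdf` are hypothetical names: any sequence of numbers is first copied into a contiguous buffer of C doubles, and the result is a preallocated double buffer of the same length.

```python
import math
from array import array
from typing import Iterable


def to_double_buffer(values: Iterable[float]) -> array:
    """Promote a sequence of ints/floats to a contiguous buffer of C doubles."""
    if isinstance(values, array) and values.typecode == "d":
        return values  # already the right type: no copy needed
    return array("d", values)


def normal_cdf(values: Iterable[float]) -> array:
    """Standard normal CDF applied element-wise; output has the input's length."""
    x = to_double_buffer(values)
    out = array("d", [0.0]) * len(x)  # preallocated output of type double
    for i, v in enumerate(x):
        out[i] = 0.5 * (1.0 + math.erf(v / math.sqrt(2.0)))
    return out
```

Calling `normal_cdf([0, 1, -1])` with plain integers returns an `array('d', ...)` of length 3, mirroring "array in, double ndarray of the same size out".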

---

Just to give a summary of how PDL works relative to XS:

PDL allows for creating numeric ndarrays with any number of dimensions and a specific type (e.g., byte, float, double, complex double) that can be operated on by generalized functions. These functions are compiled using a DSL called PP, which generates multiple XS functions from a signature: the signature defines how many dimensions the function operates over for each input/output variable, and PP adds loops around that core for any extra dimensions. These loops are quite flexible and can be made to work in-place so that no temporary arrays are created (this also allows for pre-allocation). When you chain several operations, each one makes its own pass over the same piece of memory; this is still fast unless you have many small computations.
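To make the signature idea concrete, here is a pure-Python sketch (not PDL::PP, which generates C/XS) of a kernel declared over one dimension, `(n) -> ()`, similar in spirit to PDL's `sumover`, plus a hypothetical `broadcast` helper that adds one outer loop per extra dimension. As in PDL's own nested-list constructor, the innermost list plays the role of the first dimension. `sum_kernel`, `ndims`, and `broadcast` are illustrative names only.

```python
from typing import Any, Callable, List


def sum_kernel(row: List[float]) -> float:
    """Inner kernel with signature (n) -> (): reduces one dimension to a scalar."""
    total = 0.0
    for x in row:
        total += x
    return total


def ndims(data: Any) -> int:
    """Count nesting depth, treating each list level as one dimension."""
    n = 0
    while isinstance(data, list):
        n += 1
        data = data[0] if data else None
    return n


def broadcast(kernel: Callable[[Any], Any], data: Any, core_ndims: int = 1) -> Any:
    """Apply `kernel` to every sub-array with `core_ndims` dimensions, adding
    one outer loop per extra dimension (roughly what PP-generated C loops do)."""
    if ndims(data) <= core_ndims:
        return kernel(data)
    return [broadcast(kernel, sub, core_ndims) for sub in data]


print(broadcast(sum_kernel, [[1, 2, 3], [4, 5, 6]]))  # [6.0, 15.0]
```

The kernel is written once for a single row; the same code handles 1-D, 2-D, or 3-D input without changes, which is the property the signature buys you.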

And if you do have many small computations, the PP DSL is available to users as well: if they need to speed up a specific PDL computation written in Perl, they can translate the innermost loop into C, and then the whole computation runs in one loop (a faster data access pattern). There is a book for that as well, called "Practical Magick with C, PDL, and PDL::PP -- a guide to compiled add-ons for PDL" <https://arxiv.org/abs/1702.07753>.
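A rough illustration of why fusing the loop helps, written in plain Python with hypothetical names rather than in PP (which would compile the fused loop to C). The unfused version makes three whole-array passes and allocates two temporaries; the fused one evaluates `(a*x + b)**2` per element in a single pass. Both give identical values; the benefit is reduced memory traffic, which only shows up as real speed in compiled code.

```python
from typing import List


def unfused(x: List[float], a: float, b: float) -> List[float]:
    # Each step is a separate whole-array pass that allocates a temporary,
    # like chaining several vectorised operations at the Perl level.
    t1 = [a * v for v in x]      # pass 1
    t2 = [v + b for v in t1]     # pass 2
    return [v * v for v in t2]   # pass 3


def fused(x: List[float], a: float, b: float) -> List[float]:
    # One pass: each element is read once and the whole expression is
    # evaluated while it is hot, as a hand-written inner loop would do.
    out = [0.0] * len(x)
    for i, v in enumerate(x):
        t = a * v + b
        out[i] = t * t
    return out
```

Because both versions perform the same floating-point operations in the same order, their results are exactly equal, so the fused loop can replace the chain without changing numerics.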

---

I'm also active on the `#pdl` IRC channel on <https://www.irc.perl.org/>, so feel free to drop by.

zengargoyle|4 years ago

Now you just need to port it to Raku. (Maybe you have).

sivoais|4 years ago

I would really like to do some scientific computing in Raku. It has crossed my mind that I can maintain both Perl5 and Raku ports of some of the library code I'm writing. I just haven't worked through the tooling.

audit|4 years ago

Thank you for your work. I used PDL in the early 2000s when working in bioinformatics.

At the time I did not know any of the specialized languages, so when I initially approached the project I was very concerned about how to deal with matrices, but as I came to understand PDL better, I got better and better at it.

If I may suggest something (this is based on old experience, though):

a) some 'built-in' way to seamlessly distribute work across processes and machines.

b) some seamless Excel and LibreOffice Calc integration.

Meaning that I should be able to 'release' my programs as Excel/LibreOffice files: I code in PDL but leverage the spreadsheet as a 'UI' + calc runtime.

So when I run my 'make', I get out an Excel/LibreOffice file that I can version and distribute to users or subsequent compute environments.

The PDL code would be translated into the runtime understood by the spreadsheet engine.

I know this is a lot to ask, and may be not in the direction you are going, but wanted to mention still.

sivoais|4 years ago

Good ideas!

A built-in way would be good. There is some work being explored on using OpenMP with Perl/PDL to get some of that. In the meantime, there is MCE, which does distribute work across processes, and there are examples of using it with PDL <https://github.com/marioroy/mce-cookbook#sharing-perl-data-l...>, but I have not had an opportunity to use it.
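MCE itself is a Perl module; as a language-neutral sketch of the same chunk-and-gather pattern, here is Python's `concurrent.futures.ProcessPoolExecutor` splitting an array into contiguous chunks, reducing each chunk in a worker process, and combining the partial results. `split_ranges`, `sum_of_squares`, and `parallel_sum_of_squares` are hypothetical names, and this does not describe how MCE or the OpenMP work is actually implemented.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import List, Tuple


def split_ranges(n: int, n_chunks: int) -> List[Tuple[int, int]]:
    """Split indices 0..n-1 into at most n_chunks contiguous [start, stop) ranges."""
    n_chunks = max(1, min(n_chunks, n))
    base, extra = divmod(n, n_chunks)
    ranges, start = [], 0
    for i in range(n_chunks):
        stop = start + base + (1 if i < extra else 0)  # spread the remainder
        ranges.append((start, stop))
        start = stop
    return ranges


def sum_of_squares(chunk: List[float]) -> float:
    """Per-chunk work; runs inside a worker process."""
    return sum(v * v for v in chunk)


def parallel_sum_of_squares(data: List[float], n_workers: int = 4) -> float:
    """Scatter chunks to worker processes, then gather and combine partials."""
    chunks = [data[a:b] for a, b in split_ranges(len(data), n_workers)]
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(sum_of_squares, chunks))
    return sum(partials)
```

Call `parallel_sum_of_squares` from under an `if __name__ == "__main__":` guard, because with the `spawn` and `forkserver` start methods the workers re-import the main module. For small inputs, the cost of pickling chunks to the workers usually outweighs the parallel speedup.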

Output for a spreadsheet would be difficult if I understand the problem correctly. This would be more about creating a mapping of PDL function names to spreadsheet function names, and not all PDL functions exist in spreadsheet languages. It might be possible to embed a Perl interpreter in the spreadsheet or talk to one over IPC, similar to what <https://www.pyxll.com/> does for Python, but I don't know how easy that would be to deploy when distributing to users.

Am I understanding correctly?

Interestingly enough, creating a mapping of PDL functions would be useful for other reasons, so the first part might be possible, but the code might need to be written in a certain way that makes writing the dataflow between cells easier.
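A minimal sketch of the name-mapping part, assuming a hand-maintained table; `PDL_TO_SHEET` and `to_formula` are hypothetical and not part of PDL. It turns a small expression tree into a spreadsheet formula and refuses anything it cannot map. The mapping is only approximate: PDL's `sumover` reduces over the first dimension and returns an ndarray, whereas a spreadsheet `SUM` over a range returns a single value. The argument separator in real spreadsheets is locale-dependent; this sketch uses commas.

```python
from typing import Tuple, Union

# Hypothetical table: PDL reduction names -> spreadsheet function names.
PDL_TO_SHEET = {
    "sumover": "SUM",
    "average": "AVERAGE",
    "minimum": "MIN",
    "maximum": "MAX",
    "prodover": "PRODUCT",
}

Expr = Union[str, Tuple]  # a cell/range reference, or (pdl_name, arg, ...)


def to_formula(expr: Expr) -> str:
    """Translate a nested (pdl_name, args...) tree into a spreadsheet formula."""
    if isinstance(expr, str):
        return expr  # cell or range reference such as "A1:A10"
    name, *args = expr
    try:
        sheet_name = PDL_TO_SHEET[name]
    except KeyError:
        raise NotImplementedError(f"no spreadsheet equivalent for {name!r}") from None
    return f"{sheet_name}({', '.join(to_formula(a) for a in args)})"
```

For example, `to_formula(("maximum", ("average", "A1:A3"), "B1"))` yields `MAX(AVERAGE(A1:A3), B1)`, while any function missing from the table raises instead of silently producing a wrong formula.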