
NumPy 2.0

324 points | scoresmoke | 1 year ago | numpy.org

76 comments


dahart|1 year ago

The thing I want most is a more sane and more memorable way to compose non-element-wise operations. There are so many different ways to build views and multiply arrays that I can’t remember them and never know which to use, and have to relearn them every time I use numpy… broadcasting, padding, repeating, slicing, stacking, transposing, outers, inners, dots of all sorts, and half the stack overflow answers lead to the most confusing pickaxe of all: einsum. Am I alone? I love numpy, but every time I reach for it I somehow get stuck for hours on what ought to be really simple indexing problems.

salamo|1 year ago

When I started out I was basically stumbling around for code that worked. Things got a lot easier for me once I sat down and actually understood broadcasting.

The rules are: 1) scalars always broadcast, 2) if one array has fewer dimensions, left-pad its shape with 1s, and 3) starting from the right, check dimension compatibility, where two dimensions are compatible if they are equal or one of them is 1. Example: np.ones((2,3,1)) * np.ones((1,4)) has shape (2,3,4).

Once your dimensions are correct, it's a lot easier to reason your way through a problem, similar to how basic dimensional analysis in physics can verify your answer makes some sense.
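The three rules above can be sketched in plain numpy (nothing here beyond the standard library plus numpy):

```python
import numpy as np

a = np.ones((2, 3, 1))
b = np.ones((1, 4))

# Rule 2: b's shape (1, 4) is left-padded to (1, 1, 4).
# Rule 3: compare right to left: (2,3,1) vs (1,1,4) -- each pair is
# equal or contains a 1, so the result shape is (2, 3, 4).
c = a * b
assert c.shape == (2, 3, 4)

# Rule 1: scalars broadcast against anything.
d = a * 5.0
assert d.shape == (2, 3, 1)

# Incompatible trailing dimensions raise rather than guess:
try:
    np.ones((2, 3)) * np.ones((4,))
except ValueError:
    pass  # shapes (2,3) and (4,) are not broadcastable
```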

(I would disable broadcasting if I could, since it has caused way too many silent bugs in my experience. JAX can, but I don't feel like learning another library to do this.)

Once I understood broadcasting, it was a lot easier to practice vectorizing basic algorithms.

akasakahakada|1 year ago

To be honest, einsum is the easiest one. You get fine control over which axis contracts against which. But I wish it could do more than matmul-style products.
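For instance, einsum's subscript notation makes every axis explicit (a small sketch; the letter labels are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((2, 3, 4))   # batch of 2 matrices, each 3x4
B = rng.random((2, 4, 5))   # batch of 2 matrices, each 4x5

# 'bij,bjk->bik': contract axis j of A against axis j of B, per batch b.
C = np.einsum("bij,bjk->bik", A, B)
assert C.shape == (2, 3, 5)
assert np.allclose(C[0], A[0] @ B[0])

# The same notation also covers non-matmul product-sums:
assert np.einsum("ii->", np.eye(3)) == 3.0      # trace
assert np.einsum("bij->ji", A).shape == (4, 3)  # sum over b, transpose
```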

The others are just messy shit. Like, you get np.abs but no arr.abs, np.unique but no arr.unique. But then you do have arr.mean.

Sometimes the argument name is index, sometimes indices; sometimes a function accepts a list or a tuple, sometimes only a tuple.

hansvm|1 year ago

It gets more comfortable over time, but I remember feeling that way for the first year or three. My wishlist now is for most of numpy to just be a really great einsum implementation, along with a few analogous operations for the rest of the map-reduces numpy accelerates.

I've been writing my own low-level numeric routines lately, so I'm not up-to-date on the latest news, but there have been a few ideas floating around over the last few years about naming your axes and defining operations in terms of those names [0,1,2,3]. That sort of thing looks promising to me, and one of those projects might be a better conceptual fit for you.

[0] https://nlp.seas.harvard.edu/NamedTensor

[1] https://pypi.org/project/named-arrays/

[2] https://pytorch.org/docs/stable/named_tensor.html

[3] https://docs.xarray.dev/en/stable/
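The named-axes idea behind those libraries can be illustrated with a toy wrapper over plain numpy (a hypothetical sketch, not the API of any project linked above): align axes by name, padding in length-1 dims for missing ones, so broadcasting never depends on position.

```python
import numpy as np

class Named:
    """Toy named-axis array -- illustrative only, not a real library."""
    def __init__(self, data, axes):
        self.data = np.asarray(data)
        self.axes = tuple(axes)

    def mul(self, other):
        # Result axes: mine, then any of other's that I lack.
        axes = self.axes + tuple(ax for ax in other.axes if ax not in self.axes)

        def aligned(x):
            # Append length-1 dims for missing axes, then reorder by name.
            local = list(x.axes) + [ax for ax in axes if ax not in x.axes]
            data = x.data.reshape(x.data.shape + (1,) * (len(axes) - x.data.ndim))
            return np.transpose(data, [local.index(ax) for ax in axes])

        return Named(aligned(self) * aligned(other), axes)

h = Named(np.ones((2, 3)), ("batch", "height"))
w = Named(np.ones((4, 3)), ("width", "height"))
out = h.mul(w)
assert out.axes == ("batch", "height", "width")
assert out.data.shape == (2, 3, 4)
```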

nl|1 year ago

ChatGPT is really good at this. Using it to solve numpy and matplotlib problems is worth the cost of the subscription.

mFixman|1 year ago

Most of the bugs I've had in numpy programs came from a variable with a different ndim than expected being broadcast implicitly.

Implicit type casting is considered a mistake in most programming languages; if I were to redesign numpy from scratch I would make all broadcasting explicit.

My solution to these problems is to assert an array's shape often. Does anybody know if there's a tool like mypy or valgrind, but one that checks for mismatched array shapes rather than types or memory leaks?
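The shape-asserting habit can be packaged as a small runtime helper (a sketch; the `check_shape` name and wildcard convention are made up here, though libraries like jaxtyping pursue the same idea via type annotations):

```python
import numpy as np

def check_shape(arr, expected):
    """Assert arr.shape matches expected; None acts as a wildcard dim."""
    assert arr.ndim == len(expected), f"ndim {arr.ndim} != {len(expected)}"
    for i, (got, want) in enumerate(zip(arr.shape, expected)):
        assert want is None or got == want, f"axis {i}: {got} != {want}"
    return arr  # return the array so calls can be chained inline

x = np.zeros((32, 3, 224, 224))
check_shape(x, (None, 3, 224, 224))   # batch size is a wildcard

try:
    check_shape(x, (None, 1, 224, 224))
except AssertionError:
    pass  # catches the ndim/shape mismatch before it broadcasts silently
```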

SubiculumCode|1 year ago

Is this the dplyr use case in R? Is there a dplyr equivalent for NumPy in Python?

begueradj|1 year ago

What do you mean by: "more memorable way to ..." ?

ayhanfuat|1 year ago

> The default integer type on Windows is now int64 rather than int32, matching the behavior on other platforms

This was a footgun due to C long being int32 on Win64. Glad they changed it.
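A sketch of the footgun: on NumPy 1.x under 64-bit Windows the default integer dtype was int32 (following the C long), so arithmetic could silently wrap there while working fine on Linux/macOS. Forcing int32 reproduces the wrap on any platform:

```python
import numpy as np

a32 = np.array([2**30], dtype=np.int32)
wrapped = a32 * 4              # 2**32 does not fit in int32; wraps to 0
assert wrapped[0] == 0         # silent -- no exception raised

a64 = np.array([2**30], dtype=np.int64)
assert (a64 * 4)[0] == 2**32   # int64, now the default everywhere, is fine
```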

fbdab103|1 year ago

Any notable highlights for a consumer of NumPy who rarely interfaces with it directly? Most of my work is pandas+scipy, occasionally dropping into a specific numpy algorithm when required.

I am much more of an "upgrade when there is an X.1 release" kind of guy, so my hat is off to those who will bravely be testing this version on my behalf.

nerdponx|1 year ago

As a more or less daily user, I was surprised at how not-breaking the 2.0 changes will be for 90% of Numpy users. Unless their dependencies/environments break, I expect that casual users won't even notice the upgrade.

Even the new string dtype I expect would go unnoticed by half of users or more, because they won't be using it (because Numpy historically only had fixed-length strings and generally poor support for them) and so won't even think to try it. Pandas meanwhile has had a proper string dtype for a while, so anyone interested in doing serious work on strings in data frames / arrays would presumably be using Pandas anyway.
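The fixed-length limitation mentioned above is easy to demonstrate (plain numpy, works on 1.x and 2.x alike):

```python
import numpy as np

# The classic string dtype is fixed-width: the array's element width is
# set by the longest string at creation time.
a = np.array(["ab", "cdef"])
assert a.dtype == np.dtype("<U4")   # 4 characters per element, max

a[0] = "hello"                      # silently truncated to the 4-char width
assert a[0] == "hell"
```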

Most of the breaking changes are in long-deprecated oddball functions that I literally have never seen used in the wild, and in the internal parts that will be a headache for library developers.

The only change that a casual user might actually notice is the change in repr(np.float64(3.0)), from "3.0" to "np.float64(3.0)".

notatoad|1 year ago

it feels like the first major release in 18 years which introduces lots of breaking changes should just be a fork rather than a version.

let me do `pip install numpy2` and not have to worry about whether or not some other library in my project requires numpy<2.

vessenes|1 year ago

From a consumer (developer consumer) point of view, I hear you.

From a project point of view, there are some pretty strong contra-indicators in the last 20 years of language development that make this plan suspect, or at least pretty scary — both Perl and Python had extremely rocky transitions around major versions; Perl’s ultimately failing and Python’s ultimately taking like 10 years. At least. I think the last time I needed Python 2 for something was a few months ago, and before that it had been a year or so. I’ve never needed Perl 6, but if I did I would be forced to read a lot of history while I downloaded and figured out which, if any, Perl 5 modules I’m looking for got ported.

I’d imagine the numpy devs probably don’t have the resources to support what would certainly become two competing forks that each have communities with their own needs.

patrick451|1 year ago

A project with both numpy arrays and numpy2 arrays getting mixed together sounds like a disaster to me.

make3|1 year ago

knowing how careful the NumPy devs are, this was likely a very well-pondered decision, and all of these deprecations have likely been announced for a long time. Seeing knee-jerk reactions like this is annoying.

aphexairlines|1 year ago

If your requirements.in referenced numpy before this release, then doesn't your requirements.txt already reference a specific 1.x version?

theamk|1 year ago

That was my first reaction as well, but apparently, as far as Python goes, numpy 2 code is fully compatible with numpy 1 code [0], with the exception of the single "byte_bounds" function (which sounds super rare, so I doubt it'd be a problem).

So at least the migration path for Python modules is clear: upgrade to be numpy 2 compatible, wait for critical mass, then start adopting numpy 2 features. Sounds way better than the python2 -> python3 migration, for example.

However, the fact that I had to look at a 3rd-party page to find this out is IMHO a big documentation problem. It should be plastered across every announcement, the documentation, and the migration page: "there is a common subset of numpy 1 and 2 for Python code, so you can upgrade now, no need to wait for full adoption".

[0] https://docs.astral.sh/ruff/rules/numpy2-deprecation/

RandomBK|1 year ago

I'm starting to see some packages break due to not pinning 1.x in their dependencies. `pip install numpy==1.*` is a quick and hacky way to work around those issues until the ecosystem catches up.

globular-toast|1 year ago

"numpy~=1.0”

Is this not common knowledge? Also, pip install? Or do you mean some requirements file?

antonoo|1 year ago

> X months of work by Y contributors?

Makes it look like they pressed publish before filling in their template, or is this on purpose?

tpoacher|1 year ago

I wish numpy pushed its structured arrays (and thereby also improvements to their interface) more aggressively.

Most people are simply unaware of them, which is part of why we end up with stuff like pandas on top of everything.
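For anyone who hasn't met them: structured arrays give you named, typed fields on plain numpy arrays, covering a slice of what pandas is often reached for (a minimal sketch):

```python
import numpy as np

people = np.array(
    [("alice", 30, 55.5), ("bob", 25, 70.2)],
    dtype=[("name", "U10"), ("age", "i4"), ("weight", "f8")],
)

# Column access by field name, with ordinary numpy reductions:
assert people["age"].mean() == 27.5

# Boolean masks filter rows, just like a dataframe query:
young = people[people["age"] < 28]
assert young["name"][0] == "bob"
```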

fertrevino|1 year ago

So apparently this is what broke my CI job, since it was installed indirectly. One of the downsides of loose version pinning in requirements.txt rather than something like poetry, I guess.

Kalanos|1 year ago

What are the implications of the new StringDType? If I remember correctly, string performance was a big part of pandas' switch to Arrow.

darepublic|1 year ago

I would love for numpy to be ported as a typescript project personally. So I can do ml in ts. The python ecosystem feels a bit insane to me (more so than the js one). Venv helps but is still inferior to a half decent npm project imo. I feel there is no strict reason why this migration couldn't happen, only the inertia that makes it unlikely

emmanueloga_|1 year ago

I think it is "better" to go with the flow. I can't see TS competing for the data analysis niche any time soon!

Maybe try Pixi? [1] Python programming enjoyability really increased for me after using Pixi for dependencies, VSCode+Pylance [2] for editing, and Ruff [3] for formatting.

Pixi can install both python and dependencies _per project_. Then, I add this to .vscode/settings.json:

    {
      "python.analysis.typeCheckingMode": "strict", 
      "python.defaultInterpreterPath": ".pixi/envs/default/bin/python3"
    }
and I'm all set!

--

1: https://pixi.sh

2: https://github.com/microsoft/pylance-release#readme

3: https://docs.astral.sh/ruff/