top | item 37496618

otsaloma | 2 years ago

np.nan only works for floats; it doesn't help with integer, boolean, string, etc. arrays. Datetimes have NaT instead, but it's troublesome to have to use different checks, e.g. np.isnan() or np.isnat(), depending on the data type. And there isn't even an np.nat; you need np.datetime64("NaT"), so it's just confusing.
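To illustrate the per-dtype juggling described above, here is a minimal sketch of a hypothetical `is_missing()` helper (not part of NumPy) that dispatches to `np.isnan()` or `np.isnat()` depending on the array's dtype. It assumes integer, boolean and string arrays simply have no missing marker, so it reports nothing missing for them.

```python
import numpy as np

def is_missing(arr):
    """Hypothetical helper: boolean mask of missing values for a NumPy array."""
    arr = np.asarray(arr)
    # Float and complex arrays mark missing values with NaN.
    if np.issubdtype(arr.dtype, np.inexact):
        return np.isnan(arr)
    # datetime64 ('M') and timedelta64 ('m') arrays use NaT instead.
    if arr.dtype.kind in "mM":
        return np.isnat(arr)
    # int, bool, str, etc. have no built-in missing marker at all.
    return np.zeros(arr.shape, dtype=bool)

floats = np.array([1.0, np.nan])
dates = np.array(["2023-01-01", "NaT"], dtype="datetime64[D]")
ints = np.array([1, 2])

print(is_missing(floats))  # [False  True]
print(is_missing(dates))   # [False  True]
print(is_missing(ints))    # [False False]
```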

sheepshear | 2 years ago

Why not use a structured array with an 'isna' field that serves as a mask when performing operations?
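A minimal sketch of that suggestion, assuming a structured dtype with hypothetical `value` and `isna` fields: the mask works, but every operation has to apply it by hand.

```python
import numpy as np

# Hypothetical layout: each element carries its value plus an 'isna' flag.
dt = np.dtype([("value", np.int64), ("isna", np.bool_)])
data = np.array([(3, False), (0, True), (5, False)], dtype=dt)

# Use the 'isna' field as a mask to keep only the non-missing values.
present = data["value"][~data["isna"]]
mean = present.mean()
print(mean)  # 4.0
```

The 0 stored in the missing slot is only a placeholder; a plain `data["value"].mean()` would silently include it.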

otsaloma | 2 years ago

How is that convenient? Missing-data support belongs deep in NumPy itself (or any similar package) so that operations can do the right thing and missing values propagate correctly. For example, say you want missing values to sort last by definition. If you roll your own missing-value marker, you'll also need to roll your own sort function. And the same goes for a whole lot of other operations.
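As a sketch of the extra work this implies, the hypothetical `sort_missing_last()` below reuses the structured-array idea from the question (the `value` and `isna` field names are assumptions) and relies on `np.lexsort` to push flagged entries to the end, something a plain `np.sort` can't know to do with a custom marker.

```python
import numpy as np

def sort_missing_last(data):
    """Hypothetical helper: sort by 'value', placing rows flagged 'isna' last."""
    # np.lexsort uses the LAST key as the primary key, so 'isna' decides
    # first (False before True) and 'value' orders the non-missing rows.
    order = np.lexsort((data["value"], data["isna"]))
    return data[order]

dt = np.dtype([("value", np.int64), ("isna", np.bool_)])
data = np.array([(3, False), (0, True), (5, False), (1, False)], dtype=dt)

result = sort_missing_last(data)
print(result["value"])  # [1 3 5 0]
```

By contrast, `np.sort` on a float array already puts NaN at the end; a hand-rolled marker gets none of that for free.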