“The 1.0 release does not mean a conclusion, or even slowing down, of pandas’ development.”
pandas has reached the milestone version 1.0.0. The Python library for data analysis and manipulation has already been around for 12 years and is being used in production, so what led to this decision now? We spoke to Tom Augspurger from the pandas developer team. He shared some insights on the new release, his personal highlights and where pandas is headed in the future.
pandas 1.0.0 has been released on GitHub by Tom Augspurger, following a release candidate that arrived earlier this month. We interviewed Tom to find out more about the major upgrade from v0.25.3.
Let’s see what he has to say about the new NA scalar, breaking changes and the upgrade process, and what features he is looking forward to implementing in the future.
JAXenter: You’ve made the decision to finally push pandas to version 1.0. What considerations played a part in this?
Tom: We started thinking about pandas 1.0 in earnest at our first developer sprint in July 2018. At the time, we optimistically targeted January 2019 (6 months from the sprint). We ended up needing another 18 months of development.
pandas has been “production ready” for a while now, in the sense that it’s used in production at many institutions. But we still had a few major items we wanted to iron out before calling 1.0:
- Clean up the API. We’d accumulated a large number of deprecations for duplicative or unclear behavior. Many of these were enforced for 1.0.
- Stabilize the data model. Starting around pandas 0.23 (May 2018), we clarified exactly what kind of data could be stored in a Series or DataFrame. Historically, this was just NumPy arrays or a few “extension types” that pandas defined. The 0.23 release included an interface that specified what kind of array can be stored inside pandas, and over the subsequent releases we refined that interface.
With these stabilizations in place, we felt that a 1.0 was appropriate.
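As an illustration of the data model Tom describes, the following minimal sketch builds one Series backed by a plain NumPy array and two backed by extension types, then asks pandas which is which. The variable names are illustrative only, and the sketch assumes pandas 1.0 or later is installed.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray
from pandas.api.types import is_extension_array_dtype

# A Series of floats is backed by an ordinary NumPy array.
numpy_backed = pd.Series([1.0, 2.0, 3.0])

# Categorical and nullable-integer data are stored in extension arrays
# that implement the interface introduced around pandas 0.23.
categorical = pd.Series(["a", "b", "a"], dtype="category")
nullable_int = pd.Series([1, None, 3], dtype="Int64")

for series in (numpy_backed, categorical, nullable_int):
    # Print the dtype and whether it is an extension dtype.
    print(series.dtype, is_extension_array_dtype(series.dtype))

# The underlying storage of an extension-typed Series is an ExtensionArray.
print(isinstance(categorical.array, ExtensionArray))
print(isinstance(nullable_int.array, ExtensionArray))
```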
JAXenter: What is your personal highlight in pandas 1.0?
Tom: The new NA scalar to represent scalar missing values. This is the value used to represent “missing” in our new nullable integer, boolean, and string data types. Historically, we’ve used NaN (not a number), but that had several drawbacks. Most notably, NaN is a float and so cannot be used with integer dtypes. And NaN has some peculiar behavior in logical and comparison operations.
pandas.NA is a new concept in the scientific Python ecosystem, and it’s not clear how other libraries will adapt to handle it. We’re working with other libraries, including NumPy, to discover how we can best handle the concept of “missing data” across the ecosystem.
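To make the difference between NaN and the new NA scalar concrete, here is a minimal sketch, assuming pandas 1.0 or later and NumPy are installed. The variable names are illustrative and the values are made up for the example.

```python
import numpy as np
import pandas as pd

# NaN is a float, so an integer column with a missing value is upcast to float64.
with_nan = pd.Series([1, 2, np.nan])
print(with_nan.dtype)

# The nullable "Int64" dtype keeps the integers and stores the gap as pd.NA.
with_na = pd.Series([1, 2, None], dtype="Int64")
print(with_na.dtype)
print(with_na[2] is pd.NA)

# NaN compares unequal to everything, including itself.
print(np.nan == np.nan)

# pd.NA propagates through comparisons instead of answering False.
print(pd.NA == 1)

# In logical operations pd.NA follows three-valued logic: the result is
# only known when the other operand decides it on its own.
print(pd.NA | True)   # known: anything OR True is True
print(pd.NA & False)  # known: anything AND False is False
print(pd.NA & True)   # unknown: depends on the missing value
```

Note that because a comparison with pd.NA yields pd.NA rather than a boolean, checks for missing values are usually written with isna() rather than with ==.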
JAXenter: For developers who use pandas, what will be the most significant changes when upgrading?
Tom: All of our API breaking changes are documented in our release notes. This release had relatively minor breaking changes. The largest changes are probably to the (experimental) IntegerArray, which now uses the new pandas.NA scalar value rather than NaN. When upgrading, we always recommend:
- A careful read through of the release notes.
- Trying the release candidate.
We provide binaries for final releases and release candidates. Subscribe to our releases on GitHub by “watching” for releases.
Our full installation instructions are available here.
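Alongside the two recommendations above, a common Python practice (not something Tom mentions in the interview) is to run existing code with deprecation warnings escalated to errors before upgrading, so that usage of soon-to-be-removed behavior fails loudly. The sketch below uses only the standard library; legacy_call is a hypothetical stand-in for user code that triggers a deprecation, and find_deprecation is an illustrative helper, not part of pandas.

```python
import warnings


def legacy_call():
    """Hypothetical stand-in for code that relies on a deprecated behavior."""
    warnings.warn("this behavior is deprecated", FutureWarning, stacklevel=2)
    return 42


def find_deprecation(func):
    """Run func with FutureWarning escalated to an error.

    Returns the warning message if one was raised, otherwise None.
    """
    with warnings.catch_warnings():
        # Inside this block, any FutureWarning is raised as an exception.
        warnings.simplefilter("error", FutureWarning)
        try:
            func()
        except FutureWarning as exc:
            return str(exc)
    return None


print(find_deprecation(legacy_call))
print(find_deprecation(lambda: 1))
```

The same effect can be had for a whole test run by passing -W error::FutureWarning to the Python interpreter.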
JAXenter: And, lastly, what are your future plans for pandas?
Tom: The 1.0 release does not mean a conclusion, or even slowing down, of pandas’ development. The roadmap always contains an up-to-date vision of the maintainers’ view for where the project should head.
Personally, I’m excited about improvements to the extension array interface, in particular the ability to use extension arrays to back Index objects for indexing. Interoperability with Apache Arrow, perhaps first via a native string array, is a promising area of development too.
And, as always, pandas’ development is guided primarily by its users and contributors. Anybody can join in the development process to influence where the project goes.
Thank you for the interview!