The right tool for the right job

Using data science tools to supercharge your developer toolkit

Glynn Bird

What can developers learn from data scientists? What tools should they take a closer look at? Glynn Bird explains why developers should adopt two key tools from data science: Jupyter Notebooks and PixieDust.

Looking at Jupyter Notebooks and PixieDust, both data scientists’ tools, with a fresh pair of eyes got me thinking. How can we, as developers, use these tools to their full potential? The standout benefit for me was their iterative character, which lets developers work with living documents that are updated in real time. As a developer I’m used to writing code, hitting enter, waiting to see the results and repeating the process until I get the right result. But with notebooks and tools such as PixieDust I can get instant feedback and greatly speed up the development process.

Notebooks: a new approach

Notebooks have many uses. They can be used as scratchpads for coding, for algorithm development and for developing proofs of concept to ensure you are going in the right direction before building too much of the underlying scaffolding. Another key use I’ve found is answering questions on Stack Overflow, where a notebook allows me to quickly recreate the issue and then resolve it, creating a walkthrough in the notebook as I go.

Notebooks can be used for one-off things too, like moving data from one point to another, or for occasional scripting at monthly, quarterly or yearly intervals. They are also invaluable for creating interactive documents and tutorials, as you can mix text and commentary with code and graphics, all within one document.

A key business use case for notebooks is for generating reports. Notebooks can populate reports with key information, like sales or like-for-like figures. PixieDust publishes these in a shareable format so that reports can be easily circulated within an organisation or beyond. As is often the case in businesses, multiple different versions of the same spreadsheet can create confusion through lack of version control. This is not a very collaborative way of working, and it makes little sense to re-share whole documents every time small updates are made.

In our developer advocacy team at IBM, we record our activity and log all statistics in a database. While the old approach would be to collate statistics in a spreadsheet, circulate it for amendments, and repeat this cycle, with notebooks everyone can access the same report at once and modify it collaboratively. Most importantly, the data comes straight from the real underlying dataset, which minimizes the risk of the figures being tampered with along the way.
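As a rough illustration of that idea, and not the team's actual setup, a notebook cell could compute its report figures directly from the database each time it runs. The sketch below uses Python's built-in sqlite3 module with an in-memory database standing in for the real one; the activity table, its columns and the sample rows are all made up for the example.

```python
import sqlite3


def build_demo_database():
    """Create a throwaway in-memory database with a hypothetical activity log."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE activity (logged_on TEXT, kind TEXT, advocate TEXT)")
    conn.executemany(
        "INSERT INTO activity VALUES (?, ?, ?)",
        [
            ("2018-01-15", "blog", "alice"),
            ("2018-01-20", "talk", "bob"),
            ("2018-01-28", "blog", "bob"),
            ("2018-02-03", "talk", "alice"),
        ],
    )
    conn.commit()
    return conn


def monthly_activity_report(conn):
    """Return (month, kind, total) rows aggregated straight from the database."""
    query = """
        SELECT strftime('%Y-%m', logged_on) AS month, kind, COUNT(*) AS total
        FROM activity
        GROUP BY month, kind
        ORDER BY month, kind
    """
    return conn.execute(query).fetchall()


# Re-running this cell always reflects whatever is currently in the database.
conn = build_demo_database()
for month, kind, total in monthly_activity_report(conn):
    print(f"{month}  {kind:<5} {total}")
```

Because the figures are recomputed from the source on every run, there is no intermediate spreadsheet copy that can drift out of date.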

The latest version of PixieDust, an open source productivity tool for Jupyter Notebooks, allows you to visualize data in shareable individual charts. So now, rather than sharing the full document, it’s possible to publish a single chart and then share its URL with colleagues and partners. This allows them to see the real data, without exposing that data to potential loss of fidelity.

PixieDust: the magic ingredient for the many

With new developer tools, a new mind-set and approach to working can be necessary. Most of my working life, I’ve just been typing text into text files, but with notebooks I can now type into a more visual web interface.

Without PixieDust, if I wanted to visualize what I’m working on, I would have to effectively draw my own chart pixel by pixel, but PixieDust can easily turn what I’m working on into the type of charting tool I’d want to see on a spreadsheet. The result is much more visual and more time efficient, and let’s be honest – as a developer I want to spend my time building apps, not creating charts!

Using PixieDust reduces the number of chores that consume valuable development time and makes things simpler. That said, getting used to new tools can involve a steep learning curve. For example, as the main programming language for notebooks is Python, if you’re not a Python developer you would need to learn that new language.

One of the things I took upon myself when I started using PixieDust, to avoid having to translate code in my head from my natural programming language to Python, was to create PixieDust_Node. This is an add-on that allows Node.js data science tools to be used in notebooks, and lets developers code in the language they prefer. This means no conversion or translation from one language to another is needed, and I can still use the visualization tools to bring the data to life.

What’s next?

Spreadsheets are universally used but they can only get you so far, whereas notebooks are built for big data usage. Using a spreadsheet when you have 100,000 entries of data can be unwieldy and a risk to the accurate recording of figures. I’ve put together a tutorial to get spreadsheet users started on Notebooks with PixieDust. PixieDust is constantly updated to ensure that it can deliver the best results and be as productive as possible with large sets of data. Emailing a notebook is far more sophisticated than the alternative, so get on board and have a play with it.


Glynn Bird

Glynn Bird started his career in the research and development arm of the steel industry, creating sensors and control systems. He then became a web developer for a business directory company building CRM systems, search technology and automated telephony systems. He now works for IBM in the UK as a Developer Advocate. Follow him on Twitter @glynn_bird
