ML 101: The rules of machine learning
Want to learn machine learning? Or do you need to brush up on basic concepts? Today, we go over two essential reference materials for anyone just starting out on their machine learning adventure: the machine learning glossary and the rules of ML.
Things can be confusing if you’re just starting out on your machine learning journey. ML might be on the bleeding edge, but it can be hard for developers in different fields to catch up. However, the rewards for doing so are fairly substantial: we talk all the time about how well ML specialists are compensated for their skills.
So, what’s a developer to do if they want to level up their ML credentials? While you can always take a course or a boot camp, those can be expensive. We already went over some of the great open source options for machine learning, artificial intelligence, and more that are all available online. There’s a whole internet full of open source tools for machine learning, including TensorFlow and tools released by OpenAI.
Today, we’re taking a look at two useful tools from the Google Developers team: the Rules of ML and the Machine Learning Glossary. I highly recommend reading the whole rule book by Martin Zinkevich; it’s an incredible resource for anyone working on machine learning, whether they’re a beginner or just brushing up on their ML skills.
The Rules of Machine Learning
Machine learning is a pretty new discipline, so there really aren’t a whole lot of hard and fast rules. However, there are an awful lot of guidelines and helpful generalizations to follow.
“Do machine learning like the great engineer you are, not like the great machine learning expert you aren’t.”
Making things work in machine learning has a lot to do with engineering and less to do with algorithms. That’s not to say ML algorithms aren’t necessary and useful; it’s just that many of the problems you will face as a developer can be solved with a background in engineering or computer science.
Martin Zinkevich has a very basic approach to all ML problems:
- Make sure your pipeline is solid end to end.
- Start with a reasonable objective.
- Add common-sense features in a simple way.
- Make sure that your pipeline stays solid.
Following this general approach covers a lot of ground. Increasing complexity means you’re throwing up future roadblocks. Remember the golden rule of all development projects – keep it simple, stupid.
Three simple rules before you start your ML pipeline
Beyond that general approach, the guide also gives budding ML specialists three simple rules to follow before they even start out with machine learning. The full set of rules carries on well into developing your first pipeline, feature engineering, and refining complex models, but we’re only going to focus on the foundation today.
Rule #1: Don’t be afraid to launch a product without machine learning.
Do you need machine learning? Do you really, really need it? Sure, ML is super cool and extremely topical in tech right now, but don’t let it become a solution in search of a problem. ML only pays off under fairly specific conditions; it might not be what your project needs.
Besides, ML typically needs an awful lot of data. You might not have access to the right sort of datasets, or to any datasets at all.
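To make the idea of launching without machine learning concrete, here is a minimal sketch of a non-ML baseline: ranking items with a plain install-rate heuristic. The `App` type, its fields, and the sample numbers are hypothetical and only there for illustration; they are not taken from the guide.

```python
from typing import NamedTuple


class App(NamedTuple):
    """Hypothetical record for an item we want to rank."""
    name: str
    installs: int
    impressions: int


def install_rate(app: App) -> float:
    """Fraction of impressions that led to an install (0.0 if never shown)."""
    return app.installs / app.impressions if app.impressions else 0.0


def rank_by_install_rate(apps: list[App]) -> list[App]:
    """Order apps from highest to lowest install rate. No model involved."""
    return sorted(apps, key=install_rate, reverse=True)


# Made-up sample data.
apps = [
    App("notes", installs=120, impressions=1000),
    App("camera", installs=300, impressions=6000),
    App("chess", installs=45, impressions=150),
]
ranked = rank_by_install_rate(apps)
print([app.name for app in ranked])
```

A baseline like this can ship right away, and it gives any later ML model something to beat.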
Rule #2: Design and implement metrics.
Metrics are important. Without any kind of measuring stick, how can you tell if your project is working? How can you determine if there are any problems?
This is where data collection comes into play. When you’re designing a project, see if there are ways to gather data from the start, if only because it’s easier to get permission from users from the get-go. Having a wealth of historical data makes it easier to show whether a given initiative or tweak to the system actually did anything.
Now would also be a good time to invest in a decent storage system for all that data you’ll be collecting.
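As a rough sketch of what designing and implementing a metric can look like, the snippet below aggregates logged events into a daily click-through rate. The `Event` fields, the choice of click-through rate as the metric, and the sample values are all assumptions made for illustration; your project will have its own signals.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical log record: how often something was shown and clicked."""
    day: str      # ISO date, e.g. "2024-01-01"
    shown: int
    clicked: int


def click_through_by_day(events: list[Event]) -> dict[str, float]:
    """Aggregate raw events into one click-through rate per day."""
    shown: dict[str, int] = defaultdict(int)
    clicked: dict[str, int] = defaultdict(int)
    for event in events:
        shown[event.day] += event.shown
        clicked[event.day] += event.clicked
    # Skip days with no impressions to avoid dividing by zero.
    return {day: clicked[day] / shown[day] for day in shown if shown[day] > 0}


# Made-up sample data: two batches on day one, one batch on day two.
events = [
    Event("2024-01-01", shown=100, clicked=10),
    Event("2024-01-01", shown=100, clicked=14),
    Event("2024-01-02", shown=200, clicked=30),
]
ctr = click_through_by_day(events)
for day, rate in sorted(ctr.items()):
    print(day, round(rate, 3))
```

Keeping the raw events around, rather than only the computed rate, lets you define new metrics later and recompute them over historical data.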
Rule #3: Choose machine learning over complex heuristics.
A heuristic is any rule-of-thumb approach to problem solving. Simple heuristics are easy to implement; complex ones less so. Once a heuristic grows complex, a machine-learned model is easier to update and maintain.
The Machine Learning Glossary
Not to be outdone, the Google Developers team has also released a comprehensive Machine Learning Glossary. Terminology in technology is complex; let’s simplify things with a very helpful reference sheet that clearly explains what we mean by cross-entropy, one-hot encoding, or softmax.
Frankly, I find this to be incredibly useful, if only because there are a lot of overlapping terms in computer science. Clarity is crucial to writing clean code. Writing clean code isn’t just efficient; it helps out future developers who follow in your footsteps.
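To give a feel for the terms mentioned above, here is a small standard-library sketch of one-hot encoding, softmax, and cross-entropy working together. It follows the usual textbook definitions and is not code from the glossary; the function names and the sample scores are made up for illustration.

```python
import math


def one_hot(index: int, num_classes: int) -> list[float]:
    """Encode a class index as a vector with a single 1.0 entry."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def softmax(logits: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    largest = max(logits)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target: list[float], predicted: list[float]) -> float:
    """Penalty for the probability assigned to the true class (lower is better)."""
    return -sum(t * math.log(p) for t, p in zip(target, predicted) if t > 0)


# Made-up raw scores for three classes; class 0 is the true label.
probs = softmax([2.0, 1.0, 0.1])
target = one_hot(0, 3)
loss = cross_entropy(target, probs)
print(probs, loss)
```

With a one-hot target, the cross-entropy reduces to the negative log of the probability the model gave to the correct class.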
Machine learning may be difficult, but there are a lot of options out there to make it easier for anyone just starting out. These tools from the Google Developers team are incredibly useful for beginners as well as anyone looking to brush up on their ML skills.
Remember: keep it simple, and good machine learning comes from good engineering. You can do it!