Sampling your own product

Why Companies Should Be “Dogfooding” Their Own Software

Adam McKerlie
© Shutterstock / jossnat

Software companies of all types and sizes have the potential to reap the rewards of dogfooding. This article dives into some examples of how Sentry benefited from this approach. Learn about dogfooding a performance tool and how it helped resolve performance issues and improve efficiency.

For software companies, the practice of sampling your own product before the public does can be an essential source of insight. This approach, known as “Dogfooding,” enables companies to test new products and features within their own environments to better understand customer implementation and experiences. Often, it can be a literal gut-check, bringing to light problems with the software products you are building. But it can also help companies solve their own internal challenges and even surface ideas for new capabilities.

At Sentry, dogfooding our products has provided immense value. For example, earlier this year we used Sentry for Performance when implementing our new feature flag software, and we were able to find and fix a serious issue in how we were fetching our flags that was causing a 500ms delay. Dogfooding our performance tool has also helped us resolve several performance issues in our UI and improve our products’ efficiency.

Software companies of all types and sizes have the potential to reap the rewards of dogfooding. Below are just a few examples of how Sentry benefited from this approach.

Improving Performance of Our Issue Details

One of Sentry’s most trafficked pages is the issue details page, which helps developers understand an error’s root cause. Rendering it requires fetching a large amount of event data (counts, charts, and other metadata).

We faced a problem with two components on this page: the latest event and the summary statistics. The page loaded the latest event only after it had loaded the summary statistics, which pushed response times past one second.

To figure out why this was happening, we looked into the issue page’s React component tree and noticed that the latest event component was a child of the summary component. The latest event component fetched its own data, but it could only begin rendering after its parent had fully rendered, and the parent was in turn blocked by its own request. As a result, the two requests ran one after the other instead of in parallel.

To fix this, we loaded data from both endpoints higher in the component tree, which let the two requests run in parallel. We then passed the results down so that each component could render as soon as its data became available. After the fix, the issue page’s P75 response time dropped by about 1 second.

You can’t fix what you can’t see. And because Sentry for Performance gave us the visibility into the two requests that needed to be parallelized, we found the quickest path to a solution—and a faster page for our customers.
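The difference between the request waterfall and the parallelized version can be sketched outside of React. The following is a minimal illustration in Python's asyncio, not Sentry's frontend code: the two fetch functions are hypothetical stand-ins for the summary statistics and latest event endpoints, and the sleeps simulate network latency.

```python
import asyncio
import time


# Hypothetical stand-ins for the two endpoint calls (names are assumptions).
async def fetch_summary_stats() -> dict:
    await asyncio.sleep(0.1)  # simulated network latency
    return {"event_count": 1200}


async def fetch_latest_event() -> dict:
    await asyncio.sleep(0.1)  # simulated network latency
    return {"event_id": "abc123"}


async def load_waterfall():
    # The second request starts only after the first one has finished.
    stats = await fetch_summary_stats()
    event = await fetch_latest_event()
    return stats, event


async def load_parallel():
    # Both requests start together, so the total is roughly the slower of the two.
    stats, event = await asyncio.gather(fetch_summary_stats(), fetch_latest_event())
    return stats, event


def timed(coroutine_function):
    """Run a coroutine function and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coroutine_function())
    return result, time.perf_counter() - start


waterfall_result, waterfall_seconds = timed(load_waterfall)
parallel_result, parallel_seconds = timed(load_parallel)
print(f"waterfall: {waterfall_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Both versions return the same data. Only the point at which the second request starts changes, which is what hoisting the fetches higher in the component tree achieves.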

Solving Django N+1 Query Problems

The Django Python framework allows developers to quickly build websites. One of its best features is the object-relational mapper (ORM), which allows you to make queries to the database without writing any SQL. Django will allow you to write queries in Python, and then it will try to turn those statements into efficient SQL. The ORM often creates the SQL flawlessly, but sometimes the results are less than ideal.

One common database problem is that ORMs can cause N+1 queries. These queries include a single, initial query (the +1), and each row in the results from that query spawns another query (the N). These often happen when you have a parent-child relationship. You select all of the parent objects you want, and then when looping through them, another query is generated for each child. This problem can be hard to detect at first, as your website could be performing fine. But as the number of parent objects grows, the number of queries also increases to the point of overwhelming your database and taking down your application.
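To make the pattern concrete, here is a minimal sketch that reproduces an N+1 access pattern with Python's built-in sqlite3 module instead of an ORM. The report and expense tables, their columns, and the run_query helper are assumptions made up for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE expense (
        id INTEGER PRIMARY KEY,
        report_id INTEGER REFERENCES report(id),
        amount REAL
    );
""")
# Five parent rows, each with three child rows.
conn.executemany(
    "INSERT INTO report (id, title) VALUES (?, ?)",
    [(i, f"Report {i}") for i in range(1, 6)],
)
conn.executemany(
    "INSERT INTO expense (report_id, amount) VALUES (?, ?)",
    [(i, 10.0) for i in range(1, 6) for _ in range(3)],
)

query_log = []  # every statement sent to the database is recorded here


def run_query(sql, params=()):
    """Hypothetical helper: execute a query and record that it ran."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


# The "+1": a single query for all parent rows.
reports = run_query("SELECT id, title FROM report")

# The "N": one additional query per parent row.
totals = {}
for report_id, title in reports:
    amounts = run_query(
        "SELECT amount FROM expense WHERE report_id = ?", (report_id,)
    )
    totals[title] = sum(amount for (amount,) in amounts)

print(f"{len(reports)} reports required {len(query_log)} queries")
```

With five reports the loop issues six queries. With a thousand reports it would issue 1,001, which is why the problem tends to go unnoticed until the data grows.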

Recently, I was building a simple website that kept track of expenses and expense reports. I wanted to use Sentry for Performance to assess how the application was performing in production. I quickly set it up using the instructions for Django and saw immediate results.

The first thing I noticed was that the median root transaction was taking 3.41 seconds. All this page did was display a list of reports and the sum of all of a report’s expenses. Django is fast, and it shouldn’t take 3.41 seconds.

Looking at the code, I couldn’t see any immediate problems, so I decided to dig into the Event Detail page for a recent transaction. I noticed how many queries the ORM had generated. I had a single query to fetch all of the reports and then another query for each report to fetch all expenses—an N+1 problem.

Django evaluates queries lazily by default. This means that Django won’t run a query until the code actually needs its results. In this case, Reports.objects.all() is called when the page initially loads. Then, each time the template evaluates {{ report.get_expense_total }}, Django runs another query to fetch that report’s expenses. There are two ways to fix this, depending on how your models are set up: select_related() and prefetch_related().
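Lazy evaluation is easier to reason about with a toy model. The class below is a deliberately simplified stand-in written for this illustration, not Django's QuerySet implementation. It only shows the general idea that building the query object costs nothing, the first iteration triggers the query, and the results are then cached.

```python
class LazyQuery:
    """Toy lazily evaluated query (an illustration, not Django's QuerySet)."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable that hits the "database"
        self._result_cache = None    # filled on first evaluation

    def __iter__(self):
        if self._result_cache is None:
            # First use: only now is the query actually executed.
            self._result_cache = list(self._run_query())
        return iter(self._result_cache)


calls = []  # records each time the fake database is queried


def fake_database_query():
    calls.append("SELECT * FROM report")
    return ["Travel", "Office"]


reports = LazyQuery(fake_database_query)
calls_after_construction = len(calls)  # nothing has run yet

titles = [title for title in reports]  # iteration triggers the query
calls_after_first_loop = len(calls)

titles_again = list(reports)           # served from the cache
calls_after_second_loop = len(calls)
```

The same reasoning explains the N+1 behavior above: a method called once per row inside a template loop evaluates a fresh query on each call.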

select_related() works by following single-valued relationships (foreign keys and one-to-one fields) and adding them to the SQL query as a JOIN. prefetch_related() has a similar purpose, but instead of doing a SQL join, it runs a separate query for each relationship and then joins the results in Python. This allows you to prefetch many-to-many and many-to-one relationships.
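The two strategies can be compared in plain SQL. The sketch below uses Python's sqlite3 module with a made-up report and expense schema. It approximates the shape of the queries each approach relies on (a JOIN versus a second query with an IN clause that is stitched together in Python) and is not the exact SQL Django emits.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE expense (
        id INTEGER PRIMARY KEY,
        report_id INTEGER REFERENCES report(id),
        amount REAL
    );
    INSERT INTO report (id, title) VALUES (1, 'Travel'), (2, 'Office');
    INSERT INTO expense (report_id, amount) VALUES (1, 40.0), (1, 60.0), (2, 25.0);
""")

# JOIN strategy: one query returns each expense together with its parent report.
joined = conn.execute(
    "SELECT expense.amount, report.title FROM expense "
    "JOIN report ON report.id = expense.report_id ORDER BY expense.id"
).fetchall()

# Two-query strategy: fetch the parents, then all of their children at once,
# and group the children by parent in Python.
reports = conn.execute("SELECT id, title FROM report ORDER BY id").fetchall()
report_ids = [report_id for report_id, _title in reports]
placeholders = ", ".join("?" for _ in report_ids)
expense_rows = conn.execute(
    f"SELECT report_id, amount FROM expense "
    f"WHERE report_id IN ({placeholders}) ORDER BY id",
    report_ids,
).fetchall()

expenses_by_report = defaultdict(list)
for report_id, amount in expense_rows:
    expenses_by_report[report_id].append(amount)

prefetched = {title: expenses_by_report[report_id] for report_id, title in reports}
```

Either way, the number of queries stays constant (one or two) no matter how many reports exist.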

We can update ReportsList to use prefetch_related(). This cuts the number of database queries in half since we’re now making one query to fetch all of the Reports, one query to fetch all of the expenses, and then n queries in report.get_expense_total.

To fix the n queries from report.get_expense_total, we can use Django’s annotate to pull in that information before passing it to the template.
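Pushing the sum into the database means the totals arrive with the reports in a single query. The sketch below shows the general shape of such an aggregate query using sqlite3. The schema and the expense_total column name are assumptions for illustration, and the SQL is an approximation of an annotated sum rather than Django's exact output.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE report (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE expense (
        id INTEGER PRIMARY KEY,
        report_id INTEGER REFERENCES report(id),
        amount REAL
    );
    INSERT INTO report (id, title) VALUES (1, 'Travel'), (2, 'Office'), (3, 'Empty');
    INSERT INTO expense (report_id, amount) VALUES (1, 40.0), (1, 60.0), (2, 25.0);
""")

# One query returns every report with its expense total already computed.
# LEFT JOIN keeps reports that have no expenses; COALESCE turns their NULL sum into 0.
rows = conn.execute("""
    SELECT report.title, COALESCE(SUM(expense.amount), 0) AS expense_total
    FROM report
    LEFT JOIN expense ON expense.report_id = report.id
    GROUP BY report.id
    ORDER BY report.id
""").fetchall()

totals = dict(rows)
```

The template can then read the precomputed total from each row instead of calling a method that queries the database again.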

Now the median transaction time is down to 290ms. If the number of queries increases to the point where it could take your application down, try using prefetch_related or select_related. This way, Django will fetch all of the additional information without the extra queries. After doing this, I saved 950 queries on the main page and decreased the page load time by 91%.

Making our UI More Efficient

Lazy-loading is an ironic term in programming because it makes user interfaces more efficient. And an efficient UI is important at Sentry. We don’t want customers tapping their feet and pointing at an imaginary watch while they wait for a page to load. Unfortunately, rendering the sheer amount of real-time data that Sentry provides (error frequency, release data, latency, throughput) can be a problem.

As developers, we know most problems have already been solved somewhere (by somewhere, we mean Stack Overflow). But we process and render large amounts of data. For example, on our releases page we fetch everything from sessions and users to version adoption, crash counts, and histograms. If we fetched all of it in a single request, we’d create the data equivalent of a traffic jam, and users would wait a while before seeing any of their data.

This is why we split it into two parallel requests. Because we show results as soon as the first request finishes, instead of showing a spinner, users no longer have to wait for all of the data to render before they can see some of it.

Even though these data sets are fetched in parallel, Sentry for Performance showed us that the time gap between the two phases could be substantial. The gap was even more pronounced for release-heavy organizations that push code multiple times per hour.

From a UI perspective, the default choice would be to toss out one of those loading spinners while the lagging data set was still rendering. But we felt that wouldn’t be fair to the data—or customers. We decided to introduce two-phased loading, where we fetch releases and health data separately—while still in parallel. The result: a 22% faster UI, with almost half a second saved in load time. There’s no use in producing error data if we can’t present that data to customers.
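Two-phased loading can be sketched as starting both requests together and rendering each result the moment it arrives. The following is a minimal asyncio illustration, not Sentry's UI code: the fetch functions, their payloads, and the simulated delays are assumptions, and appending to a list stands in for rendering a section of the page.

```python
import asyncio


# Hypothetical stand-ins for the two requests; health data is slower here.
async def fetch_releases():
    await asyncio.sleep(0.05)
    return "releases", ["1.0.0", "1.1.0"]


async def fetch_health():
    await asyncio.sleep(0.15)
    return "health", {"crash_free_rate": 0.99}


async def load_page():
    rendered = []  # order in which sections became visible
    tasks = [
        asyncio.create_task(fetch_releases()),
        asyncio.create_task(fetch_health()),
    ]
    # Handle each response as soon as it completes instead of waiting for both.
    for finished in asyncio.as_completed(tasks):
        section, _payload = await finished
        rendered.append(section)  # stand-in for rendering that section
    return rendered


render_order = asyncio.run(load_page())
print(render_order)
```

The faster phase is shown first and the slower one fills in later, so a slow health query never holds back the release list.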

Author

Adam McKerlie

Adam McKerlie is a senior engineering manager at Sentry, the leader in application monitoring. For software teams, Sentry is essential for monitoring code health. From error tracking to performance monitoring, developers can see clearer, solve quicker, and learn continuously about their applications—from frontend to backend.

