The DevOps guide to evaluating modern log management tools
How can we apply DevOps principles to log management tools? Steve Newman explores a DevOps testing framework for log management tools with a focus on metrics like performance, usability, extensibility, and security.
Enterprise IT, information security, and engineering teams share many priorities, workflows, and even technology. But each is very different from the rest in the context of operational visibility. Engineers, especially those on the front line of a rapid, continuous software delivery process, require a strong focus on performance and alignment with their teams’ work styles.
Legacy operational visibility tools have long been optimized for the needs of IT and information security users. As new architectures have changed software development demands, and the pace of software delivery has hastened, engineers’ needs when it comes to observability have crystallized around a certain set of capabilities, including speed, usability, integrations, and extensibility. Here is a testing framework for log management tools that goes beyond traditional IT requirements and caters to modern DevOps needs.
1. Performance
Performance measurements have two key scaling criteria: volume of inbound messages and amount of data in the system. Although every environment is unique, there are core requirements that qualify a tool as high-performance. Here are the three to consider:
Ingest latency. Logs get less useful as they age, especially when troubleshooting live issues. Compare how quickly log messages become available in the log management console. They should be available within seconds of being generated.
Alert latency. This often relates to ingest latency. To be trusted and valuable, notifications and other actions must be delivered promptly. Check alerts at regular intervals (for example, every minute) to confirm that they triggered appropriately and fired within your service level expectations.
Query tests. Here are three key query performance tests, covering a range of query styles that are useful for developers who troubleshoot or debug software issues. Each search should return a correct response in less than one second. (These examples are drawn from common data at my company, Scalyr; you may want to substitute elements from your own log data or code.)
- Search your logs for all access log records where the HTTP status code was a 503 error. This field-based query is the easiest type of search for any log management system, and all adequate solutions should meet the one-second response time requirement.
- Search your logs for all database queries that ran a “select” command. This is a substring search, which for some solutions is less efficient than the field-based query of the previous test.
- Search your logs for any message that contains a portion of a SQL query, in our example case “FROM Friends WHERE a='?'”. This is an example of a more complicated but still very common query, and it can be more difficult for some solutions to execute efficiently.
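As a rough illustration of the three query styles above, here is a minimal Python sketch that runs them against a handful of in-memory records and checks each against the one-second target. The record layout, field names (`source`, `status`, `message`), and helper functions are all assumptions made for this example; a real evaluation would issue the equivalent searches through the query interface of the tool under test, against production-scale data.

```python
import time

# Hypothetical sample records; the field names are assumptions for illustration.
LOGS = [
    {"source": "access", "status": 200, "message": "GET /home"},
    {"source": "access", "status": 503, "message": "GET /friends"},
    {"source": "db", "status": None, "message": "select name FROM Friends WHERE a='?'"},
    {"source": "db", "status": None, "message": "update Friends SET name='?'"},
]


def field_query(logs, source, status):
    """Test 1: exact match on parsed fields."""
    return [r for r in logs if r["source"] == source and r["status"] == status]


def substring_query(logs, source, needle):
    """Tests 2 and 3: case-insensitive substring match on the raw message."""
    needle = needle.lower()
    return [r for r in logs if r["source"] == source and needle in r["message"].lower()]


def timed(query, *args):
    """Run a query and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = query(*args)
    return results, time.perf_counter() - start


if __name__ == "__main__":
    checks = [
        ("503 errors", field_query, (LOGS, "access", 503)),
        ("select commands", substring_query, (LOGS, "db", "select")),
        ("SQL fragment", substring_query, (LOGS, "db", "FROM Friends WHERE a='?'")),
    ]
    for name, query, args in checks:
        results, elapsed = timed(query, *args)
        verdict = "PASS" if elapsed < 1.0 else "FAIL"
        print(f"{name}: {len(results)} hits in {elapsed:.6f}s [{verdict}]")
```

The same harness shape (run a query, record the elapsed time, compare against a threshold) carries over when the query functions are replaced with calls to a real log management tool.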
2. Usability
The DevOps process is all about making engineers more efficient and collaborative, and their log management tools must support this. Here are five usability factors to evaluate a tool’s ability to empower your team to move quickly:
Ease of end-user adoption. Adoption is critical for effective rollout and long-term benefit. Test this by asking a few team members of varying seniority to log in to the tool. They should feel comfortable and eager to continue using it without any training.
Support for multiple users. Access for the entire team, without having to pay more or sacrifice performance, is key for teams working quickly and in parallel. Invite a wide variety of users to access the tool, query data, and set up dashboards and alerts. The tool should enable entire teams (across departments) to work simultaneously and get value with zero performance penalty.
Ability to troubleshoot quickly. Engineering users should be able to troubleshoot and find issues quickly and with no prior training on the tool.
Dashboard creation. Dashboards allow for rapid visualizations of data trends. Ask a user to create and share a dashboard consisting of key HTTP/web server metrics and events. An engineer should be able to create and share dashboards within a few minutes.
Distributed transaction tracking. A major benefit of aggregating logs and metrics is being able to view loosely coupled systems at the same time. Using test data, ask a user to track a single user request across multiple system components using a unique transaction ID. Engineers should be able to see a transaction pass through the web, application, and database layers in less than one second, using a single query.
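To make the transaction-tracking test concrete, the sketch below filters a small set of test records by transaction ID and orders the matches by timestamp, which is what a single query in a log management tool would need to return. The records, component names, field names (`ts`, `component`, `txn_id`), and the `trace_transaction` helper are hypothetical and exist only for this example.

```python
from datetime import datetime

# Hypothetical test data; component names, field names, and IDs are assumptions.
LOGS = [
    {"ts": "2024-01-01T12:00:00.120", "component": "app", "txn_id": "abc123", "message": "load profile"},
    {"ts": "2024-01-01T12:00:00.050", "component": "web", "txn_id": "abc123", "message": "GET /profile"},
    {"ts": "2024-01-01T12:00:00.090", "component": "web", "txn_id": "zzz999", "message": "GET /home"},
    {"ts": "2024-01-01T12:00:00.180", "component": "db", "txn_id": "abc123", "message": "select * from users"},
]


def trace_transaction(logs, txn_id):
    """Return every record carrying txn_id, ordered by timestamp."""
    matches = [r for r in logs if r["txn_id"] == txn_id]
    return sorted(matches, key=lambda r: datetime.fromisoformat(r["ts"]))


if __name__ == "__main__":
    for record in trace_transaction(LOGS, "abc123"):
        print(record["ts"], record["component"], record["message"])
```

This only works if every component writes the same transaction ID into its log lines, so it is worth confirming that convention in your test data before running the evaluation.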
3. Integrations and extensibility
The best log management tools play nicely with external applications and systems to provide a holistic view of your environment. After integrating a log management tool with your existing data visualization dashboards and notification and alerting tools (for example, PagerDuty), make sure information flows cleanly, accurately, and without latency or data drops. Your log management tool should also allow users to apply custom tags to log messages.
4. Data privacy and other strategic requirements
Redacting sensitive data to prevent inadvertent data exfiltration is key for maintaining security and compliance standards. You can test this by configuring data redaction at the agent to prevent sensitive data matching a known pattern (for example, a social security number) from reaching the log management tool. Administrators should be able to filter arbitrary strings from log messages before they are transmitted off the source system.
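The redaction test can be sketched as a simple pattern substitution applied to each log line before it leaves the source system. This is only an illustration of the idea, not how any particular agent implements it: the `redact` helper, the placeholder text, and the simplified U.S. social security number pattern are assumptions, and real agents typically expose redaction through their own configuration.

```python
import re

# Simplified pattern for U.S. social security numbers (an assumption for this
# example; production rules usually need to be stricter and more numerous).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def redact(line, patterns=(SSN_PATTERN,), placeholder="[REDACTED]"):
    """Replace every match of each pattern in a log line with a placeholder."""
    for pattern in patterns:
        line = pattern.sub(placeholder, line)
    return line


if __name__ == "__main__":
    print(redact("user ssn=123-45-6789 logged in"))
    # prints: user ssn=[REDACTED] logged in
```

To verify the behavior end to end, send a test message containing a known value through the agent and then search the log management tool for that value; the search should return nothing.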
As privacy and security become an imperative across all business functions, DevOps included, it’s wise to verify compliance standards before purchasing any new technology. Check that the tool complies with relevant laws and standards such as GDPR, HIPAA, PCI-DSS, and SOC 2.