Until now, AIOps has fallen short of its big promise to deliver meaningful AI analysis. Although many companies brand their tools as AIOps, most of those tools rely on simple statistical models that flag metrics deviating from the norm.
Selector has broken this trend and changed the minds of many with its true AIOps networking solution, as revealed in episode 707 of Packet Pushers’ Heavy Networking podcast. This weekly show features network engineers, industry experts, and vendors who share practical information about what works and doesn’t work in networking.
In episode 707, Selector Co-founder and CTO Nitin Kumar and Product Manager Kevin Kamel joined hosts Ethan Banks and Andrew Conry-Murray to discuss Selector’s AIOps approach to network observability and explain how it makes network observability easier for network engineers. If you haven’t listened to the episode yet, keep reading to get the conversation highlights.
Transformation, from operational data to insights
To start, Kamel describes Selector as “a platform to transform your operational data—telemetry—into actionable insights.” He says it establishes a sense of what’s normal for your environment and then points you toward the issues to help you quickly address them.
Selector goes beyond traditional anomaly detection, which simply issues multiple alerts and leaves network engineers to sort through the data, logs, and metrics to find the problem. Kamel explains that Selector collects the data and uses sophisticated machine learning (ML) to analyze that telemetry. The analysis surfaces anomalies and abnormal conditions in the telemetry. Selector then builds context about what might be happening more broadly in the environment.
Kumar adds, “Context is important because the information we provide has to be actionable. It needs to tell you what likely is affected and what to do next. … A lot of analysis [goes on behind] the scenes for Selector to make these predictions.”
Augmentation not replacement
Kamel goes on to explain that Selector doesn’t replace your existing tools but augments them by placing Selector on top of your current ecosystem. “We’re really looking to collect all of that data from the existing tools, do our magic, and then surface the insights from your telemetry. This allows us to show quickly … what the platform can do without an extended POC (proof of concept). The idea of ripping and replacing these solutions is just a political type of activity anyway, which we want to avoid.”
Kamel says, “I see us as augmenting the entire monitoring and observability ecosystem. It’s not just that we’re augmenting one type of telemetry or offering one type of insight. It’s that we’re going to collect this data from everything you have—the more data, the better from our perspective. Then we’re going to add a whole new layer of insight and value to it, effectively breathing new life into the legacy monitoring ecosystem that you tend to find within large organizations.”
True full-stack observability
Digging into Selector’s approach, Kamel highlights the generalized capability of Selector’s platform. “In the market now, you see quite a few full-stack observability offerings. … when you look under the covers, these platforms don’t look at the network. They call themselves full stack, but that’s a misnomer [because] they’re ignoring the delivery of their application to the end user.”
He explains that they’ll either ignore the network entirely or provide a token functionality where a product manager checks a box to mark that they’re collecting basic network information. Kamel says, “It’s not really full stack to me. I started using the term true full-stack observability to represent this capability that Selector can collect the telemetry from the network, all the way up to the application and everything in between. … that’s the novel capability we have.”
Selector goes beyond application performance management (APM). It understands the relationships and dependencies between components, so it can correlate events and draw inferences across them without an engineer having to build a solution from the ground up.
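To make the dependency idea concrete, here is a minimal sketch of impact analysis over a dependency graph. It is not Selector's implementation; the component names, the `DEPENDENCIES` edge list, and the `impacted_by` helper are hypothetical. Given components flagged as anomalous, it walks the graph upward to find everything that directly or transitively depends on them, from a switch port up to the applications.

```python
from collections import defaultdict, deque

# Hypothetical dependency edges written as (dependent, dependency):
# ("checkout-app", "api-svc") means checkout-app depends on api-svc.
DEPENDENCIES = [
    ("checkout-app", "api-svc"),
    ("api-svc", "host-12"),
    ("api-svc", "db-svc"),
    ("db-svc", "host-7"),
    ("host-12", "leaf-sw1:eth3"),
    ("host-7", "leaf-sw2:eth1"),
    ("search-app", "search-svc"),
    ("search-svc", "host-7"),
]


def impacted_by(anomalous, edges):
    """Return every component that directly or transitively depends on
    any component in `anomalous` (breadth-first walk up the graph)."""
    dependents = defaultdict(set)
    for dependent, dependency in edges:
        dependents[dependency].add(dependent)

    seen = set()
    queue = deque(anomalous)
    while queue:
        node = queue.popleft()
        for parent in dependents[node]:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    hit = impacted_by({"leaf-sw2:eth1"}, DEPENDENCIES)
    # Keep only the applications, i.e., what the end user actually feels.
    print(sorted(n for n in hit if n.endswith("-app")))
```

Running it prints `['checkout-app', 'search-app']`: a single interface anomaly is traced through a host and two services to both applications it ultimately affects.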
Collection, correlation, and collaboration
Under the covers, Selector’s data-driven platform collects the data, computes events from metrics and logs and correlates them, and enables collaboration. Kumar breaks down the process.
1. Collect the data
Collection requires a lot of data engineering and moving large volumes of data from place to place. This data includes:
- Telemetry sources, such as Simple Network Management Protocol (SNMP) devices and log stores
- Metadata, such as records from a configuration management database (CMDB)
- Workflow tools, such as ServiceNow and PagerDuty
Selector solves the data collection challenges in three ways:
- First, Selector has a robust collection mechanism that uses Kubernetes to scale out your compute and storage resources elastically.
- Second, it collects data across a hybrid environment, pulling together disaggregated data from various repositories either on-premises or in the cloud.
- Third, it takes data in any format you have, compiles it, and generates binaries that produce normalized data on the other side.
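The normalization step can be illustrated with a minimal sketch. Selector compiles format handling into binaries; this example swaps in plain Python parsing functions instead. The `Record` schema, the input shapes, and the simplified `"<host> <tag>: <message>"` syslog format are all assumptions. Each source-specific normalizer maps its input into one shared record type, so downstream stages see a single schema regardless of origin.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical normalized schema shared by every source."""
    source: str
    device: str
    kind: str  # "metric" or "log"
    name: str
    value: object


def from_snmp_poll(poll):
    """Assumed input shape: {"agent": ..., "ifDescr": ..., <counter>: <value>, ...}."""
    device = poll["agent"]
    return [
        Record("snmp", device, "metric", f'{poll["ifDescr"]}.{key}', value)
        for key, value in poll.items()
        if key not in ("agent", "ifDescr")
    ]


# Assumed, simplified syslog layout: "<host> <tag>: <message>".
SYSLOG_RE = re.compile(r"^(?P<host>\S+) (?P<tag>[\w-]+): (?P<msg>.*)$")


def from_syslog_line(line):
    match = SYSLOG_RE.match(line)
    if match is None:
        return []  # unparseable lines are dropped in this sketch
    return [Record("syslog", match["host"], "log", match["tag"], match["msg"])]


# One normalizer per source type; adding a source means adding an entry.
NORMALIZERS = {"snmp": from_snmp_poll, "syslog": from_syslog_line}


def normalize(source, payload):
    return NORMALIZERS[source](payload)
```

For example, `normalize("syslog", "leaf-sw2 LINK-3-UPDOWN: Interface eth1 changed state to down")` yields one log `Record` for device `leaf-sw2`, and an SNMP poll yields one metric `Record` per counter, all in the same shape.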
2. Correlate and compute metrics and logs
Correlation refers to how you take in your data streams and produce a knowledge graph. The data streams fall into categories, such as numbers, metrics, and logs. Selector uses ML to establish baselines for the metrics and transform deviations from them into events. It does the same for logs, converting log sentences into events using named entity recognition (NER).
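Here is a minimal sketch of turning both metrics and logs into one event type. It is not Selector's ML: a rolling mean and standard deviation stand in for the learned baseline, and two regular expressions stand in for a trained NER model. The `Event` fields, the `baseline_len` and `k` parameters, and the entity patterns are assumptions for illustration.

```python
import re
import statistics
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical common event record shared by metrics and logs."""
    ts: int
    device: str
    entity: str
    kind: str


def metric_events(device, metric, samples, baseline_len=10, k=3.0):
    """Emit an event when a sample deviates more than k standard deviations
    from the mean of the preceding `baseline_len` samples.
    `samples` is a list of (timestamp, value) pairs."""
    events = []
    for i in range(baseline_len, len(samples)):
        window = [v for _, v in samples[i - baseline_len:i]]
        mean = statistics.fmean(window)
        # Guard against a perfectly flat baseline (stdev == 0).
        stdev = max(statistics.pstdev(window), 1e-9)
        ts, value = samples[i]
        if abs(value - mean) > k * stdev:
            events.append(Event(ts, device, metric, "metric_anomaly"))
    return events


# Regex stand-ins for a trained named-entity recognizer (assumption).
ENTITY_PATTERNS = {
    "interface": re.compile(r"\b(?:eth|ge-|xe-)[\d/]+\b"),
    "state": re.compile(r"\b(?:up|down)\b"),
}


def log_events(ts, device, message):
    """Extract entities from a log line; emit an event if an interface is named."""
    entities = {}
    for label, pattern in ENTITY_PATTERNS.items():
        match = pattern.search(message)
        if match:
            entities[label] = match.group(0)
    if "interface" not in entities:
        return []
    kind = "link_" + entities.get("state", "change")
    return [Event(ts, device, entities["interface"], kind)]
```

With a baseline alternating between 98 and 102 (mean 100, standard deviation 2), a sample of 500 produces a `metric_anomaly` event, and the log line `"Interface eth1 changed state to down"` produces a `link_down` event for `eth1`. Both now share the same `Event` shape and can be correlated together.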
After Selector creates a common currency of events from the metrics and logs, it correlates them, using inference to group related events. Its correlation algorithms then build a knowledge graph, which produces the insights network engineers need.
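The podcast does not detail Selector's correlation algorithms, so the following is only a minimal stand-in for the grouping step. It treats events as graph nodes, adds an edge between events on the same device or on adjacent devices, and reports each connected component (found with union-find) as one correlated group. The `neighbors` topology map and the `correlate` helper are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int
    device: str
    entity: str
    kind: str


def correlate(events, neighbors):
    """Group events into connected components of a graph whose edges join
    events on the same device or on adjacent devices.
    `neighbors` maps device -> set of adjacent devices (assumed topology)."""
    parent = list(range(len(events)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    def related(a, b):
        return (
            a.device == b.device
            or b.device in neighbors.get(a.device, set())
            or a.device in neighbors.get(b.device, set())
        )

    # Add an edge (union) for every related pair; O(n^2), fine for a sketch.
    for i in range(len(events)):
        for j in range(i + 1, len(events)):
            if related(events[i], events[j]):
                parent[find(i)] = find(j)

    # Each connected component becomes one correlated group.
    groups = defaultdict(list)
    for i, event in enumerate(events):
        groups[find(i)].append(event)
    return sorted(groups.values(), key=lambda g: min(e.ts for e in g))
```

Given a link-down and a traffic anomaly on `leaf-sw2`, an anomaly on its attached `host-7`, and an unrelated BGP flap on `spine-1`, this returns two groups: the first three events together and the flap on its own.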
Kumar says, “Correlations are always happening in the background. It looks at a sweep of 5–10 minutes in the past, finds a correlation, and keeps moving. It’s doing a continuous sweep of events.” He adds that the interval is tunable, but you don’t want it too short, because then you must deal with false positives. “Usually, correlations are discovered and shared in 1–5 minutes.”
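A continuous sweep like the one Kumar describes can be sketched as a sliding lookback window. The 300-second default reflects the low end of the 5–10 minute range he mentions; the 60-second `step`, the `min_kinds` threshold, per-device grouping, and the `sweep` helper itself are assumptions, not Selector's parameters.

```python
from collections import defaultdict


def sweep(events, start, end, window=300, step=60, min_kinds=2):
    """Slide a lookback window over (ts, device, kind) tuples.

    Every `step` seconds, gather events from the last `window` seconds,
    group them by device, and report a device once when it shows at least
    `min_kinds` distinct event kinds in the same window."""
    reported = set()
    findings = []
    for now in range(start, end + 1, step):
        by_device = defaultdict(set)
        for ts, device, kind in events:
            if now - window < ts <= now:
                by_device[device].add(kind)
        for device, kinds in by_device.items():
            key = (device, frozenset(kinds))
            # Report each distinct correlation only once as the window moves.
            if len(kinds) >= min_kinds and key not in reported:
                reported.add(key)
                findings.append((now, device, sorted(kinds)))
    return findings
```

With a link-down at t=100 and a traffic anomaly at t=130 on the same switch, the sweep reports the pair at t=180, the first tick whose window holds both. `window`, `step`, and `min_kinds` are the tunable knobs in this sketch.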
3. Collaborate over insights
Kumar emphasizes that the collaboration layer is the most important in network observability solutions like Selector. He says, “It has to be done in a simple way. That’s where our integration with Slack and Microsoft Teams is important because we share the insights in plain English.”
By using plain English, engineers don’t have to learn a new language or go through JSON or cryptic labels to understand what they need to do. Kumar adds, “We use natural language generation frameworks to take internal JSON data and turn it into English output. … We call it a democratization of your observability. It’s no longer a responsibility of the experts; anybody can have access to the insights the system produces.”
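Selector uses natural language generation frameworks; the sketch below substitutes a much simpler fixed template to show the JSON-to-English idea. The insight fields (`severity`, `device`, `summary`, `start`, `impacted`, `next_step`) are hypothetical. Slack incoming webhooks accept an HTTP POST with a JSON body containing a `text` field, and `post_to_slack` relies only on that. The webhook URL is yours to supply.

```python
import json
import urllib.request


def render_insight(insight):
    """Turn a (hypothetical) internal insight record into plain English."""
    impacted = insight.get("impacted", [])
    impacted_text = ", ".join(impacted) if impacted else "no downstream services"
    return (
        f'{insight["severity"].capitalize()} issue on {insight["device"]}: '
        f'{insight["summary"]} starting at {insight["start"]}. '
        f"Likely impacted: {impacted_text}. "
        f'Suggested next step: {insight["next_step"]}.'
    )


def post_to_slack(webhook_url, text):
    """POST the message to a Slack incoming webhook; returns the HTTP status."""
    body = json.dumps({"text": text}).encode("utf-8")
    req = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

For an insight about interface `eth1` going down on `leaf-sw2`, `render_insight` produces a sentence such as "Major issue on leaf-sw2: interface eth1 went down starting at 10:42 UTC. Likely impacted: checkout-app, search-app. Suggested next step: check the optic on eth1." A template is rigid compared with an NLG framework, but it shows why the reader never needs to see the underlying JSON.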
Looking ahead
As the conversation concludes, Kamel highlights key developments the Selector team is pursuing:
- Digital twin of the network: “[Our customers] want to predict what will happen to workflows when the network goes down using the knowledge graph technology in Selector.”
Some clients have used Selector to build a digital twin of their network so they can predict which telemetry won’t get routed properly when the network goes down. Selector plans to refine this process so more clients can apply the approach.
- Selector Copilot: “A game changer in terms of truly democratizing access to your operational data. … The idea is to level up the conversational interaction model that we offer today.”
Selector Copilot uses conversational AI to help triage network issues. For example, users can ask questions such as “What’s wrong?”, “What’s the severity of the impact?”, “Why is this happening?”, and “Do we have a way to fix this?” Selector Copilot can then respond with when the incident occurred, whether it was fixed, how to fix it, and so on.
- True full-stack observability to take telemetry from the network through the application: “You can tie the user experience down through the entire stack and sort of pinpoint exactly where challenges or friction with the user experience is coming from.”
Selector plans to further develop the platform to deliver on that true full-stack observability concept, using AI and ML to take telemetry from the network up through the application. According to Selector, no other vendor in the market takes this approach today.
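The digital-twin use case above amounts to asking "what if this link fails?" against a model of the network. Selector answers it with its knowledge graph. The sketch below substitutes plain graph reachability on a hypothetical topology, so the device names, the `LINKS` and `FLOWS` data, and the helper functions are all assumptions. It removes one link from the model and reports which source/destination flows lose every path.

```python
from collections import deque

# Hypothetical topology: undirected links between devices.
LINKS = {frozenset(pair) for pair in [
    ("host-7", "leaf-sw2"), ("leaf-sw2", "spine-1"), ("leaf-sw2", "spine-2"),
    ("spine-1", "leaf-sw1"), ("spine-2", "leaf-sw1"), ("leaf-sw1", "host-12"),
    ("host-9", "leaf-sw3"), ("leaf-sw3", "spine-1"),
]}

# Hypothetical flows to check, as (source, destination).
FLOWS = [("host-12", "host-7"), ("host-9", "host-7")]


def reachable(links, src, dst):
    """Breadth-first search over an undirected set of two-node links."""
    adj = {}
    for link in links:
        a, b = tuple(link)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {src}, deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def what_if_link_down(links, flows, failed):
    """Return the flows that are reachable today but lose all paths
    once the `failed` link is removed from the model."""
    remaining = links - {frozenset(failed)}
    return [f for f in flows if reachable(links, *f) and not reachable(remaining, *f)]
```

Failing `("leaf-sw3", "spine-1")` strands only the `host-9` flow, while failing `("leaf-sw2", "spine-1")` strands nothing because traffic can still cross `spine-2`. A real twin would also model capacity and routing policy, which plain reachability ignores.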
Catch the complete Heavy Networking episode, Episode 707: Getting Real With Selector’s AIOps, on the Packet Pushers website, or stream it on your favorite podcast platform, including Apple Podcasts and Spotify.