A self-healing network looks set to be one of the most challenging forms of autonomy. CSPs are already beginning to pilot agentic architectures in assurance but face a range of challenges.
Building self-healing capabilities into the network looks set to be one of the most challenging areas of autonomy for a CSP. Substantial work will be needed to improve the availability of suitable orchestration and to develop stronger supporting intelligence, data, and knowledge. Recent research into CSP activity in assurance uncovered a range of challenges, each with a resolution that will take time to implement:
1. Tackling multi-domain, multi-vendor network environments
Data silos within individual assurance solutions, some of which date back more than two decades, remain a persistent obstacle on the road to autonomy, particularly in service assurance.
The federation of this data into a mix of data lakes and streaming systems is underway, but far from mature. This federation supports certain slower automations, such as trend-based anomaly detection, but does not create a solid pathway to TM Forum Autonomous Networks Level 4.
In a recent survey for Radcom, Heavy Reading found that over 74% of the 84 CSPs interviewed plan to deploy agents across multiple network operations processes within two years, with service assurance as their top priority. These agentic systems will eventually enable faster automations, and a key element will be domain-specific data agents that ingest and interpret local data and respond to queries from other agents.
2. Data quality, integrity, and availability
The network generates billions of data points that must be collected and managed to provide high-quality input for assurance models.
There is growing consensus that 5G-Advanced and 6G will generate data volumes that are prohibitively expensive to collect, and that, despite these volumes, assurance models may still suffer from insufficient relevant data. For example, a modern, stable network does not generate enough errors for a root-cause analysis model to reason over.
A range of solutions will be needed:
- data agents, and sub-agents/tools within their hierarchy, will improve data quality, integrity, and availability by monitoring, correcting, and enforcing rules
- where data volumes are very high, CSPs will need to sample selectively and use ML-based inference to estimate or interpolate missing data
- a data agent may also be able to weigh the value of collecting additional data for a particular task against the cost of collecting it
- where data sets are not fully available for a model, inference could be used to estimate the likely state or issue, using historic or synthetic data as a proxy
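The data-handling ideas in the list above can be illustrated with a minimal Python sketch. It assumes KPI readings arrive as timestamped samples sorted by strictly increasing time, and every name in it (`Sample`, `enforce_rules`, `fill_gaps`, `should_collect`) is hypothetical. Rule enforcement marks out-of-range readings as missing, simple linear interpolation stands in for the ML-based inference described above, and a one-line gate compares the expected value of extra collection with its cost, assuming both are expressed in the same units.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Sample:
    """One KPI reading; value is None when the point is missing or was rejected."""
    t: float                # timestamp; samples must be sorted by strictly increasing t
    value: Optional[float]  # e.g. a hypothetical latency KPI in ms


def enforce_rules(samples: List[Sample], lo: float, hi: float) -> List[Sample]:
    """Return a copy in which readings outside [lo, hi] are marked missing (None)."""
    return [
        Sample(s.t, s.value if s.value is not None and lo <= s.value <= hi else None)
        for s in samples
    ]


def fill_gaps(samples: List[Sample]) -> List[Sample]:
    """Estimate interior missing values by linear interpolation between known neighbors.

    Linear interpolation is a simple stand-in for ML-based inference.
    Leading or trailing gaps, which lack a known neighbor on one side, stay None.
    """
    known = [(i, s) for i, s in enumerate(samples) if s.value is not None]
    filled = list(samples)
    # Walk each pair of consecutive known points and fill the indices between them.
    for (i0, a), (i1, b) in zip(known, known[1:]):
        for j in range(i0 + 1, i1):
            frac = (samples[j].t - a.t) / (b.t - a.t)
            filled[j] = Sample(samples[j].t, a.value + frac * (b.value - a.value))
    return filled


def should_collect(expected_value: float, collection_cost: float) -> bool:
    """Collect extra data only when its expected value exceeds its cost (same units assumed)."""
    return expected_value > collection_cost
```

For example, with readings of 10.0, 999.0, and 20.0 at t = 0, 5, and 10 and a valid range of 0 to 500, `enforce_rules` discards the 999.0 spike and `fill_gaps` estimates 15.0 in its place. In practice, a CSP would likely swap the interpolation for a trained model and the fixed comparison for a policy-driven cost model.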
3. Developing capable and accurate models
CSPs have already invested time and money in building the first wave of AI/ML models, including efforts to improve data quality and develop new capabilities such as model fine-tuning and retrieval-augmented generation (RAG).
Delivering the next generation of models that moves CSPs toward assurance autonomy will continue to pose a range of problems, including building accurate models, detecting and understanding issues in high-volume data, and predicting rarer failures. Solutions here will mostly require continued CSP focus and effort to develop clean data, a suitable knowledge layer, and closed-loop intelligence.
4. Building organizational acceptance
The development of more capable models and quality data/knowledge will be an important part of building trust and acceptance across the organization, as humans increasingly work with AI and agents.
An individual designated as an “agentic champion,” or several domain champions, will need to take their storytelling skills into teams deploying agents to help employees understand what is possible and identify valuable agentic projects.
This work will also help address the current shortage of skills at the intersection of AI, agents, and networking, as network teams are educated and build new skills. However, CSPs will also need to augment these teams with external consultants who have highly specialized expertise in solving problems within complex agentic systems.
Five early goals for a CSP looking to overcome the challenges of building assurance autonomy:
- create a long-term plan for collecting and improving real-time data, covering all vendors and technologies, filtering out low-value or unwanted data, and injecting context at the point of collection
- deliver small projects with simple, hierarchical agents that allow teams to build new competencies
- define agent decisioning narrowly, limiting the complexity of decisions and the opportunity for error
- select tasks that are already partly automated, since their clean data and processes provide a strong starting point for agentic deployment
- develop a plan to deploy suitable knowledge solutions for ML and AI to reason over
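Narrowly defined agent decisioning can be sketched as an allow-list: the agent acts only on a small, fixed set of alarm types, with one permitted action each, and escalates everything else, or any low-confidence case, to a human. The alarm types, actions, and the 0.9 confidence threshold below are hypothetical placeholders, not recommendations.

```python
from enum import Enum


class Action(Enum):
    RESTART_INTERFACE = "restart_interface"
    CLEAR_ALARM = "clear_alarm"
    ESCALATE = "escalate_to_human"


# Hypothetical allow-list: the only alarm types this agent may act on,
# each mapped to the single action it is permitted to take.
ALLOWED = {
    "link_flap": Action.RESTART_INTERFACE,
    "stale_alarm": Action.CLEAR_ALARM,
}


def decide(alarm_type: str, confidence: float, min_confidence: float = 0.9) -> Action:
    """Act only on allow-listed alarm types with sufficient confidence; otherwise escalate."""
    action = ALLOWED.get(alarm_type)
    if action is None or confidence < min_confidence:
        return Action.ESCALATE
    return action
```

Here, `decide("link_flap", 0.95)` returns `Action.RESTART_INTERFACE`, while an unknown alarm type or a confidence of 0.5 returns `Action.ESCALATE`. Widening the agent's scope then becomes an explicit, reviewable change to `ALLOWED` rather than an emergent behavior.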
Learn More
AI and Agents in Next-Generation Assurance examines current and future requirements for AI/ML in the assurance market to support the journey toward a self-healing network. Learn how CSPs should move forward with the development of supporting agentic systems and gain insights from STL Partners and analyst Charlotte Patrick on what’s changing, what’s working, and where operators should focus next.