Every network engineer has a folder of scripts. Maybe it's a collection of Expect scripts that push VLAN configs to switches, or a Python wrapper around SSH that updates ACLs across a dozen firewalls. These scripts work—until they don't. A timeout here, an unexpected prompt there, and suddenly you're debugging at 2 AM with half your devices in an inconsistent state.

The evolution from ad-hoc scripting to infrastructure as code isn't just a tooling upgrade. It's a fundamental shift in how we think about network state. Instead of describing steps to take, we describe where we want to end up. Instead of trusting that last Tuesday's change was applied correctly, we continuously verify that reality matches intent.

This shift mirrors what happened in server management a decade ago, but networks present unique challenges. You can't just reimage a core router the way you rebuild a VM. Understanding the progression from imperative scripts to declarative automation—and the infrastructure required to support it—is essential for anyone managing networks at scale.

Imperative vs Declarative: Two Mental Models for Network Change

Imperative automation tells a device what to do, step by step. Log in, enter configuration mode, add this route, remove that ACL entry, save the config. This maps naturally to how engineers think about CLI interaction, which is why it's the default starting point. Tools like Paramiko, Netmiko, and NAPALM's configuration-merge operations all support this model. The engineer translates intent into ordered commands.
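As a sketch of that ordered-command model, here is a minimal imperative push. `FakeConnection` is a stand-in for a Netmiko-style connection object (real code would use `netmiko.ConnectHandler`), so the example runs without a device; the function name and interface are hypothetical:

```python
class FakeConnection:
    """Stand-in for a Netmiko-style SSH connection; records what was sent."""
    def __init__(self):
        self.sent = []

    def send_config_set(self, commands):
        self.sent.extend(commands)
        return "\n".join(commands)

    def save_config(self):
        self.sent.append("write memory")


def add_vlan_to_trunk(conn, interface, vlan):
    # Each command assumes the previous one succeeded and that the device
    # is in the state we expect -- the core fragility of this model.
    commands = [
        f"interface {interface}",
        "switchport mode trunk",
        f"switchport trunk allowed vlan add {vlan}",
    ]
    conn.send_config_set(commands)
    conn.save_config()


conn = FakeConnection()
add_vlan_to_trunk(conn, "GigabitEthernet1/0/1", 100)
```

The engineer's intent lives entirely in the ordering of the commands; nothing here checks what the device actually looks like before acting.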

The problem is that imperative scripts encode assumptions about current state. If you write a script to add VLAN 100 to a trunk port, it assumes VLAN 100 doesn't already exist, that the port is actually in trunk mode, and that no intermediate change has altered the configuration since you last looked. When those assumptions break—and on a network of hundreds of devices, they will break—the script either fails noisily or, worse, succeeds in producing an unintended result.

Declarative automation flips the model. Instead of specifying steps, you define the desired end state: this interface should be a trunk carrying VLANs 100, 200, and 300. The automation platform compares that desired state against the device's current configuration and computes the minimal set of changes required. Tools like Ansible's declarative modules, Nornir with template-based configs, and vendor-specific solutions like Cisco NSO operate in this mode.
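The compare-and-converge idea can be shown in miniature. This is not any particular platform's algorithm, just a sketch of how a declarative tool might diff a desired VLAN set against current state and emit only the commands needed:

```python
def plan_vlan_changes(desired, current):
    """Compare desired vs. current allowed-VLAN sets and return only the
    commands needed to converge. An empty plan means no change (idempotent)."""
    to_add = sorted(desired - current)
    to_remove = sorted(current - desired)
    commands = []
    for vlan in to_add:
        commands.append(f"switchport trunk allowed vlan add {vlan}")
    for vlan in to_remove:
        commands.append(f"switchport trunk allowed vlan remove {vlan}")
    return commands


# Desired state: trunk carries 100, 200, 300. Current state: 100 and a stray 400.
plan = plan_vlan_changes(desired={100, 200, 300}, current={100, 400})
```

Running the same plan against an already-converged device yields an empty command list, which is exactly the idempotency property imperative scripts lack.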

Neither approach is universally superior. Imperative scripts excel at one-off operational tasks—drain traffic from a link, collect diagnostics during an incident, toggle a maintenance flag. Declarative systems shine for persistent configuration management where you need repeatability and idempotency. The failure mode of imperative automation is state drift—reality slowly diverging from intent. The failure mode of declarative automation is abstraction leakage—the platform making changes you didn't anticipate because your state definition was incomplete. Mature network automation uses both, with clear boundaries for when each applies.

Takeaway

Imperative scripts describe the journey; declarative configs describe the destination. Use imperative for operational tasks and declarative for configuration management—mixing them without discipline is where automation becomes dangerous.

Source of Truth: The Foundation Automation Runs On

Automation without an authoritative source of truth is just faster misconfiguration. Before you can declare a desired state, you need a system that defines that desired state—and it can't be a spreadsheet on someone's desktop. The source of truth is the single authoritative record of what your network should look like: IP assignments, VLAN mappings, BGP peer relationships, interface descriptions, ACL policies. Everything the automation platform needs to generate correct configurations.

The most common pattern today uses a network source of truth (NSoT) platform like NetBox, Nautobot, or a custom CMDB. These systems model your network as structured data—sites, devices, interfaces, circuits, prefixes—and expose that data via APIs. Your automation pipeline pulls from this API, feeds the data into templates (Jinja2 being the lingua franca), and produces device-specific configurations. The device never sees the template; it receives rendered, validated configuration.
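The data-to-template step might look like the following. The interface record is a hypothetical slice of what a NetBox-style API could return; the template is a deliberately tiny Jinja2 fragment:

```python
from jinja2 import Template

# Hypothetical structured data, as it might come back from an NSoT API.
interface_data = {
    "name": "GigabitEthernet1/0/1",
    "description": "uplink to dist-sw-01",
    "mode": "trunk",
    "vlans": [100, 200, 300],
}

TEMPLATE = """\
interface {{ name }}
 description {{ description }}
 switchport mode {{ mode }}
 switchport trunk allowed vlan {{ vlans | join(',') }}
"""

# Render device-specific configuration from the source-of-truth data.
rendered = Template(TEMPLATE).render(**interface_data)
```

The rendered output, not the template, is what gets validated and pushed; the same template serves every interface the NSoT knows about.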

Maintaining the source of truth is the hard part. It must be updated before changes are pushed, not after. This inverts the traditional workflow where engineers configure devices and then (maybe) update documentation. In an automation-first model, you change the source of truth, the pipeline generates new configs, and changes propagate outward. The source of truth is upstream; the network is downstream. Git repositories often serve as the version-control layer on top of the NSoT, giving you change history, peer review through pull requests, and audit trails.

The discipline required here is cultural as much as technical. If someone bypasses the source of truth and makes a manual change on a device, the next automation run will either overwrite it or flag a conflict. This is a feature, not a bug—it enforces consistency. But it requires the entire team to commit to the workflow. The most sophisticated automation tooling in the world fails if the data feeding it is stale, incomplete, or maintained by only one person who's about to go on vacation.

Takeaway

Your automation is only as reliable as the data it consumes. The source of truth must be upstream of the network—change the model first, then let automation render reality to match.

Testing and Rollback: Making Automated Changes Safe

The fear that stops many teams from adopting network automation isn't the tooling—it's the blast radius. A bad script pushed to one switch is a bad day. A bad template rendered across every switch in a data center is a career-defining event. This is why testing and rollback aren't optional features of a network automation pipeline; they're the entire point of building one.

Pre-deployment validation operates at multiple layers. Syntax validation catches malformed configurations before they reach a device—tools like Batfish and config-lint parsers verify that generated configs are structurally correct. Semantic validation goes further: Batfish can model your entire network topology, simulate the effect of a proposed change, and tell you whether reachability between critical endpoints would break. This is the network equivalent of a compiler catching bugs before runtime. Some teams also maintain lab environments or use platforms like GNS3, EVE-NG, or Containerlab to stage changes against virtual topologies before touching production.
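A full Batfish snapshot is beyond a short example, but the shape of a syntax-layer check can be sketched. This hypothetical validator flags two common failure modes in rendered configs, unrendered template markers and out-of-range VLAN IDs; a real pipeline would run a proper linter or model-based tool:

```python
import re


def validate_config(text):
    """Minimal pre-deployment checks (a stand-in for a real linter or
    Batfish snapshot). Returns a list of errors; empty means pass."""
    errors = []
    # A '{{' in rendered output means a template variable never resolved.
    if "{{" in text or "{%" in text:
        errors.append("unrendered Jinja2 markers present")
    # VLAN IDs must fall in the valid 1-4094 range.
    for match in re.finditer(r"allowed vlan (?:add |remove )?([\d,]+)", text):
        for vlan in match.group(1).split(","):
            if not 1 <= int(vlan) <= 4094:
                errors.append(f"VLAN {vlan} out of range")
    return errors
```

Cheap checks like these run in milliseconds per config, which is why they belong in the pipeline before any device is touched.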

Drift detection is the complement to pre-deployment testing. Once your source of truth defines the intended state, you need a mechanism that periodically compares intended state against actual device configuration and flags discrepancies. Tools like Oxidized or RANCID collect running configs on a schedule; diffing those against the intended state reveals unauthorized manual changes, failed automation runs, or configuration corruption. This continuous verification closes the loop between what you declared and what actually exists.
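The core of drift detection is just a diff between intended and running configuration. A minimal sketch using the standard library, with collection of the running config (the Oxidized/RANCID part) assumed to have already happened:

```python
import difflib


def detect_drift(intended, running):
    """Return unified-diff lines between intended and running configs.
    An empty result means no drift."""
    return list(difflib.unified_diff(
        intended.splitlines(), running.splitlines(),
        fromfile="intended", tofile="running", lineterm=""))


intended = "interface Gi1/0/1\n switchport mode trunk\n"
running = "interface Gi1/0/1\n switchport mode access\n"
drift = detect_drift(intended, running)
```

In practice the diff needs normalization first (stripping timestamps, reordering-insensitive comparison for some stanzas), but the loop is the same: collect, diff, flag.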

Rollback strategies depend on your automation model. Configuration replace operations—supported by NAPALM and many modern platforms—let you atomically swap an entire device configuration, making rollback as simple as pushing the previous known-good config. For more granular changes, maintaining a Git history of generated configs gives you a clear revert target. The critical principle is this: never push a change you can't undo. If your rollback plan is "SSH in and fix it manually," you haven't automated—you've just added a faster way to create problems.
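The replace-with-automatic-rollback principle can be modeled in a few lines. This toy class mirrors the NAPALM load-replace/commit/rollback workflow in miniature (the real API talks to a device; `verify` here stands in for post-change validation):

```python
class ConfigStore:
    """Toy model of atomic config replace with rollback to last known-good."""

    def __init__(self, initial):
        self.active = initial
        self.previous = None

    def replace(self, candidate, verify):
        # Keep the last known-good config before committing anything.
        self.previous = self.active
        self.active = candidate
        if not verify(self.active):
            self.rollback()
            return False
        return True

    def rollback(self):
        # Restore the pre-change config atomically.
        self.active, self.previous = self.previous, None


store = ConfigStore("hostname sw1\n")
ok = store.replace("hostname sw1\nbad-line\n",
                   verify=lambda cfg: "bad-line" not in cfg)
```

The invariant worth copying into any real pipeline: the previous known-good config is captured before the change, so the revert target exists even if the push itself fails.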

Takeaway

Safe automation means validating before you deploy, detecting drift continuously, and always having a tested rollback path. The measure of a good automation pipeline isn't how fast it pushes changes—it's how confidently you can undo them.

The path from a folder of Expect scripts to a fully declarative automation pipeline isn't a single leap. It's a series of deliberate decisions: choosing declarative over imperative for configuration management, establishing a source of truth that the whole team respects, and building validation and rollback into every stage of the change process.

Each of these layers reinforces the others. A good source of truth makes declarative automation possible. Declarative automation makes drift detection meaningful. Drift detection makes rollback reliable. The system works as an integrated whole.

Start where you are. If you have scripts, add idempotency. If you have templates, build a source of truth. If you have a source of truth, add pre-deployment validation. Every step makes the next change safer than the last—and that's the real goal of network automation.