Research & Technical Blog

Deep dives into network analysis and automation

AQM Research Architecture Diagram

Research on Active Queue Management (AQM) Classification and Analysis


During my research internships at Institut Polytechnique de Paris and later at Télécom SudParis, I developed a complete testbed and classification framework that identifies which Active Queue Management (AQM) algorithm is running in a bottleneck router. The system was designed to distinguish between CoDel, FQ-CoDel, Cake, PIE, RED, FIFO, and FQ. The goal was to analyse how these queue disciplines affect congestion, latency, and fairness, and to build a detection system that needs no privileged access to the router.

Objective

The main objective was to identify the AQM algorithm active in a network bottleneck using only RTT traces collected from UDP traffic. This required creating an experimental framework that generated, captured, analysed, and compared large quantities of network data automatically. The classification needed to remain accurate across varying link speeds, queue lengths, and packet sizes.

Architecture Overview

The system was structured around three main machines forming a controlled testbed:

  • Client: Responsible for generating UDP traffic and recording RTT measurements in real time.
  • Router: Configured with different AQM algorithms via Linux's tc qdisc system (CoDel, FQ-CoDel, PIE, RED, Cake, FIFO, and FQ). This device acted as the network bottleneck, enforcing queuing delay and packet drops.
  • Server: Received packets and sent acknowledgements back to the client, closing the feedback loop.

Each component interacted through a Python-based orchestration layer that coordinated configuration, data collection, and experiment timing. The design allowed repeatable testing under controlled parameters such as bitrate, queue size, and packet interval.

Workflow

  1. Connection Establishment — The master client connects to the master server over TCP.
  2. Bottleneck Measurement (1) — The server runs the bitrate measurement server binary. The client runs the corresponding measurement client binary and extracts the bottleneck bitrate.
  3. Bottleneck Type Verification (2) — The server runs the bottleneck type checker binary. The client runs the type checker client binary at the measured bitrate + 2 Mbit/s. If throughput increases significantly, the bottleneck may be CPU or another factor; otherwise it is confirmed as interface speed.
  4. Queue Size Measurement (3) — The server runs the queue measurement server binary. The client extracts the queue length in packets.
  5. Queue Consistency Check — The client estimates queuing delay based on bitrate, packet size, and measured queue length. If the estimated sojourn time is very small, a warning is raised but the experiment continues.
  6. Simulation-Based Test Time Estimation (4) — The client runs a CoDel simulation using measured parameters. The simulation suggests an experiment duration; if invalid, 30 seconds is used as default. Final test time = simulation time + 1 second.
  7. AQM/FQ Measurement (5) — The server starts the AQM/FQ measurement binary. The client launches the classification measurement tool, tuned with adjusted bitrate, packet size, and estimated test time. RTT traces are collected.
  8. File Transfer — The client sends all generated CSV files to the server.
  9. Classification (6) — The server runs the Python classifier with the correct template family. The classification result is sent back to the client and logged.
  10. Completion — Both sides close the connection.

Figure 1 – Overall architecture of the AQM classification and analysis framework.
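The queue consistency check in step 5 of the workflow comes down to one line of arithmetic: the time a full queue needs to drain at the bottleneck rate. The following is a minimal sketch of that estimate. The function names and the 5 ms warning threshold are illustrative assumptions, not values taken from the project.

```python
# Hypothetical warning threshold; the project's actual value is not stated.
MIN_SOJOURN_MS = 5.0


def estimated_sojourn_ms(queue_len_pkts, packet_size_bytes, bitrate_bps):
    """Time for a full queue to drain at the bottleneck rate, in milliseconds."""
    queued_bits = queue_len_pkts * packet_size_bytes * 8
    return queued_bits / bitrate_bps * 1000.0


def check_queue(queue_len_pkts, packet_size_bytes, bitrate_bps):
    """Warn on a very small sojourn time, but let the experiment continue."""
    sojourn = estimated_sojourn_ms(queue_len_pkts, packet_size_bytes, bitrate_bps)
    if sojourn < MIN_SOJOURN_MS:
        print(f"warning: estimated sojourn time is only {sojourn:.2f} ms")
    return sojourn
```

For example, 100 packets of 1250 bytes behind a 10 Mbit/s bottleneck hold 1,000,000 bits, which take about 100 ms to drain.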

Detailed Architecture Components

1. Traffic Generation and RTT Measurement

I implemented multiple C programs that used both raw UDP sockets and standard UDP sockets to simulate various traffic flows with high timing accuracy. Instead of relying on usleep() or fixed sleep intervals, the sender dynamically tracked elapsed time since the start of transmission and computed how many packets should have been sent by that point. If the actual number of sent packets lagged behind the theoretical rate, the sender immediately sent the necessary packets to catch up.

This timing control approach ensured consistent and precise bitrate regulation without cumulative drift caused by system-level sleep inaccuracies. Each sender operated based on a defined target bitrate, allowing accurate reproduction of network load at different transmission speeds.

  • Bitrate-based pacing by calculating expected packet count from elapsed time since start.
  • Configurable payload size and total packet count per experiment.
  • Support for concurrent sender threads to simulate multiple independent flows.
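The catch-up pacing described above can be sketched in a few lines. The original senders were written in C and are not shown here, so this Python version only illustrates the idea. The function names, the busy-wait loop, and the injectable clock are assumptions made for the example.

```python
import time


def packets_due(elapsed_s, bitrate_bps, packet_size_bytes):
    """Number of packets that should have been sent after elapsed_s seconds."""
    return int(elapsed_s * bitrate_bps / (packet_size_bytes * 8))


def paced_send(send_packet, bitrate_bps, packet_size_bytes, total_packets,
               clock=time.monotonic):
    """Send packets at a target bitrate without sleeping.

    Each iteration compares the packets actually sent with the number due
    at the current elapsed time and sends the difference immediately, so
    timing errors do not accumulate.
    """
    start = clock()
    sent = 0
    while sent < total_packets:
        due = packets_due(clock() - start, bitrate_bps, packet_size_bytes)
        due = min(due, total_packets)
        while sent < due:
            send_packet(sent)  # sequence number doubles as packet identifier
            sent += 1
    return sent
```

Because the expected count is always derived from the time since the start, a late iteration is corrected on the next pass instead of shifting every later packet.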

Round-Trip Times (RTTs) were measured at the client side by timestamping outgoing packets and matching them with acknowledgement timestamps received from the server. The resulting RTT traces were stored as time-series datasets, capturing queue buildup, delay oscillations, and congestion patterns under each tested AQM discipline.
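A minimal sketch of the matching step, assuming packets carry a sequence number and that send and acknowledgement timestamps are kept in two dictionaries keyed by it. The data layout and the function name are illustrative, not the project's actual format.

```python
def compute_rtts(sent_at, acked_at):
    """Pair send and acknowledgement timestamps by sequence number.

    Both arguments map sequence number -> timestamp in seconds.
    Returns (seq, send_time_s, rtt_ms) tuples ordered by sequence number.
    Packets without an acknowledgement (lost or dropped) are skipped.
    """
    return [
        (seq, sent_at[seq], (acked_at[seq] - sent_at[seq]) * 1000.0)
        for seq in sorted(sent_at)
        if seq in acked_at
    ]
```

Gaps in the sequence numbers of the result mark dropped packets, which is itself useful information about the queue discipline.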

AQM Signature RTT traces

2. Router Configuration and AQM Control

The router (a Linux system acting as a virtual bottleneck) was configured using tc qdisc commands. A shell-based configuration script applied the desired AQM discipline on the egress interface:

root@router:~# tc qdisc add dev eth0 root codel limit 1000 target 5ms interval 100ms
root@router:~# tc qdisc add dev eth0 root pie limit 1000 target 20ms tupdate 15ms
root@router:~# tc qdisc add dev eth0 root fq_codel
# etc...

Additional scripts removed qdiscs cleanly after each experiment.
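The configuration scripts themselves were shell-based. As a hedged illustration of how an orchestration layer could build the same commands, here is a small Python sketch. The helper names and the `QDISC_ARGS` table are assumptions; the parameters are copied from the commands above.

```python
import shlex

# Parameters taken from the tc commands shown above; other qdiscs omitted.
QDISC_ARGS = {
    "codel": "limit 1000 target 5ms interval 100ms",
    "pie": "limit 1000 target 20ms tupdate 15ms",
    "fq_codel": "",
}


def apply_cmd(dev, qdisc):
    """Build the tc command that installs a qdisc on the egress interface."""
    return shlex.split(f"tc qdisc add dev {dev} root {qdisc} {QDISC_ARGS[qdisc]}")


def clear_cmd(dev):
    """Build the tc command that removes the root qdisc after an experiment."""
    return ["tc", "qdisc", "del", "dev", dev, "root"]

# Running these requires root on the router, for example:
#   subprocess.run(apply_cmd("eth0", "codel"), check=True)
```

Returning argument lists instead of shell strings keeps the commands easy to log and to pass to `subprocess.run` without shell quoting issues.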

3. Bottleneck Analysis and Pre-Processing

  • Bottleneck Bitrate Estimation: A C program transmitted UDP packets, and the server approximated the link's maximum sustainable rate as the highest receive rate it observed.
  • Queue Length Estimation: Another C program measured queue buildup time to calculate queue depth.
  • CPU Bottleneck Detection: Another C program ensured the observed delay was due to link capacity, not processor limitations.
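The bitrate estimate can be illustrated with a short sketch: count the packets received in fixed time windows and take the busiest window as the bottleneck rate. The original tool was a C program; the window length and the function name below are assumptions made for the example.

```python
def max_received_rate_bps(arrivals, packet_size_bytes, window_s=0.5):
    """Highest receive rate seen in any fixed window, in bits per second.

    arrivals is a list of receive timestamps in seconds, in arrival order.
    """
    if not arrivals:
        return 0.0
    t0 = arrivals[0]
    counts = {}
    for t in arrivals:
        window = int((t - t0) / window_s)
        counts[window] = counts.get(window, 0) + 1
    return max(counts.values()) * packet_size_bytes * 8 / window_s
```

When the sender transmits faster than the link allows, the receive rate saturates at the bottleneck, so the maximum over windows approximates the link capacity.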

4. CoDel Simulation

I implemented a Python simulation of the CoDel algorithm that closely follows the Linux kernel's CoDel implementation. The simulation was used to study how network parameters such as bottleneck bitrate, packet size, queue length, and added bitrate influence delay control behaviour. It also provided a reference RTT pattern, used both to calculate the required total test time and for future classification comparisons.
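The part of CoDel that shapes the RTT pattern most visibly is its control law: while the queue delay stays above the target, the gap to the next drop is the interval divided by the square root of the drop count. The sketch below shows only that schedule. It is not the project's simulator, and it ignores details of the kernel code such as the fixed-point reciprocal square root and the handling of re-entry into the dropping state.

```python
from math import sqrt


def codel_drop_schedule(first_drop_s, interval_s=0.100, n_drops=5):
    """Drop times while CoDel stays in the dropping state.

    After the k-th drop, the next one is scheduled interval / sqrt(k)
    later, so drops become more frequent the longer delay stays high.
    """
    times = [first_drop_s]
    for count in range(1, n_drops):
        times.append(times[-1] + interval_s / sqrt(count))
    return times
```

With the default 100 ms interval the gaps between drops shrink from 100 ms to about 71 ms, 58 ms, and 50 ms, which is the kind of tightening rhythm that appears in the RTT trace.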

5. Data Collection and Storage

Each experiment produced multiple CSV trace files:

  • Per-packet RTT measurements (timestamp, RTT value, sequence number).
  • Metadata files containing AQM type, bitrate, queue length, and parameters.

All data was organised hierarchically by AQM type and bitrate for structured post-processing.
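A minimal sketch of such a layout, assuming one CSV file per run stored under the AQM type and the bitrate. The directory names, file naming scheme, and column headers are illustrative assumptions.

```python
import csv
from pathlib import Path


def trace_path(root, aqm, bitrate_mbps, run_id):
    """Hypothetical layout: <root>/<aqm>/<bitrate>mbit/run_<id>.csv"""
    return Path(root) / aqm / f"{bitrate_mbps}mbit" / f"run_{run_id:03d}.csv"


def write_trace(path, rows):
    """Write per-packet rows of (timestamp_s, rtt_ms, seq) to a CSV file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp_s", "rtt_ms", "seq"])
        writer.writerows(rows)
```

Encoding the AQM type and bitrate in the path lets post-processing scripts select a subset of traces with a simple glob pattern.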

6. Classification and Analysis Engine

The classifier was implemented in Python using Dynamic Time Warping (DTW) to measure similarity between new RTT traces and pre-recorded template traces of each AQM type. Two classification stages were applied:

  • Stage 1: Distinguish between fair-queuing AQMs (FQ-CoDel, FQ, Cake) and non-fair-queuing ones (CoDel, PIE, RED, FIFO).
  • Stage 2: Perform fine-grained classification within each group using normalised DTW distance metrics and flow correlation behaviour.
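To make the matching step concrete, here is a minimal sketch of the textbook DTW recurrence and a nearest-template classifier. It is a simplified illustration rather than the project's classifier: the absolute-difference cost, the length-based normalisation, and the single-stage lookup are assumptions.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two sequences."""
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0  # empty prefixes align at zero cost
    for i in range(1, len(a) + 1):
        cur = [inf] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[len(b)]


def classify(trace, templates):
    """Return the template name with the smallest length-normalised DTW distance."""
    def score(name):
        template = templates[name]
        return dtw_distance(trace, template) / (len(trace) + len(template))
    return min(templates, key=score)
```

A usage example with toy RTT values: `classify([5, 10, 10, 5, 10], {"fifo": [10, 10, 10, 10], "codel": [5, 10, 5, 10]})` selects `"codel"`, because the oscillating trace warps onto the oscillating template at zero cost.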

Automation Framework

A central Python orchestration script handled:

  • SSH-based remote control of client, router, and server machines.
  • Automated setup and teardown of AQM configurations.
  • Timed execution of traffic experiments and collection of logs.
  • Computation of DTW similarity matrices and accuracy statistics.

Results

Legend: Blue = Ideal Testbench  |  Orange = Ideal Testbench with remote server  |  Green = Home network tests

AQM Research Results

The developed framework achieved high classification accuracy under a wide range of network conditions, including variations in bitrate, queue length, and packet size. In controlled testbed environments, the system consistently distinguished between all tested AQM algorithms with strong separation in their RTT profiles. When evaluated in a real-world scenario involving a live client and router over an operational network, the classifier maintained an overall accuracy of 87%, demonstrating its robustness outside laboratory conditions.

Each AQM exhibited a distinct temporal signature in the RTT traces:

  • CoDel and PIE displayed highly recognisable periodic delay oscillations corresponding to their active queue control cycles, making them the most easily distinguishable algorithms.
  • RED showed probabilistic yet smoother delay variations resulting from its random early drop mechanism, producing moderate oscillations without strict periodicity.
  • FQ-CoDel, Cake, and FQ each displayed their own characteristic RTT signatures in single-flow conditions. All three demonstrated clear evidence of per-flow fairness and reduced inter-flow interference, with RTTs remaining stable and well-isolated between flows.
  • FIFO traces were dominated by congestion plateaus and sharp delay spikes, reflecting unregulated queue buildup and absence of any active queue control.

These results confirm that end-to-end RTT measurements carry sufficient information to infer the active queue management policy at a bottleneck router. The classification performance validates the effectiveness of combining precise C-level traffic generation, kernel-level AQM configuration, and DTW-based pattern matching for automated queue behaviour identification.

Conclusion

Real-life AQM classification example

This project delivers a framework for identifying and analysing Active Queue Management (AQM) algorithms using only end-to-end network measurements. By integrating C-based traffic generation, precise RTT collection, kernel-level AQM configuration on the testbed, and Dynamic Time Warping (DTW)-based classification, it recognises queue management behaviour automatically, reaching 87% accuracy in the real-world test, without requiring any access to the router being classified.

The system goes beyond traditional traffic analysis approaches by correlating time-domain RTT dynamics with underlying queue control logic, allowing fine-grained differentiation between AQMs such as CoDel, PIE, RED, Cake, FQ-CoDel, FQ, and FIFO. Its modular and fully automated design makes it adaptable to both controlled research environments and real-world network scenarios.

The resulting framework demonstrates that end-to-end AQM classification, using only probe traffic and no router access, is both feasible and practical, which makes it a useful step for network diagnostics and congestion research. It lays a foundation for future systems that could identify, simulate, and optimise queue management strategies autonomously.

Beyond classification, the system increases visibility into bottleneck behaviour, enabling network operators to identify the active AQM and adapt congestion control algorithms accordingly for tangible performance gains. This capability also has dual security implications: while identifying the AQM can be leveraged by attackers as a reconnaissance step to craft targeted strategies, it equally empowers defenders. Knowing the active AQM allows adaptive congestion control mechanisms to respond effectively under attack or in hostile conditions, preserving network stability and throughput.

Future extensions include integrating reinforcement learning to predict queue type adaptively and simulating all AQMs to generate on-the-fly perfect templates given network parameters.

Home Server Architecture

Home Server Infrastructure, Automation & Observability Platform


I designed and deployed a self-hosted Ubuntu server environment operating as a personal cloud, automation engine, and monitoring platform. The system runs continuously and hosts multiple production-grade Python services, supported by a full observability stack and secured through a Zero Trust architecture.

Full Architecture Diagram

Infrastructure Architecture

  • Ubuntu Server (VM-based): Centralised host running all automation and monitoring services.
  • Systemd-managed services: All Python applications run as managed services with automatic restart policies and failure handling.
  • Isolated Python virtual environments: Dependency separation for reliability and maintainability.
  • Structured logging: JSON-based logs for automation tasks and service health tracking.
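As an illustration of the structured logging item, the following sketch shows one way to emit JSON log lines with the standard library. The field names are assumptions, not the schema used on the server.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record):
        return json.dumps({
            "time": self.formatTime(record),
            "level": record.levelname,
            "service": record.name,  # logger name doubles as service name
            "message": record.getMessage(),
        })


def make_logger(service_name):
    """Create a logger that writes JSON lines to stderr."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(service_name)
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

One JSON object per line keeps the logs easy to filter with standard tools and to parse from a metrics exporter.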

Fully Automated YouTube Content Pipeline

I developed a complete end-to-end automated YouTube publishing system that generates content, converts it to media, assembles the final video, and uploads it — without manual intervention.

1. Content Generation Layer

  • Integration with the Google Gemini API for automated script generation.
  • Automatic fallback to an older Gemini model when API rate limits or quota limits are reached.
  • Structured JSON parsing to extract titles, descriptions, and script bodies reliably.
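The fallback and parsing logic can be sketched independently of any particular API client. In the example below, `call_model`, `QuotaExceeded`, and the three JSON keys are hypothetical stand-ins; the real Gemini client calls and response schema are not shown in this write-up.

```python
import json


class QuotaExceeded(Exception):
    """Hypothetical error raised when a model's rate or quota limit is hit."""


def generate_with_fallback(prompt, models, call_model):
    """Try each model in order and parse the first successful JSON reply.

    call_model(model, prompt) is an injected function that returns the
    raw JSON text or raises QuotaExceeded.
    """
    last_error = None
    for model in models:
        try:
            raw = call_model(model, prompt)
        except QuotaExceeded as exc:
            last_error = exc
            continue  # fall back to the next (older) model
        data = json.loads(raw)
        # Keep only the fields the pipeline needs; a missing key raises KeyError.
        return {key: data[key] for key in ("title", "description", "script")}
    raise RuntimeError("all models exhausted") from last_error
```

Injecting `call_model` keeps the fallback order testable without network access or API credentials.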

2. Media Processing Pipeline

  • Text-to-Speech generation using API-based voice synthesis.
  • Automated audio normalisation and processing.
  • Video rendering using programmatic video assembly.
  • Dynamic audio overlay and synchronisation between narration and visuals.
  • FFmpeg-based encoding and final export pipeline.

3. YouTube Integration

  • Secure OAuth-based integration with the YouTube Data API.
  • Automated upload with metadata injection (title, description, tags, visibility).
  • Upload status monitoring and structured success/failure logging.

The entire pipeline is orchestrated through Python, with robust exception handling, retry mechanisms, and logging to ensure resilience against API limits or transient failures.
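A retry mechanism of this kind is often implemented with exponential backoff. The sketch below shows that pattern under assumed defaults; the attempt count, delays, and function name are illustrative and may differ from the pipeline's actual settings.

```python
import time


def retry(task, attempts=3, base_delay_s=1.0, sleep=time.sleep):
    """Run task(), retrying on any exception with exponential backoff.

    The delay doubles after each failure; the last failure is re-raised.
    """
    for attempt in range(attempts):
        try:
            return task()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt)
```

For example, `retry(upload_video)` with a hypothetical `upload_video` function would wait 1 s and then 2 s between its three attempts.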

Custom Media Ingestion Service (Python FTP Server)

To support content workflows, I implemented a lightweight Python-based FTP server that enables secure transfer of media files from my mobile device directly to the server.

  • Dedicated service for controlled upload of images and videos.
  • Directory-based separation for automated processing pipelines.
  • Runs as a managed background service with restart policies.
  • Monitored via Prometheus exporter for service health and activity metrics.

Observability & Monitoring Stack

The server is instrumented with full-stack monitoring to ensure continuous visibility into system performance and application health.

1. Infrastructure Monitoring

  • Prometheus collects system-level metrics (CPU, RAM, disk I/O, network).
  • Grafana dashboards provide real-time visualisation and historical trend analysis.
Grafana dashboard 2

2. Custom Application Metrics

  • Developed a custom Python Prometheus exporter.
  • Exposes service state (running, failed, restarting).
  • Exports task-level metrics such as upload success rate, processing duration, and error counters.
  • Tracks FTP server activity and connection statistics.
Grafana dashboard
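To show what such an exporter hands to Prometheus, here is a standard-library sketch that renders service state and counters in the Prometheus text exposition format. The metric names are hypothetical, and the actual exporter may well be built on the official Python client library instead.

```python
def render_metrics(service_up, counters):
    """Render metrics in the Prometheus text exposition format.

    service_up maps service name -> bool; counters maps metric name -> value.
    """
    lines = [
        "# HELP service_up 1 if the service is running, 0 otherwise.",
        "# TYPE service_up gauge",
    ]
    for name, up in sorted(service_up.items()):
        lines.append(f'service_up{{service="{name}"}} {int(up)}')
    for metric, value in sorted(counters.items()):
        lines.append(f"# TYPE {metric} counter")
        lines.append(f"{metric} {value}")
    return "\n".join(lines) + "\n"
```

Serving this text at a `/metrics` endpoint over HTTP is enough for Prometheus to scrape it on its regular interval.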

3. Uptime Monitoring

  • Deployed Uptime Kuma to monitor:
    • Ubuntu server availability
    • peterfarah.com
    • farmavetservices.com
  • Alerting mechanisms for downtime detection.
Uptime Kuma dashboard

Zero Trust Remote Access Architecture

Instead of exposing services via port forwarding, I deployed a Cloudflare Zero Trust Tunnel to publish internal services securely while keeping the server protected behind NAT.

Cloudflare Zero Trust applications
  • Secure SSH access via ssh.peterfarah.com.
  • Grafana dashboard accessible at grafana.peterfarah.com.
  • Uptime dashboard accessible at up.peterfarah.com.
  • Limited-access requests via access.peterfarah.com.
  • No inbound ports opened on the router.
  • Access control policies enforced at the Cloudflare layer.

Engineering Principles Applied

  • Automation-first system design
  • Resilience through fallback mechanisms and service restart policies
  • Structured observability with custom instrumentation
  • Secure-by-design remote access using Zero Trust architecture
  • Production-style service orchestration in a self-hosted environment

This project demonstrates practical DevOps engineering, API integration, automation pipelines, secure infrastructure exposure, and full observability — all operating continuously in a real-world self-hosted production environment.

Synology DS925+ Architecture

Enterprise-Grade Home Storage: Synology DS925+


To support growing data requirements and ensure reliable backups for the home infrastructure, a dedicated Network Attached Storage (NAS) solution was implemented. The architecture centers around a Synology DS925+, populated with two 18 TB Seagate Exos enterprise-grade hard drives.

Storage Configuration & File System

The drives are configured using Synology Hybrid RAID (SHR-1). With two drives, SHR-1 provides one-drive fault tolerance equivalent to a RAID 1 mirror, while allowing more flexible expansion with mixed-capacity drives than a traditional RAID array. To guarantee data integrity and protect against bit rot, the volumes use the Btrfs file system, which enables proactive data scrubbing and advanced snapshot capabilities.

Snapshot Retention & Storage Efficiency

A core advantage of this Btrfs setup is the implementation of robust snapshot retention policies. The system takes automated daily snapshots that are retained for 90 days, alongside monthly snapshots kept for a full year. Because the NAS is primarily utilized for mobile photo backups, file edits and deletions are highly infrequent. As a result, these extensive point-in-time snapshots consume very little additional storage space, while still providing an immediate rollback mechanism against accidental deletion or ransomware attacks.
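The retention rule can be expressed as a small predicate. This is a sketch of the policy as described, not of Synology's snapshot scheduler: it assumes that the monthly snapshot is the one taken on the first day of each month, and the function name is illustrative.

```python
from datetime import date


def keep_snapshot(taken, today, daily_days=90, monthly_days=365):
    """Decide whether a snapshot is still within the retention policy.

    Daily snapshots are kept for 90 days. Snapshots taken on the first
    of a month (assumed to be the monthly ones) are kept for one year.
    """
    age_days = (today - taken).days
    if age_days <= daily_days:
        return True
    return taken.day == 1 and age_days <= monthly_days
```

With `today = date(2025, 6, 1)`, a snapshot from 15 January 2025 has aged out, while the one from 1 January 2025 is still retained as a monthly snapshot.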

Automated Workflows & Cold Storage

The NAS acts as a centralized, self-hosted hub for multiple critical backup streams. Utilizing Synology Photos, automated workflows sync mobile media seamlessly over the network, keeping it entirely within a Zero Trust environment. Additionally, the NAS continuously backs up critical files from the primary PC. For ultimate disaster recovery, any data living exclusively on the NAS is manually backed up to an offline cold storage drive every few months, ensuring an air-gapped fallback is always available.

Network Integration & Transfer Speeds

To eliminate local network bottlenecks during large file transfers and media syncing, a dedicated 5 GHz-capable OpenWrt router was integrated into the local architecture. This ensures high-speed, low-latency wireless data transfer across the local Wi-Fi, allowing the NAS to operate at peak efficiency without being constrained by standard ISP equipment limitations.

Containerized Applications & Observability

Beyond simple storage, the NAS serves as a robust application host running Docker containers. The observability stack includes Prometheus and Grafana (operating on host networking) alongside an SNMP exporter to monitor system health, storage metrics, and network performance. Additionally, the NAS hosts a custom-built Python application (Treasury Vault) for financial tracking and bookkeeping.

Docker Projects

Centralized Zero Trust Architecture

Instead of exposing services via port forwarding, a centralized Cloudflare Zero Trust Tunnel is deployed as a standalone Docker container. Rather than running a separate tunnel for each application, this single gateway handles routing to multiple services (including Grafana and Treasury Vault) by mapping Cloudflare Public Hostnames directly to the NAS's local IP and the corresponding container ports. This architecture greatly simplifies the deployment of new services while maintaining strict access control and isolating the internal network.

Cloudflare Zero Trust applications
  • Grafana dashboard accessible at nas-grafana.peterfarah.com.
  • No inbound ports opened on the router.
  • Access control policies enforced at the Cloudflare layer.