AIOps Engineer | Python, Systems & Networking | Building AI for logs, metrics & incident detection
I focus on building systems at the intersection of:
- Infrastructure & Distributed Systems
- Networking & Communication
- Observability (Logs, Metrics)
- AI/ML for Operational Intelligence
- Designing a production-style AIOps system
- Building pipelines for log & metric ingestion
- Developing anomaly detection models for infrastructure
- Applying ML to incident detection and root cause analysis
Core Systems
- Python
- Linux
- Networking & Data Communication
Backend & APIs
- FastAPI (building APIs for AIOps systems)
- PostgreSQL (storing logs, metrics & incident data)
Data & AIOps
- NumPy, Pandas
- scikit-learn (in progress)
- Time-series & anomaly detection
Engineering
- Git & GitHub
- Jupyter Lab
A lightweight observability simulator that generates system metrics (CPU, Memory, Latency), detects anomalies using statistical techniques, and correlates incidents.
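
The statistical detection step described above can be sketched roughly as follows. This is only an illustration, not the simulator's actual code: it assumes a rolling z-score over a single CPU metric, and the names `simulate_cpu` and `detect_anomalies`, the window size, and the threshold are all hypothetical choices made for the example.

```python
import random
import statistics


def simulate_cpu(n=200, spike_at=(120,), seed=0):
    """Generate a synthetic CPU-usage series (percent) with injected spikes.

    Hypothetical helper: baseline is Gaussian noise around 40%.
    """
    rng = random.Random(seed)
    values = [rng.gauss(40.0, 5.0) for _ in range(n)]
    for i in spike_at:
        values[i] = 95.0  # injected anomaly
    return values


def detect_anomalies(values, window=30, threshold=3.0):
    """Return indices whose value deviates from the trailing window's mean
    by more than `threshold` sample standard deviations (rolling z-score)."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        std = statistics.stdev(history)
        # Skip flat windows to avoid dividing by zero.
        if std > 0 and abs(values[i] - mean) / std > threshold:
            anomalies.append(i)
    return anomalies


if __name__ == "__main__":
    cpu = simulate_cpu()
    print("Anomalous sample indices:", detect_anomalies(cpu))
```

A trailing window keeps the baseline local, so slow drifts in the metric are less likely to be flagged than sudden spikes. The same function could be applied to memory or latency series under the same assumptions.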
A structured engineering-focused repository covering:
- Log parsing & event correlation
- Infrastructure metrics analysis
- Incident data processing
- Foundations for ML in AIOps
- AIOps Engineer/Builder
- Site Reliability Engineer (SRE)
- Platform / Systems Engineer
- Systems > Tools
- Observability is critical for reliability
- Debugging production systems is a core skill
- AI should enhance operational decision-making
Building an end-to-end AIOps system:
Logs + Metrics + ML → Actionable Operational Insights
- LinkedIn: (add your link)
Systems → Observability → AI → AIOps Engineering