AI & Automation

AI SOC Analyst

An intelligent, AI-powered Security Operations Center (SOC) analyst tool that leverages OpenAI's API to perform automated threat hunting across Microsoft Defender for Endpoint (MDE), Azure Active Directory, and Azure resource logs.

OpenAI API
Python
KQL
Azure Log Analytics
Microsoft Defender
View on GitHub

Overview

This tool acts as an agentic AI copilot for SOC analysts, intelligently selecting relevant log sources, querying an Azure Log Analytics workspace, and analyzing the results to identify potential security threats. It maps findings to the MITRE ATT&CK framework and provides actionable recommendations.

Features

  • Intelligent log query selection — OpenAI function calling picks tables and fields from natural language requests.
  • Multi-source threat hunting — MDE (DeviceProcessEvents, DeviceNetworkEvents, DeviceLogonEvents, DeviceFileEvents, DeviceRegistryEvents), Azure AD (SigninLogs, AuditLogs), Azure Activity, NSG flow logs.
  • MITRE ATT&CK mapping — Tactics, techniques, and sub-techniques in outputs.
  • Cost management — Model selection based on token usage, rate limits, and cost estimates.
  • Guardrails — Validates tables, fields, and models.
  • Structured output — JSON with confidence, IOCs, and recommendations.
  • Threat logging — Findings appended to _threats.jsonl.

Demo

Walkthrough of the AI SOC Analyst tool in action.

Watch on YouTube

Architecture

Modular layout:

  • _main.py — Entry point; orchestrates the threat-hunting workflow.
  • EXECUTOR.py — OpenAI calls, log query execution, and analysis.
  • PROMPT_MANAGEMENT.py — System prompts, tools, and hunting instructions.
  • MODEL_MANAGEMENT.py — Model selection, tokens, cost, rate limits.
  • GUARDRAILS.py — Allowed tables, fields, and models.
  • UTILITIES.py — Sanitization, formatting, and visualization helpers.

Prerequisites

  • Python 3.8+
  • Azure subscription with Log Analytics Workspace, Microsoft Defender for Endpoint (for MDE tables), and Azure AD logs where needed
  • OpenAI API key with access to GPT models
  • Azure CLI configured — az login

Configuration

Model selection

Edit MODEL_MANAGEMENT.py:

  • DEFAULT_MODEL — e.g. gpt-4.1-nano
  • CURRENT_TIER — OpenAI tier: free, 1–5
  • WARNING_RATIO — default 0.80 (80%)
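As a rough illustration of how these settings could interact, the sketch below picks a model for an estimated token count and warns past the ratio threshold. The model names, limits, and `pick_model` helper are hypothetical placeholders, not the project's actual values.

```python
# Hypothetical sketch of model selection (MODEL_MANAGEMENT.py-style config).
# All model names, windows, and limits below are illustrative placeholders.

MODEL_SPECS = {
    "gpt-4.1-nano": {"context_window": 128_000, "tpm_limit": 200_000},
    "gpt-4.1-mini": {"context_window": 128_000, "tpm_limit": 400_000},
}
DEFAULT_MODEL = "gpt-4.1-nano"
WARNING_RATIO = 0.80  # warn when estimated usage exceeds 80% of a limit


def pick_model(estimated_tokens: int) -> tuple[str, list[str]]:
    """Return a model that fits the estimate, plus any rate-limit warnings."""
    warnings = []
    model = DEFAULT_MODEL
    spec = MODEL_SPECS[model]
    if estimated_tokens > spec["context_window"]:
        # Fall back to any configured model with a large enough window.
        for name, s in MODEL_SPECS.items():
            if estimated_tokens <= s["context_window"]:
                model, spec = name, s
                break
    if estimated_tokens > WARNING_RATIO * spec["tpm_limit"]:
        warnings.append(f"{model}: near rate limit "
                        f"({estimated_tokens}/{spec['tpm_limit']} TPM)")
    return model, warnings
```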

Allowed tables and fields

Edit GUARDRAILS.py:

  • ALLOWED_TABLES — permitted tables and fields
  • ALLOWED_MODELS — approved models and specs
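A guardrail check of this kind might look like the sketch below: every table and field in a proposed query plan is compared against an allowlist before any KQL is built. The table/field contents and the `validate_query_plan` helper are illustrative assumptions, not the project's real definitions.

```python
# Illustrative guardrail validation (GUARDRAILS.py-style allowlist).
# The allowlist contents here are examples only.

ALLOWED_TABLES = {
    "SigninLogs": {"TimeGenerated", "UserPrincipalName", "ResultType", "IPAddress"},
    "DeviceProcessEvents": {"TimeGenerated", "DeviceName", "FileName",
                            "ProcessCommandLine"},
}


def validate_query_plan(table: str, fields: list[str]) -> list[str]:
    """Reject tables/fields outside the allowlist before any KQL is built."""
    errors = []
    allowed = ALLOWED_TABLES.get(table)
    if allowed is None:
        errors.append(f"table not allowed: {table}")
    else:
        for f in fields:
            if f not in allowed:
                errors.append(f"field not allowed on {table}: {f}")
    return errors
```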

Usage

Run:

python _main.py

Example natural-language requests:

  • "Something is messed up in our AAD/Entra ID for the last 2 weeks or so, particularly about user arisa"
  • "Show suspicious PowerShell activity on host WS-123 in the last day"
  • "Any failed sign-ins for alice@contoso.com over the past 6 hours?"
  • "Were NSG rules blocking outbound 4444 from VM web-01 this weekend?"

Workflow

  1. Query selection — AI picks tables, fields, and time ranges.
  2. Validation — Guardrails check tables, fields, and model.
  3. Execution — KQL runs against Log Analytics.
  4. Analysis — Model reviews logs for suspicious activity.
  5. Results — MITRE mapping, IOCs, and recommendations.
  6. Logging — Findings saved to _threats.jsonl.
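The six stages above can be sketched as a single pipeline with the AI and Azure calls stubbed out as injected callables. The `hunt` function and its parameter names are illustrative; the real orchestration in _main.py will differ.

```python
# End-to-end workflow sketch with the LLM and Azure calls passed in as
# callables, showing how the six stages might chain together.

import json


def hunt(request: str, select, validate, run_kql, analyze,
         log_path: str = "_threats.jsonl"):
    plan = select(request)                # 1. query selection (LLM)
    errors = validate(plan)               # 2. guardrail validation
    if errors:
        raise ValueError("; ".join(errors))
    rows = run_kql(plan)                  # 3. KQL execution (Log Analytics)
    findings = analyze(plan, rows)        # 4. LLM analysis of returned logs
    with open(log_path, "a") as fh:       # 6. append findings as JSON lines
        for finding in findings:
            fh.write(json.dumps(finding) + "\n")
    return findings                       # 5. results (MITRE, IOCs, recs)
```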

Supported log sources

Microsoft Defender for Endpoint

  • DeviceProcessEvents — process creation and command-line activity
  • DeviceNetworkEvents — connections and network events
  • DeviceLogonEvents — authentication and logons
  • DeviceFileEvents — file operations
  • DeviceRegistryEvents — registry changes
  • AlertInfo / AlertEvidence — alert metadata and artifacts

Azure Active Directory

  • SigninLogs — sign-ins, auth results, risk
  • AuditLogs — directory and identity changes

Azure resources

  • AzureActivity — control plane (resources, roles)
  • AzureNetworkAnalytics_CL — NSG flows via Traffic Analytics
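To make the table list concrete, here are two example KQL queries of the kind the tool could generate against these sources, wrapped in small Python builders. The query shapes and filter values are illustrative examples, not the tool's exact output.

```python
# Illustrative KQL builders for two supported tables; the generated queries
# are examples of what the tool might emit, not its exact output.

def failed_signins_kql(upn: str, hours: int) -> str:
    """Failed Azure AD sign-ins for one user over the given window."""
    return (
        "SigninLogs\n"
        f"| where TimeGenerated > ago({hours}h)\n"
        f"| where UserPrincipalName =~ '{upn}'\n"
        "| where ResultType != \"0\"\n"
        "| project TimeGenerated, UserPrincipalName, ResultType, IPAddress"
    )


def suspicious_powershell_kql(device: str, hours: int) -> str:
    """PowerShell launches with commonly abused flags on one host."""
    return (
        "DeviceProcessEvents\n"
        f"| where TimeGenerated > ago({hours}h)\n"
        f"| where DeviceName =~ '{device}'\n"
        "| where FileName in~ ('powershell.exe', 'pwsh.exe')\n"
        "| where ProcessCommandLine has_any ('-enc', 'DownloadString', 'IEX')"
    )
```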

Security considerations

  • Do not commit _keys.py; keep it in .gitignore
  • Field validation limits what can be queried, reducing data-exfiltration risk
  • Table and model allowlists
  • Prompting to minimize PII in outputs
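Beyond prompting, PII exposure can also be reduced mechanically before log lines ever reach the model. The sketch below shows the kind of sanitization helper UTILITIES.py might contain; the regexes and masking policy are illustrative assumptions.

```python
# Illustrative PII masking for log lines: partially redact IPv4 addresses
# and email local parts before text is sent to the model or written to
# findings. The patterns and policy here are assumptions, not the project's.

import re

_IPV4 = re.compile(r"\b(\d{1,3})\.\d{1,3}\.\d{1,3}\.(\d{1,3})\b")
_EMAIL = re.compile(r"\b([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+)\b")


def mask_pii(line: str) -> str:
    """Keep first/last IP octets and the email's first letter and domain."""
    line = _IPV4.sub(r"\1.x.x.\2", line)
    line = _EMAIL.sub(r"\1***@\2", line)
    return line
```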

Project structure

AI_SOC_Analyst/
├── _main.py                 # Main entry point
├── EXECUTOR.py              # API calls and query execution
├── PROMPT_MANAGEMENT.py     # Prompt and tool definitions
├── MODEL_MANAGEMENT.py      # Model selection and cost management
├── GUARDRAILS.py            # Validation rules
├── UTILITIES.py             # Helper functions
├── _keys.py                 # API keys (DO NOT COMMIT)
├── _threats.jsonl           # Logged threat findings
└── README.md

Output format

Structured JSON (excerpt):

{
  "findings": [
    {
      "title": "Brief title describing the suspicious activity",
      "description": "Detailed explanation of why this activity is suspicious",
      "mitre": {
        "tactic": "e.g., Execution",
        "technique": "e.g., T1059",
        "sub_technique": "e.g., T1059.001",
        "id": "e.g., T1059, T1059.001",
        "description": "Description of the MITRE technique/sub-technique"
      },
      "log_lines": ["Relevant log lines"],
      "confidence": "Low | Medium | High",
      "recommendations": ["pivot", "create incident", "monitor", "ignore"],
      "indicators_of_compromise": ["IOCs found in logs"],
      "tags": ["privilege escalation", "persistence", "data exfiltration"],
      "notes": "Optional analyst notes"
    }
  ]
}
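Because _threats.jsonl stores one JSON record per line, findings can be read back for triage with a few lines of Python. The sketch below filters for High-confidence findings using the field names from the excerpt above; the `high_confidence_findings` helper itself is hypothetical.

```python
# Sketch of reading _threats.jsonl back for triage, surfacing only
# High-confidence findings. Field names follow the JSON excerpt above;
# the helper itself is illustrative, not part of the project.

import json


def high_confidence_findings(path: str) -> list[dict]:
    findings = []
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # A line may hold a {"findings": [...]} object or a bare finding.
            for f in record.get("findings", [record]):
                if f.get("confidence") == "High":
                    findings.append(f)
    return findings
```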

Troubleshooting

Rate limit errors

  • Tool warns when approaching limits
  • Consider a model with higher limits or narrow time range / fields

No results

  • Verify workspace ID and permissions
  • Confirm the time range has data and entity names are correct

Authentication errors

  • Run az login
  • Validate OpenAI key and Log Analytics access

License & disclaimer

Provided as-is for security research and SOC operations. This was a personal project built during an internship for learning purposes; contributions are not actively sought.

For authorized security operations only. Ensure you have permission before querying log data. Authors are not responsible for misuse.