## Overview
This tool acts as an agentic AI copilot for SOC analysts: it intelligently selects relevant log sources, queries an Azure Log Analytics Workspace, and analyzes the results to identify potential security threats. It maps findings to the MITRE ATT&CK framework and provides actionable recommendations.
## Features
- Intelligent log query selection — OpenAI function calling picks tables and fields from natural language requests.
- Multi-source threat hunting — MDE (DeviceProcessEvents, DeviceNetworkEvents, DeviceLogonEvents, DeviceFileEvents, DeviceRegistryEvents), Azure AD (SigninLogs, AuditLogs), Azure Activity, NSG flow logs.
- MITRE ATT&CK mapping — Tactics, techniques, and sub-techniques in outputs.
- Cost management — Model selection from token usage, rate limits, and cost estimates.
- Guardrails — Validates tables, fields, and models.
- Structured output — JSON with confidence, IOCs, and recommendations.
- Threat logging — Findings appended to `_threats.jsonl`.
## Demo
Walkthrough of the AI SOC Analyst tool in action.
## Architecture

Modular layout:

- `_main.py` — Entry point; orchestrates the threat hunting workflow.
- `EXECUTOR.py` — OpenAI calls, log query execution, and analysis.
- `PROMPT_MANAGEMENT.py` — System prompts, tools, and hunting instructions.
- `MODEL_MANAGEMENT.py` — Model selection, tokens, cost, rate limits.
- `GUARDRAILS.py` — Allowed tables, fields, and models.
- `UTILITIES.py` — Sanitization, formatting, and visualization helpers.
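For the "intelligent log query selection" feature, `PROMPT_MANAGEMENT.py` would need to expose a function-calling tool definition to the OpenAI API. A minimal sketch of what such a definition could look like follows; the tool name `select_log_query`, its parameters, and the example table names are illustrative assumptions, not the project's actual definitions:

```python
# Illustrative OpenAI function-calling tool schema. The name and parameters
# are hypothetical stand-ins for whatever PROMPT_MANAGEMENT.py defines.
SELECT_LOG_QUERY_TOOL = {
    "type": "function",
    "function": {
        "name": "select_log_query",
        "description": "Pick a log table, fields, and time range for a hunt.",
        "parameters": {
            "type": "object",
            "properties": {
                "table": {
                    "type": "string",
                    "enum": ["DeviceProcessEvents", "SigninLogs", "AzureActivity"],
                },
                "fields": {"type": "array", "items": {"type": "string"}},
                "timespan_hours": {"type": "integer", "minimum": 1},
            },
            "required": ["table", "fields", "timespan_hours"],
        },
    },
}
```

Passing a schema like this in the `tools` parameter of a chat completion request lets the model return structured table/field/time-range choices instead of free text.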
## Prerequisites
- Python 3.8+
- Azure subscription with Log Analytics Workspace, Microsoft Defender for Endpoint (for MDE tables), and Azure AD logs where needed
- OpenAI API key with access to GPT models
- Azure CLI configured — `az login`
## Configuration

### Model selection

Edit `MODEL_MANAGEMENT.py`:

- `DEFAULT_MODEL` — e.g. `gpt-4.1-nano`
- `CURRENT_TIER` — OpenAI tier: free, 1–5
- `WARNING_RATIO` — default 0.80 (80%)
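Concretely, the top of `MODEL_MANAGEMENT.py` might look like the following sketch; the values are examples from this README, and the helper `approaching_limit` is illustrative rather than the module's actual API:

```python
# Illustrative values; tune for your own OpenAI account and tier.
DEFAULT_MODEL = "gpt-4.1-nano"   # model used when none is chosen explicitly
CURRENT_TIER = 1                 # OpenAI usage tier: "free" or 1-5
WARNING_RATIO = 0.80             # warn once 80% of a rate limit is consumed

def approaching_limit(tokens_used, tier_limit):
    """True once token usage crosses the warning threshold for the tier."""
    return tokens_used >= tier_limit * WARNING_RATIO
```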
### Allowed tables and fields

Edit `GUARDRAILS.py`:

- `ALLOWED_TABLES` — permitted tables and fields
- `ALLOWED_MODELS` — approved models and specs
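One plausible shape for these allowlists is plain dictionaries plus a small validation helper, as in the sketch below; the entries are abbreviated and the `validate_query` function is a hypothetical stand-in for the real checks:

```python
from typing import List

# Illustrative allowlists; the real GUARDRAILS.py covers more tables,
# fields, and model specs.
ALLOWED_TABLES = {
    "DeviceProcessEvents": {"TimeGenerated", "DeviceName", "FileName",
                            "ProcessCommandLine", "AccountName"},
    "SigninLogs": {"TimeGenerated", "UserPrincipalName", "ResultType",
                   "IPAddress", "AppDisplayName"},
}
ALLOWED_MODELS = {"gpt-4.1-nano", "gpt-4.1-mini"}

def validate_query(table: str, fields: List[str], model: str) -> List[str]:
    """Return guardrail violations; an empty list means the query passes."""
    errors = []
    if table not in ALLOWED_TABLES:
        errors.append(f"table not allowed: {table}")
    else:
        errors += [f"field not allowed: {table}.{f}"
                   for f in fields if f not in ALLOWED_TABLES[table]]
    if model not in ALLOWED_MODELS:
        errors.append(f"model not allowed: {model}")
    return errors
```

Rejecting any table, field, or model outside the allowlists is what keeps a model-generated query from wandering into data it should not touch.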
## Usage

Run `python _main.py`. Example natural-language requests:
- "Something is messed up in our AAD/Entra ID for the last 2 weeks or so, particularly about user arisa"
- "Show suspicious PowerShell activity on host WS-123 in the last day"
- "Any failed sign-ins for alice@contoso.com over the past 6 hours?"
- "Were NSG rules blocking outbound 4444 from VM web-01 this weekend?"
## Workflow

1. Query selection — AI picks tables, fields, and time ranges.
2. Validation — Guardrails check tables, fields, and model.
3. Execution — KQL runs against Log Analytics.
4. Analysis — Model reviews logs for suspicious activity.
5. Results — MITRE mapping, IOCs, and recommendations.
6. Logging — Findings saved to `_threats.jsonl`.
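The steps above can be sketched end to end as follows. Every helper here is a stub with a hypothetical name, standing in for the OpenAI and Log Analytics calls that live in `EXECUTOR.py` and the checks in `GUARDRAILS.py`:

```python
import json

# All helpers below are illustrative stubs, not the tool's real code.
def select_query(request):
    # Step 1: in practice, OpenAI function calling picks table/fields/timespan.
    return {"table": "SigninLogs", "fields": ["UserPrincipalName"], "hours": 6}

def validate(plan):
    # Step 2: in practice, allowlist checks from GUARDRAILS.py.
    return [] if plan["table"] == "SigninLogs" else ["table not allowed"]

def run_kql(plan):
    # Step 3: in practice, the KQL runs against Log Analytics.
    return ["2025-01-01T00:00:00Z alice@contoso.com ResultType=50126"]

def analyze(logs, request):
    # Steps 4-5: in practice, the model maps findings to MITRE, IOCs, recs.
    return [{"title": "Failed sign-ins", "confidence": "Medium",
             "log_lines": logs}]

def hunt(request, log_path="_threats.jsonl"):
    plan = select_query(request)
    errors = validate(plan)
    if errors:
        raise ValueError(f"guardrail violations: {errors}")
    findings = analyze(run_kql(plan), request)
    with open(log_path, "a") as fh:          # Step 6: append as JSON Lines
        for finding in findings:
            fh.write(json.dumps(finding) + "\n")
    return findings
```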
## Supported log sources

### Microsoft Defender for Endpoint
- DeviceProcessEvents — process creation and command-line activity
- DeviceNetworkEvents — connections and network events
- DeviceLogonEvents — authentication and logons
- DeviceFileEvents — file operations
- DeviceRegistryEvents — registry changes
- AlertInfo / AlertEvidence — alert metadata and artifacts
### Azure Active Directory
- SigninLogs — sign-ins, auth results, risk
- AuditLogs — directory and identity changes
### Azure resources
- AzureActivity — control plane (resources, roles)
- AzureNetworkAnalytics_CL — NSG flows via Traffic Analytics
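A hunt against any of these tables ultimately boils down to a KQL string sent to Log Analytics. The sketch below shows how such a query might be assembled by hand; the `build_kql` helper is illustrative (the real tool has the model generate the query), and the filter values echo the usage examples above:

```python
# Illustrative KQL assembly; the real tool generates queries via the model.
def build_kql(table, fields, hours, where=None):
    """Assemble a simple KQL string for a Log Analytics table."""
    query = f"{table} | where TimeGenerated > ago({hours}h)"
    if where:
        query += f" | where {where}"
    return query + " | project " + ", ".join(fields)

# Failed sign-ins for one user over the past 6 hours:
kql = build_kql(
    "SigninLogs",
    ["TimeGenerated", "UserPrincipalName", "ResultType", "IPAddress"],
    hours=6,
    where='UserPrincipalName == "alice@contoso.com" and ResultType != "0"',
)
```

If the tool uses the `azure-monitor-query` SDK, a string like this would be passed to `LogsQueryClient.query_workspace` along with the workspace ID and timespan.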
## Security considerations

- Do not commit `_keys.py`; keep it in `.gitignore`.
- Field validation limits what can be queried (reduces exfiltration risk).
- Table and model allowlists.
- Prompting to minimize PII in outputs.
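The PII point can also be enforced in code rather than only via prompting. Below is a minimal sketch of a sanitizer in the spirit of the `UTILITIES.py` helpers; the function and regex patterns are hypothetical, not the tool's own code:

```python
import re

# Illustrative sanitizer: mask email addresses and IPv4 addresses in log
# lines before they are sent to the model. Hypothetical helper.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def redact(line):
    """Replace emails and IPv4s with placeholder tokens."""
    line = EMAIL_RE.sub("<email>", line)
    line = IPV4_RE.sub("<ip>", line)
    return line
```

Redacting before the API call is a stronger guarantee than asking the model to omit PII, since nothing sensitive leaves the machine in the first place.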
## Project structure

```
AI_SOC_Analyst/
├── _main.py              # Main entry point
├── EXECUTOR.py           # API calls and query execution
├── PROMPT_MANAGEMENT.py  # Prompt and tool definitions
├── MODEL_MANAGEMENT.py   # Model selection and cost management
├── GUARDRAILS.py         # Validation rules
├── UTILITIES.py          # Helper functions
├── _keys.py              # API keys (DO NOT COMMIT)
├── _threats.jsonl        # Logged threat findings
└── README.md
```

## Output format
Structured JSON (excerpt):
```json
{
  "findings": [
    {
      "title": "Brief title describing the suspicious activity",
      "description": "Detailed explanation of why this activity is suspicious",
      "mitre": {
        "tactic": "e.g., Execution",
        "technique": "e.g., T1059",
        "sub_technique": "e.g., T1059.001",
        "id": "e.g., T1059, T1059.001",
        "description": "Description of the MITRE technique/sub-technique"
      },
      "log_lines": ["Relevant log lines"],
      "confidence": "Low | Medium | High",
      "recommendations": ["pivot", "create incident", "monitor", "ignore"],
      "indicators_of_compromise": ["IOCs found in logs"],
      "tags": ["privilege escalation", "persistence", "data exfiltration"],
      "notes": "Optional analyst notes"
    }
  ]
}
```

## Troubleshooting
### Rate limit errors
- Tool warns when approaching limits
- Consider a model with higher limits or narrow time range / fields
### No results
- Verify workspace ID and permissions
- Confirm the time range has data and entity names are correct
### Authentication errors

- Run `az login`
- Validate OpenAI key and Log Analytics access
## License & disclaimer
Provided as-is for security research and SOC operations. This was a personal project built during an internship for learning purposes; contributions are not actively sought.
For authorized security operations only. Ensure you have permission before querying log data. Authors are not responsible for misuse.