# Architecture Analyze Agent

### What it does

The Architecture Analyze Agent automatically discovers, maps, and documents your entire software architecture—generating visual diagrams, API documentation, cost optimization recommendations, and operational runbooks.

Think of it as your automated principal architect that reverse-engineers your system and creates production-ready documentation.

**You'll get:**

* System architecture diagrams (15+ diagrams in 4 formats: Mermaid, PlantUML, D2, Draw\.io)
* Complete API documentation with endpoints and data flows
* Cost optimization analysis with savings projections
* Disaster recovery blueprints with RPO/RTO targets
* Production-ready code examples and operational runbooks

⏱️ **Analysis time:** 5-15 minutes depending on codebase size

## Sample Prompts

{% hint style="success" %}
**Examples**

#### **pre‑deploy‑review**

**Prompt:** “Analyze our service mesh and identify any single‑points‑of‑failure before deploying the new microservice.”

#### **auth‑route‑audit**

**Prompt:** “Verify all authentication routes and dataflows to ensure no unsecured endpoints exist.”

#### **high‑latency‑diagnosis**

**Prompt:** “Inspect the architecture for components contributing to the 200 ms latency spikes in the checkout flow.”

#### **scaling‑risk‑assessment**

**Prompt:** “Evaluate our current setup for risks when scaling from 100 to 10,000 concurrent users.”

#### **third‑party‑dependency‑map**

**Prompt:** “Generate a diagram showing external APIs and their expected request volumes for security review.”
{% endhint %}

### Why use it

**Instead of:**

* Spending 30-40 hours manually documenting architecture
* Reverse-engineering systems from outdated diagrams
* Guessing at cost optimization opportunities
* Creating DR plans from scratch

**You get:**

* Automated discovery of your entire tech stack
* Multi-format diagrams ready to edit
* $335K+ average annual cloud savings identified
* Production-ready monitoring and error handling code
* Week-by-week implementation roadmap

**Impact:**

* 30-40 hours saved per project on documentation
* 40-60% cloud cost reduction (average)
* 15-minute RPO / 4-hour RTO for disaster recovery

***

### What it analyzes

The agent performs deep analysis across multiple layers:

#### 1. Technology Stack Discovery

**Scans:** Languages, frameworks, cloud services\
**Finds:** Python versions, React/Node.js, Databricks, AWS/Azure/GCP\
**Example:** Detects "Python 3.11, FastAPI, Databricks Unity Catalog, Delta Lake"

#### 2. Component Mapping

**Analyzes:** Directory structures, config files\
**Finds:** Microservices, APIs, databases, storage layers\
**Example:** Maps package.json, Dockerfile, terraform files to system boundaries
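As a rough illustration of the idea (not the agent's actual implementation), mapping marker files to component types can be sketched like this — the `MARKERS` table and component labels here are hypothetical:

```python
from pathlib import Path

# Hypothetical marker-file heuristics, for illustration only
MARKERS = {
    "package.json": "Node.js service",
    "Dockerfile": "containerized component",
    "main.tf": "Terraform-managed infrastructure",
    "requirements.txt": "Python service",
}

def map_components(root: str) -> dict[str, list[str]]:
    """Walk a directory tree and bucket directories by the marker files they contain."""
    found: dict[str, list[str]] = {}
    for path in Path(root).rglob("*"):
        if path.is_file() and path.name in MARKERS:
            found.setdefault(MARKERS[path.name], []).append(str(path.parent))
    return found
```

A directory containing both a `Dockerfile` and a `package.json` would show up under two component types, which is exactly the kind of overlap the agent resolves into system boundaries.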

#### 3. Data Flow Analysis

**Reads:** Source code, API calls, data pipelines\
**Finds:** Medallion architecture (Bronze/Silver/Gold), API endpoints, auth flows\
**Example:** Traces data from ingestion → transformation → consumption
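A toy sketch of the Bronze → Silver → Gold flow the agent traces — illustrative only, using plain Python dicts in place of Delta tables:

```python
# Bronze: raw events as ingested, possibly malformed
bronze = [
    {"user": "a", "amount": "10.5"},
    {"user": "b", "amount": "not-a-number"},  # bad record
    {"user": "a", "amount": "4.5"},
]

# Silver: cleaned and typed records (invalid rows dropped)
def to_silver(rows):
    out = []
    for row in rows:
        try:
            out.append({"user": row["user"], "amount": float(row["amount"])})
        except (KeyError, ValueError):
            continue  # a real pipeline would quarantine these for review
    return out

# Gold: business-level aggregate ready for consumption
def to_gold(rows):
    totals = {}
    for row in rows:
        totals[row["user"]] = totals.get(row["user"], 0.0) + row["amount"]
    return totals

silver = to_silver(bronze)   # 2 valid records
gold = to_gold(silver)       # {"a": 15.0}
```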

#### 4. Security Architecture

**Maps:** Defense-in-depth layers, access controls\
**Finds:** Network isolation, authentication, encryption, RBAC\
**Example:** Documents 8-layer security framework with Private Link setup

#### 5. Cost Optimization

**Analyzes:** Compute resources, storage, network\
**Finds:** Auto-termination gaps, spot instance opportunities, rightsizing needs\
**Example:** Identifies $335K+ annual savings in unused compute

#### 6. Disaster Recovery

**Plans:** Multi-region failover, backup strategies\
**Finds:** Recovery point objectives (RPO), recovery time objectives (RTO)\
**Example:** Creates active-passive setup with 15-min RPO / 4-hour RTO
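The RPO target above reduces to simple arithmetic: worst-case data loss is bounded by the backup (or replication) interval. A minimal sanity check, assuming interval-based backups:

```python
def meets_rpo(backup_interval_min: float, rpo_min: float) -> bool:
    """Worst-case data loss equals the time since the last backup,
    so the backup interval must not exceed the RPO target."""
    return backup_interval_min <= rpo_min

# 15-minute cross-region backups satisfy a 15-minute RPO...
assert meets_rpo(backup_interval_min=15, rpo_min=15)
# ...but hourly backups would not.
assert not meets_rpo(backup_interval_min=60, rpo_min=15)
```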

***

### How to use it

#### Basic analysis

Analyze your current directory:

```bash
opsera-devops-agent:architecture-analyze
```

or in natural language:

```
"Analyze the architecture of this project"
```

***

#### Specific analyses

**Full system with all diagram formats:**

```
"Analyze the entire system and generate diagrams in all formats"
```

**Focus on cost optimization:**

```
"Analyze this architecture and show me cost optimization opportunities"
```

**CI/CD architecture:**

```
"Generate a CI/CD architecture with pipeline recommendations"
```

**Security architecture only:**

```
"Map the security architecture with defense-in-depth layers"
```

**Specific directory:**

```
"Analyze only the ./src directory with API documentation"
```

***

### What you'll see

#### During analysis

```bash
🏗️ Architecture Analyze Agent Starting...

Phase 1/6: Environment discovery...
✓ Detected: Python 3.11, FastAPI, React 18
✓ Cloud services: Databricks, AWS S3, RDS
✓ Infrastructure: Terraform, Docker, Kubernetes

Phase 2/6: Component mapping...
✓ Found 8 microservices
✓ Identified 24 API endpoints
✓ Mapped 3 data layers (Bronze/Silver/Gold)

Phase 3/6: Logic extraction...
✓ Traced data flows across pipelines
✓ Documented authentication sequences
✓ Mapped API dependencies

Phase 4/6: Multi-layer diagramming...
✓ Generated system overview
✓ Created data flow diagrams
✓ Rendered security architecture
✓ Produced 15 diagrams in 4 formats

Phase 5/6: Strategic analysis...
✓ Identified $420K in potential savings
✓ Created week-by-week implementation plan
✓ Analyzed DR requirements (RPO/RTO)

Phase 6/6: Artifact delivery...
✓ Generated 5 comprehensive reports
✓ Created operational runbooks
✓ Produced production-ready code examples

📁 Reports saved to: /Users/opsera/architecture-docs/
```

***

### Reports generated

You'll get 5 comprehensive documentation files:

#### 1. Architecture Documentation

**File:** `architecture-documentation.md`

**Contains:**

* **Technology Stack Analysis:** Complete breakdown of backend, data layers, infrastructure
* **API Specification:** All discovered endpoints with request/response formats
* **Visual Blueprints:** 15+ diagrams including:
  * System overview
  * Data flow (Medallion architecture)
  * Component diagrams
  * Sequence diagrams
  * ER diagrams
* **Multi-Format Export:** Mermaid, PlantUML, D2, Draw\.io XML

**Example output:**

````markdown
## System Overview Diagram
```mermaid
graph TB
    API[FastAPI Gateway]
    DB[(PostgreSQL)]
    Cache[(Redis)]
    Queue[RabbitMQ]
    
    API --> DB
    API --> Cache
    API --> Queue
```

## Technology Stack
- **Backend:** Python 3.11 (FastAPI, Celery)
- **Frontend:** React 18 (TypeScript, Tailwind)
- **Data:** Databricks (Delta Lake, Unity Catalog)
- **Infrastructure:** AWS (EKS, RDS, S3)

## API Endpoints
- POST /api/v1/users - Create user
- GET /api/v1/users/{id} - Get user details
- POST /api/v1/pipelines/run - Execute data pipeline
````

***

#### 2. CI/CD Architecture

**File:** `cicd-pipeline-architecture.md`

**Contains:**

* **Pipeline Orchestration:** Complete GitHub Actions workflows (600+ lines)
* **Environment Strategy:** Dev → Staging → Prod promotion path
* **Quality Gates:** Automated testing, security scanning, manual approvals
* **Deployment Patterns:** Blue-green, canary, rolling updates

**Example output:**

```yaml
# GitHub Actions Pipeline
name: Deploy to Production

on:
  push:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run pytest
        run: pytest
      - name: Security scan (TruffleHog, Bandit)
        run: |
          trufflehog filesystem .
          bandit -r .
      - name: Code quality (SonarQube)
        run: sonar-scanner

  deploy:
    needs: test
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to EKS
        run: kubectl apply -f k8s/
      - name: Run smoke tests
        run: pytest tests/smoke
      - name: Health check
        run: curl -fsS https://api.example.com/health
```
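Of the deployment patterns the report covers, a canary rollout is the simplest to sketch: traffic shifts to the new version in fixed steps. This helper is a hypothetical illustration, not part of the generated workflow:

```python
def canary_weights(step: int, total_steps: int) -> tuple[int, int]:
    """Traffic split (canary %, stable %) at a given rollout step."""
    canary = round(100 * step / total_steps)
    return canary, 100 - canary

# A four-step rollout: 25% -> 50% -> 75% -> 100% to the canary
for step in range(1, 5):
    print(canary_weights(step, 4))
```

In practice each step would be gated on the health checks and smoke tests shown in the workflow above before shifting more traffic.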

***

#### 3. Cost Optimization Analysis

**File:** `cost-optimization-analysis.md`

**Contains:**

* **Current Spend Analysis:** Breakdown by service, region, resource type
* **Optimization Opportunities:** Auto-termination, spot instances, rightsizing
* **ROI Projections:** 3-year savings forecast
* **Implementation Roadmap:** Week-by-week plan

**Example output:**

```
Cost Optimization Opportunities
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Current Annual Spend: $840,000
Projected Annual Savings: $504,000 (60%)
Payback Period: 3 months

Quick Wins (Week 1-2):
□ Enable auto-termination on Databricks clusters    [$120K/year]
□ Switch to spot instances for non-prod workloads   [$80K/year]
□ Rightsize RDS instances (t3.large → t3.medium)    [$24K/year]

Medium-term (Week 3-6):
□ Implement data lifecycle policies (S3 → Glacier)  [$60K/year]
□ Optimize SQL warehouse size and concurrency       [$100K/year]
□ Enable compute autoscaling                        [$120K/year]

3-Year ROI Projection:
Year 1: $504K savings - $15K implementation = $489K net
Year 2: $504K savings
Year 3: $504K savings
Total 3-Year Savings: $1,497,000
```
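The projection above is straightforward arithmetic; reproduced here as a quick check, with the figures taken directly from the sample report:

```python
# Annual savings from the sample report, by initiative
quick_wins = {"auto-termination": 120_000, "spot instances": 80_000, "rds rightsizing": 24_000}
medium_term = {"s3 lifecycle": 60_000, "sql warehouse": 100_000, "autoscaling": 120_000}

annual_savings = sum(quick_wins.values()) + sum(medium_term.values())
implementation_cost = 15_000  # one-time, incurred in year 1

year1_net = annual_savings - implementation_cost
three_year_total = year1_net + 2 * annual_savings

print(annual_savings)      # 504000
print(three_year_total)    # 1497000
```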

***

#### 4. Disaster Recovery Blueprint

**File:** `disaster-recovery-architecture.md`

**Contains:**

* **Resiliency Targets:** RPO (15 minutes), RTO (4 hours)
* **Failover Procedures:** Active-passive multi-region setup
* **Automated Scripts:** Failover automation code
* **Testing Procedures:** DR drill runbooks

**Example output:**

````
Disaster Recovery Plan
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Targets:
  RPO (Recovery Point Objective):  15 minutes
  RTO (Recovery Time Objective):   4 hours

Architecture:
  Primary Region:   us-east-1 (Active)
  DR Region:        us-west-2 (Passive)
  Replication:      Cross-region automated backups

Failover Steps:
1. Detect primary region failure (CloudWatch alarms)
2. Promote DR database replica to primary
3. Update Route 53 DNS to DR region
4. Validate application health checks
5. Notify stakeholders

Automated Failover Script:
```bash
#!/bin/bash
# Failover to DR region
aws rds promote-read-replica --db-instance dr-database
aws route53 change-resource-record-sets --change-batch file://failover.json
# Verify health
curl https://api.example.com/health
```

DR Testing Schedule:
- Monthly: Backup restore tests
- Quarterly: Full failover drills
- Annual: Complete DR simulation
````

***

#### 5. Production-Ready Code & Operations

**Files:** `production_ready_code_examples.py`, `operational-guide.md`

**Contains:**

* **Hardened Code:** Circuit breakers, exponential backoff, structured logging
* **Operational Runbooks:** Common tasks (scaling, troubleshooting, monitoring)
* **Security Hardening:** Network isolation, RBAC, encryption

**Example output:**

```python
# production_ready_code_examples.py

import time
import logging
from functools import wraps

# Circuit breaker pattern
class CircuitBreaker:
    def __init__(self, failure_threshold=5, timeout=60):
        self.failure_count = 0
        self.failure_threshold = failure_threshold
        self.timeout = timeout
        self.last_failure_time = None
        
    def call(self, func):
        if self.failure_count >= self.failure_threshold:
            if time.time() - self.last_failure_time < self.timeout:
                raise Exception("Circuit breaker is OPEN")
            else:
                self.failure_count = 0  # Reset after timeout
        
        try:
            result = func()
            self.failure_count = 0  # Reset on success
            return result
        except Exception as e:
            self.failure_count += 1
            self.last_failure_time = time.time()
            raise

# Exponential backoff with retry
def retry_with_backoff(max_retries=3, base_delay=1):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries):
                try:
                    return func(*args, **kwargs)
                except Exception as e:
                    if attempt == max_retries - 1:
                        raise
                    delay = base_delay * (2 ** attempt)
                    logging.warning(f"Retry {attempt + 1}/{max_retries} after {delay}s")
                    time.sleep(delay)
        return wrapper
    return decorator

# Structured JSON logging
import json
from datetime import datetime, timezone

class StructuredLogger:
    def log(self, level, message, **kwargs):
        log_entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": level,
            "message": message,
            **kwargs
        }
        print(json.dumps(log_entry))

logger = StructuredLogger()
logger.log("INFO", "Pipeline started", pipeline_id="abc123", user="admin")
```

***

### After analysis

#### 1. Review the architecture documentation

Open `architecture-documentation.md` to see:

* Complete system overview
* All discovered APIs and endpoints
* Visual diagrams in multiple formats
* Technology stack breakdown

***

#### 2. Implement cost optimizations

Follow the quick wins in `cost-optimization-analysis.md`:

* Enable auto-termination on compute clusters
* Switch non-prod workloads to spot instances
* Rightsize over-provisioned resources
* Implement data lifecycle policies

**Typical savings:** $300K-$500K annually

***

#### 3. Deploy CI/CD improvements

Use the workflows in `cicd-pipeline-architecture.md`:

* Copy GitHub Actions workflows to `.github/workflows/`
* Set up quality gates (testing, security scanning)
* Configure environment promotion (Dev → Staging → Prod)
* Enable automated deployments

***

#### 4. Harden production systems

Use `production_ready_code_examples.py`:

* Add circuit breakers to external API calls
* Implement exponential backoff with retry logic
* Switch to structured JSON logging
* Add comprehensive error handling

***

#### 5. Prepare disaster recovery

Follow the DR blueprint in `disaster-recovery-architecture.md`:

* Set up multi-region replication
* Configure automated failover scripts
* Schedule quarterly DR drills
* Document runbooks for emergencies

***

### Quality benchmarks

Use these standards to measure architectural quality:

| Metric                     | Target                   | Purpose                      |
| -------------------------- | ------------------------ | ---------------------------- |
| **Documentation Coverage** | 100%                     | All components documented    |
| **Diagram Accuracy**       | Current                  | Reflects actual system state |
| **Cost Efficiency**        | Auto-termination enabled | No wasted compute            |
| **DR Preparedness**        | RPO 15min / RTO 4hr      | Business continuity          |
| **Security Layers**        | 8-layer defense-in-depth | Comprehensive protection     |

**Architecture maturity levels:**

* ✅ **Production Ready:** All targets met, DR tested, costs optimized
* ⚠️ **Needs Hardening:** Documentation complete, DR planned, some cost waste
* ❌ **Early Stage:** Incomplete docs, no DR plan, high cost waste

***

### Common issues

**Analysis taking too long?**

* Start with specific directory: "Analyze only the ./api directory"
* Skip certain analyses: "Analyze architecture but skip cost optimization"
* Larger codebases may take 15-20 minutes

**Diagrams not rendering?**

* Copy Mermaid/PlantUML code to specialized viewers
* Use Draw\.io XML files for visual editing
* Check diagram syntax in generated markdown

**Missing components in diagrams?**

* Ensure all config files are present (package.json, Dockerfile, etc.)
* Check that services are running or have recent activity
* Verify cloud provider credentials for service discovery

**Cost analysis shows no savings?**

* Your infrastructure may already be optimized
* Run analysis on production environment for accurate data
* Check for auto-termination and spot instance usage

**Reports not generated?**

* Check write permissions in output directory
* Verify sufficient disk space
* Look for errors in analysis output

***

### Examples

**Quick architecture overview:**

```
"Analyze this project and show me the system architecture"
```

**Full analysis with all diagrams:**

```
"Generate a complete architecture analysis with all diagram formats and cost optimization"
```

**API documentation:**

```
"Analyze this codebase and document all API endpoints"
```

**Cost optimization focus:**

```
"Analyze my cloud architecture and identify cost savings opportunities"
```

**Security architecture:**

```
"Map the security architecture with defense-in-depth layers"
```

**DR planning:**

```
"Generate a disaster recovery plan with RPO and RTO targets"
```
