# SQL Security Scan

### What it does

The SQL Security Agent scans your SQL code and Databricks environments for security vulnerabilities—detecting SQL injection, hardcoded credentials, PII exposure, and missing encryption—then automatically generates secure code fixes.

Think of it as your automated database security expert that finds vulnerabilities and writes the fixes for you.

**You'll get:**

* SQL injection vulnerabilities detected and fixed
* Hardcoded credentials replaced with secrets management
* PII data (SSN, emails, phone numbers) automatically masked
* Insecure joins and queries refactored
* Auto-generated masking functions for production use

⏱️ **Scan time:** 2-5 minutes depending on SQL file size

## Sample Prompts

{% hint style="success" %}
Examples

#### **pre‑deployment‑ddl‑check**

**Prompt:** “Validate all new DDL statements in `migrations/` for dangerous operations before pushing to prod.”

#### **monthly‑audit‑queries**

**Prompt:** “Scan stored procedures and views for SQL injection patterns as part of the monthly security review.”

#### **api‑endpoint‑review**

**Prompt:** “Inspect the SQL used by the payments API for unparameterized queries.”

#### **schema‑change‑risk**

**Prompt:** “Check proposed schema changes in the feature branch for permission‑escalation risks.”

#### **vendor‑sql‑analysis**

**Prompt:** “Analyze third‑party SQL scripts for hard‑coded credentials and unsafe constructs.”<br>
{% endhint %}

### Why use it

**Instead of:**

* Manually reviewing SQL code for security issues
* Writing PII masking functions from scratch
* Guessing at privilege escalation risks
* Spending hours on security audits

**You get:**

* Automated vulnerability detection with fixes
* AI-generated secure code (Claude Sonnet 4)
* 98-99% accurate PII detection and masking
* One-command scan-and-fix workflow
* Compliance-ready security reports

**Impact:**

* Zero SQL injection vulnerabilities
* Automated PII protection (GDPR/HIPAA compliant)
* Secure credentials management via Databricks secrets
* Manual security reviews → automated 7-step process

***

### What it scans

The agent detects and fixes these vulnerability types:

| Vulnerability              | Severity    | Standard | Auto-Fix  |
| -------------------------- | ----------- | -------- | --------- |
| **SQL Injection**          | 🔴 Critical | CWE-89   | ✅ Yes     |
| **Hardcoded Credentials**  | 🔴 Critical | CWE-798  | ✅ Yes     |
| **PII Exposure**           | 🟠 High     | CWE-359  | ✅ Yes     |
| **Missing Encryption**     | 🟠 High     | CWE-311  | ✅ Yes     |
| **Insecure Joins**         | 🟡 Medium   | CWE-1286 | ✅ Yes     |
| **Over-Privileged Access** | 🟠 High     | CWE-269  | ⚠️ Manual |

***

### How to use it

#### Basic scan and fix

Scan and automatically fix vulnerabilities in a SQL file:

```bash
sql-security scan-and-fix --file queries.sql
```

or in natural language:

```
"Scan my SQL file for security vulnerabilities and fix them"
```

***

#### Specific scans

**Detect PII in a table:**

```bash
sql-security detect-pii --table customer_data
```

```
"Scan the customer_data table for PII and show me what needs masking"
```

**Compliance check:**

```bash
sql-security compliance --standard SOC2
```

```
"Run a SOC2 compliance check on my Databricks workspace"
```

**Privilege analysis:**

```bash
sql-security analyze-privileges --user john.doe
```

```
"Check if john.doe has excessive database privileges"
```

***

### What you'll see

#### The 7-step scan-and-fix process

```bash
🔍 SQL Security Agent Starting...

Step 1/7: Vulnerability scan...
✗ Found SQL injection in line 42
✗ Found hardcoded password in line 15
✗ Found PII exposure (SSN) in line 67

Step 2/7: AI fix generation...
✓ Generated secure parameterized query
✓ Generated Databricks secret reference
✓ Generated masking function for SSN

Step 3/7: Safety verification...
✓ Verified query logic preserved
✓ Verified no syntax errors
✓ Verified performance unchanged

Step 4/7: Diff presentation...

Vulnerable Code (Line 42):
  query = "SELECT * FROM users WHERE id = '" + user_id + "'"

Secure Code (Auto-fixed):
  query = "SELECT * FROM users WHERE id = :user_id"
  params = {"user_id": user_id}

Rationale: Prevents SQL injection by using parameterized queries

Step 5/7: User approval...
? Apply this fix? (y/n)

Step 6/7: Applying fixes...
✓ Fixed SQL injection (3 instances)
✓ Replaced hardcoded credentials (1 instance)
✓ Added PII masking (2 instances)
✓ Generated: queries_fixed.sql

Step 7/7: Re-scan verification...
✓ No vulnerabilities found
✓ All security checks passed

📁 Fixed file saved: queries_fixed.sql
📊 Report saved: sql-security-report.html
```

***

### What you get

#### 1. SQL Security Report

**File:** `sql-security-report.html`

```
SQL Security Scan Results
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Overall Risk: HIGH (before fixes)
Files Scanned: 5
Issues Found: 12

Vulnerability Breakdown:
  🔴 Critical: 4
    - SQL Injection: 3
    - Hardcoded Credentials: 1
  
  🟠 High: 5
    - PII Exposure: 3
    - Missing Encryption: 2
  
  🟡 Medium: 3
    - Insecure Joins: 3

Compliance Status:
  SOC2:  ⚠️ 67% (missing encryption at rest)
  GDPR:  ❌ 45% (PII not masked)
  HIPAA: ❌ 40% (no access logs)

Impact Analysis:
  - 3 SQL injection vectors closed
  - 850K sensitive records now masked
  - Credentials moved to secrets vault
```

***

#### 2. Fixed SQL Files

**File:** `queries_fixed.sql`

**Before (Vulnerable):**

```sql
-- SQL Injection vulnerability
SELECT * FROM customers 
WHERE email = '" + user_input + "'

-- Hardcoded credentials
spark.sql("""
  CREATE TABLE users 
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url 'jdbc:mysql://prod.db:3306',
    user 'admin',
    password 'P@ssw0rd123'  -- CRITICAL: Hardcoded!
  )
""")

-- PII exposure
SELECT 
  customer_name,
  ssn,              -- Exposed!
  email,            -- Exposed!
  credit_card       -- Exposed!
FROM customers
```

**After (Secure):**

```sql
-- Fixed: Parameterized query
SELECT * FROM customers 
WHERE email = :email_param

-- Fixed: Using Databricks secrets
spark.sql(f"""
  CREATE TABLE users 
  USING org.apache.spark.sql.jdbc
  OPTIONS (
    url '{dbutils.secrets.get(scope='prod', key='db-url')}',
    user '{dbutils.secrets.get(scope='prod', key='db-user')}',
    password '{dbutils.secrets.get(scope='prod', key='db-password')}'
  )
""")

-- Fixed: PII masked with functions
SELECT 
  customer_name,
  mask_ssn(ssn) as ssn,           -- Masked: ***-**-1234
  mask_email(email) as email,     -- Masked: j***@example.com
  mask_credit_card(credit_card)   -- Masked: ****-****-****-1234
FROM customers
```

***

#### 3. Auto-Generated Masking Functions

**File:** `pii_masking_functions.sql`

```sql
-- Email masking function
CREATE OR REPLACE FUNCTION mask_email(email STRING)
RETURNS STRING
RETURN CASE 
  WHEN current_user() LIKE '%@company.com' 
    AND current_database() != 'production'
  THEN email  -- Show full email in non-prod for internal users
  ELSE CONCAT(
    SUBSTRING(email, 1, 1), 
    '***@',
    SPLIT(email, '@')[1]
  )  -- Mask: j***@example.com
END;

-- SSN masking function
CREATE OR REPLACE FUNCTION mask_ssn(ssn STRING)
RETURNS STRING
RETURN CASE
  WHEN current_user() IN (SELECT user FROM authorized_users)
  THEN ssn  -- Full SSN for authorized users only
  ELSE CONCAT('***-**-', SUBSTRING(ssn, -4))  -- Mask: ***-**-1234
END;

-- Credit card masking function
CREATE OR REPLACE FUNCTION mask_credit_card(card STRING)
RETURNS STRING
RETURN CONCAT('****-****-****-', SUBSTRING(card, -4));

-- Phone number masking function
CREATE OR REPLACE FUNCTION mask_phone(phone STRING)
RETURNS STRING
RETURN CONCAT('(***) ***-', SUBSTRING(phone, -4));

-- Apply masking to existing table
ALTER TABLE customers 
ADD COLUMN email_masked STRING GENERATED ALWAYS AS (mask_email(email));

ALTER TABLE customers
ADD COLUMN ssn_masked STRING GENERATED ALWAYS AS (mask_ssn(ssn));
```

**Usage:**

```sql
-- Use masked columns in queries
SELECT 
  customer_name,
  email_masked as email,    -- Automatically masked
  ssn_masked as ssn         -- Automatically masked
FROM customers;
```

***

#### 4. Before/After Comparison

**Vulnerable Code:**

```sql
-- Insecure joins (missing ON clause)
SELECT * FROM orders, customers 
WHERE customers.id = orders.customer_id
```

**Secure Code:**

```sql
-- Secure explicit JOIN
SELECT 
  o.*,
  c.name,
  c.email_masked as email
FROM orders o
INNER JOIN customers c ON o.customer_id = c.id
WHERE c.is_active = true;
```

***

### After the scan

#### 1. Review and apply fixes

The agent shows you each fix with rationale:

* Review the before/after code
* Understand why the change improves security
* Approve or reject each fix
* Apply all approved fixes at once

***

#### 2. Deploy masking functions

Use the auto-generated functions in production:

```sql
-- Deploy to production
spark.sql("""
  CREATE DATABASE IF NOT EXISTS security_functions;
""")

-- Upload pii_masking_functions.sql
%run /security_functions/pii_masking_functions.sql

-- Verify functions work
SELECT mask_ssn('123-45-6789');  -- Returns: ***-**-6789
```

***

#### 3. Migrate credentials to secrets

Replace hardcoded credentials with Databricks secrets:

```bash
# Create secret scope
databricks secrets create-scope --scope prod

# Add secrets
databricks secrets put --scope prod --key db-password
databricks secrets put --scope prod --key api-key

# Update code to use secrets
# (Agent provides the updated code)
```

***

#### 4. Update privilege grants

For over-privileged accounts flagged:

```sql
-- Review current privileges
sql-security analyze-privileges --user john.doe

-- Output shows:
-- john.doe has ADMIN on production.* (EXCESSIVE)
-- Recommended: SELECT on production.customers only

-- Apply least privilege
REVOKE ALL PRIVILEGES ON production.* FROM john.doe;
GRANT SELECT ON production.customers TO john.doe;
```

***

#### 5. Re-scan to verify

After applying fixes, verify everything is secure:

```bash
sql-security scan-and-fix --file queries_fixed.sql
```

Expected result: ✅ No vulnerabilities found

***

### Quality benchmarks

Use these standards to measure SQL security:

| Metric                    | Target          | Purpose                   |
| ------------------------- | --------------- | ------------------------- |
| **SQL Injection**         | 0 instances     | Prevent data breaches     |
| **Hardcoded Credentials** | 0 instances     | Secure secrets management |
| **PII Masking Coverage**  | 100%            | Comply with GDPR/HIPAA    |
| **Encryption at Rest**    | Enabled         | Protect sensitive data    |
| **Privilege Escalation**  | No admin grants | Least privilege principle |

**Security levels:**

* ✅ **Secure:** 0 critical, 100% PII masked, secrets managed
* ⚠️ **Needs Work:** 0 critical, partial PII masking
* ❌ **Vulnerable:** Any critical findings, no PII masking

***

### Common issues

**"No vulnerabilities found" but I know there are issues?**

* Check file path is correct
* Verify SQL syntax is valid
* Try specific scans: `detect-pii`, `compliance`

**Auto-fixes breaking query logic?**

* Review the "Safety Verification" step output
* Test fixed queries in non-prod first
* Reject specific fixes and apply manually

**Masking functions not working?**

* Check function syntax: `SELECT mask_email('test@example.com')`
* Verify user permissions to create functions
* Ensure functions are in correct database

**PII detection missing fields?**

* Increase confidence threshold: `--confidence 90`
* Add custom patterns: `--pattern-file custom_pii.yaml`
* Review detection summary for false negatives

**Secrets management errors?**

* Verify Databricks secrets scope exists
* Check secret key names match code references
* Test secret access: `dbutils.secrets.get(scope, key)`

***

### Examples

**Quick SQL security scan:**

```
"Scan my queries.sql file for security vulnerabilities"
```

**Full scan with auto-fix:**

```
"Scan and automatically fix all security issues in my SQL code"
```

**Find PII exposure:**

```
"Detect all PII in my customer_data table and generate masking functions"
```

**Compliance check:**

```
"Run a GDPR compliance check on my Databricks workspace"
```

**Privilege analysis:**

```
"Show me all users with admin access to the production database"
```

**Secure credentials:**

```
"Find all hardcoded credentials in my SQL files and replace them with secrets"
```


---

# Agent Instructions: Querying This Documentation

If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter:

```
GET https://docs.agents.opsera.ai/devsecops-agents/sql-security-scan.md?ask=<question>
```

The question should be specific, self-contained, and written in natural language.
The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
