
Implementation Guide: Monitor compensation benchmarks and flag roles drifting below market
Step-by-step implementation guide for deploying AI to monitor compensation benchmarks and flag roles drifting below market for HR & Staffing clients.
Hardware Procurement
Agent Host Virtual Machine (Cloud — Azure Standard_B2ms)
$60–75/month MSP cost / $150–200/month suggested resale (bundled with monitoring)
Hosts the n8n or CrewAI agent orchestration runtime, PostgreSQL database, and Docker containers. The B2ms provides burstable compute suitable for periodic batch workloads (weekly/monthly compensation comparisons). 8 GB RAM accommodates the orchestration engine, database, and any local caching of benchmark datasets.
Agent Host Virtual Machine (Cloud — AWS Alternative)
$55–70/month MSP cost / $150–200/month suggested resale
Alternative to Azure for MSPs with existing AWS infrastructure. The t3.medium provides equivalent burstable compute for the agent workload. Pair with a separate RDS db.t3.micro instance ($25–40/month) for PostgreSQL if not running the database on the same instance.
On-Premise Micro Server (Alternative for Air-Gapped or On-Prem Requirements)
Intel NUC 13 Pro Arena Canyon
$550–700 one-time MSP cost / $900–1,100 suggested resale + $100/month managed services
For clients requiring on-premise hosting due to data residency or security policies. The NUC 13 Pro provides ample compute for Docker-based agent deployment, PostgreSQL, and n8n. Compact form factor fits in any server closet or desktop. Only recommended when cloud deployment is not an option.
On-Premise Micro Server (Dell Alternative)
Dell OptiPlex Micro 7020
$650–850 one-time MSP cost / $1,000–1,300 suggested resale + $100/month managed services
Enterprise-grade alternative to Intel NUC for on-premise deployments. Dell ProSupport warranty available. Same use case as NUC 13 Pro but with Dell's enterprise service ecosystem for clients who require it.
Software Procurement
Salary.com CompAnalyst
$3,850/year base tier MSP cost / $6,000–8,000/year suggested resale
Primary compensation benchmarking data source. Provides a library of 15,000+ job titles with HR-reported salary data, AI-powered auto-matching of client job titles to survey benchmarks with 95% accuracy, and API access for programmatic data retrieval. Best price-to-value ratio for SMB clients.
Pave (Alternative / Free Tier for Small Clients)
Free for <200 employees; $23,750/year median list for 250+ employees / resale at $30,000–38,000/year
Alternative compensation data provider, ideal for startup and SMB clients under 200 employees where the free tier applies. Provides real-time benchmarking via direct HRIS/ATS integrations. For larger clients, the paid tier is significantly more expensive than CompAnalyst but offers deeper real-time data.
n8n Self-Hosted (Community Edition)
$0/month (self-hosted) / $150–300/month suggested resale as 'managed automation platform'
Primary agent orchestration and workflow automation platform. Provides visual workflow builder with native AI agent nodes, 400+ built-in integrations, webhook triggers, cron scheduling, and HTTP request nodes for API calls. Self-hosted on the agent VM via Docker. The MSP controls the entire stack with zero licensing cost.
n8n Cloud (Alternative — No Self-Hosting)
$50/month Pro tier (10,000 executions) MSP cost / $200–400/month suggested resale
Alternative to self-hosting for MSPs that prefer a fully managed automation platform. Reduces DevOps overhead but introduces a recurring software cost and less control over data residency.
OpenAI API (GPT-5.4 mini)
$15–50/month typical usage for this workload / included in managed service fee
LLM reasoning engine for the autonomous agent. GPT-5.4 mini at $0.15/1M input tokens and $0.60/1M output tokens provides the best cost-to-capability ratio for batch compensation analysis. Used for job title normalization, drift severity assessment, natural-language report generation, and recommendation synthesis. Weekly batch runs for 50–500 roles consume modest token volumes.
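A quick back-of-envelope check of those token economics; the per-role token counts below are illustrative assumptions, not measurements:

```javascript
// Rough monthly token cost for weekly batch runs at GPT-5.4 mini's stated
// rates of $0.15 per 1M input tokens and $0.60 per 1M output tokens.
// Token counts per role are assumptions for sizing, not measured values.
const RATE_IN = 0.15 / 1_000_000;   // $ per input token
const RATE_OUT = 0.60 / 1_000_000;  // $ per output token

function monthlyCost(roles, inputTokensPerRole, outputTokensPerRole, runsPerMonth) {
  const perRun = roles * (inputTokensPerRole * RATE_IN + outputTokensPerRole * RATE_OUT);
  return perRun * runsPerMonth;
}

// 500 roles, ~1,500 input + ~400 output tokens per role, ~4 runs/month:
console.log(monthlyCost(500, 1500, 400, 4).toFixed(2)); // ≈ "0.93"
```

Even at 500 roles, weekly runs cost on the order of a dollar a month in raw tokens; the $15–50/month figure leaves headroom for title-normalization calls, retries, and report generation.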
Azure OpenAI Service (Alternative for Regulated Clients)
Same token rates as OpenAI + Azure infrastructure costs / included in managed service fee
Alternative to direct OpenAI API for clients requiring SOC 2 compliant hosting, data residency guarantees, and assurance that compensation data is not used for model training. Recommended for clients in regulated industries or with strict data governance policies.
Unified.to (HRIS Unified API)
$0–100/month depending on volume MSP cost / $150–400/month suggested resale
Unified HRIS API connector that normalizes employee and compensation data across BambooHR, ADP, Gusto, Workday, Rippling, Personio, HiBob, and 40+ other HRIS systems into a single data model. Eliminates the need to build custom integrations per client HRIS. Critical for MSPs serving multiple clients with different HR systems.
Merge.dev (Alternative Unified API)
Free tier available; paid from $650/month for production / $900–1,200/month suggested resale
Alternative to Unified.to with broader ATS and payroll coverage. Better option for staffing agencies that need both HRIS and ATS data (Bullhorn, Greenhouse, Lever, iCIMS). Higher cost but more comprehensive connector library.
PostgreSQL 15+
$0 (self-hosted on agent VM) or $25–40/month for managed instance (Azure/AWS)
Primary database for storing historical compensation snapshots, agent run logs, drift tracking over time, role mappings, and alert history. Enables trend analysis and compliance audit trails.
Docker Engine & Docker Compose
$0 (open source)
Container runtime for deploying n8n, PostgreSQL, and any custom Python agent components as isolated, reproducible containers on the agent VM.
Retool (Optional — Client Dashboard)
$25–50/user/month suggested resale
Optional low-code dashboard builder for creating a branded compensation monitoring portal for the client's HR team. Connects directly to PostgreSQL for real-time drift visualization, historical trends, and role-by-role analysis. Can be white-labeled with MSP branding.
Prerequisites
- Client has an active HRIS or payroll system with API access enabled (BambooHR, ADP Workforce Now, Gusto, Rippling, Paylocity, Workday, or HiBob). Confirm the client's current plan includes API access — some vendors restrict API to enterprise tiers.
- Client can provide a complete list of active job titles, departments, and current base compensation (salary or hourly rate) for all roles to be monitored. This may come from the HRIS export or a manual spreadsheet for initial calibration.
- Client HR leadership has defined what constitutes 'market drift' — recommended starting threshold is a role paying 10% or more below the 50th percentile market rate, but this must be confirmed with the client's compensation philosophy (e.g., targeting 50th, 60th, or 75th percentile).
- Client has Microsoft 365 (Teams + Outlook) or Google Workspace (Gmail) or Slack for receiving automated alerts and reports. Confirm which channel HR leadership prefers for notifications.
- The MSP has an active OpenAI API account with billing configured (or Azure OpenAI resource provisioned) with at least $50 in prepaid credits for initial development and testing.
- The MSP has procured or activated an account with the chosen compensation data provider (Salary.com CompAnalyst or Pave) and has confirmed API access or data export capabilities.
- Outbound HTTPS (port 443) is allowed from the agent host to: api.openai.com, the HRIS API endpoint, the compensation data API endpoint, and notification service endpoints (smtp.office365.com, hooks.slack.com, or equivalent).
- If deploying to cloud: An Azure subscription or AWS account is provisioned with appropriate resource quotas. If deploying on-premise: The NUC or OptiPlex is racked, powered, networked, and accessible via SSH with a static internal IP.
- The MSP has designated a Python developer (intermediate level) familiar with REST APIs and Docker, and has allocated 60–100 hours of development time across the 8–14 week implementation window.
- Client has identified a primary HR stakeholder who will serve as the point of contact for job title mapping, threshold calibration, and acceptance testing. This person must have authority to view compensation data for all monitored roles.
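The drift definition in the prerequisites reduces to a simple calculation that recurs throughout this guide. A minimal sketch, assuming the client targets the 50th percentile with the recommended 10%-below threshold:

```javascript
// Drift = how far current pay sits relative to the target market percentile,
// as a percentage (negative = below market). Threshold and target percentile
// are the recommended starting values and must be confirmed with the client.
function computeDrift(currentSalary, marketTarget) {
  return ((currentSalary - marketTarget) / marketTarget) * 100;
}

function isDrifting(currentSalary, marketTarget, thresholdPct = -10) {
  return computeDrift(currentSalary, marketTarget) <= thresholdPct;
}

console.log(computeDrift(105000, 125000).toFixed(1)); // "-16.0"
console.log(isDrifting(105000, 125000));              // true
console.log(isDrifting(120000, 125000));              // false (-4% drift)
```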
Installation Steps
Step 1: Provision and Configure the Agent Host VM
Create the cloud VM that will host the entire agent stack (n8n, PostgreSQL, custom Python scripts). This single VM runs all components in Docker containers for simplified management.
az group create --name rg-comp-agent --location eastus2
az vm create --resource-group rg-comp-agent --name vm-comp-agent --image Ubuntu2204 --size Standard_B2ms --admin-username mspadmin --generate-ssh-keys --os-disk-size-gb 64 --public-ip-sku Standard
# Open port 443 outbound (default allowed) and 5678 for n8n UI (restrict to MSP IP)
az vm open-port --resource-group rg-comp-agent --name vm-comp-agent --port 5678 --priority 1010
# SSH into the VM
ssh mspadmin@<public-ip>
# Update system and install Docker
sudo apt update && sudo apt upgrade -y
curl -fsSL https://get.docker.com -o get-docker.sh && sudo sh get-docker.sh
sudo usermod -aG docker mspadmin
sudo apt install -y docker-compose-plugin
# Verify Docker
docker --version
docker compose version
For AWS, use the equivalent: aws ec2 run-instances with instance type t3.medium and an Ubuntu 22.04 AMI. For an on-premise NUC/OptiPlex, install Ubuntu Server 22.04 LTS, then run the same Docker installation commands. Restrict port 5678 (the n8n UI) to the MSP's management IP range only — never expose it to the public internet without authentication.
Step 2: Deploy n8n and PostgreSQL via Docker Compose
Create a Docker Compose stack that runs n8n (workflow orchestration) and PostgreSQL (data storage) as linked containers. n8n will serve as the primary agent orchestration platform with a visual workflow editor.
# Create project directory
sudo mkdir -p /opt/comp-agent && sudo chown $USER: /opt/comp-agent && cd /opt/comp-agent
# Create Docker Compose file
cat > docker-compose.yml << 'EOF'
version: '3.8'
services:
  postgres:
    image: postgres:15-alpine
    container_name: comp-agent-db
    restart: unless-stopped
    environment:
      POSTGRES_USER: compagent
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_DB: comp_benchmarks
    volumes:
      - pgdata:/var/lib/postgresql/data
      - ./init-db.sql:/docker-entrypoint-initdb.d/init-db.sql
    ports:
      - '127.0.0.1:5432:5432'
    healthcheck:
      test: ['CMD-SHELL', 'pg_isready -U compagent']
      interval: 10s
      timeout: 5s
      retries: 5
  n8n:
    image: n8nio/n8n:latest
    container_name: comp-agent-n8n
    restart: unless-stopped
    environment:
      DB_TYPE: postgresdb
      DB_POSTGRESDB_HOST: postgres
      DB_POSTGRESDB_PORT: 5432
      DB_POSTGRESDB_DATABASE: comp_benchmarks
      DB_POSTGRESDB_USER: compagent
      DB_POSTGRESDB_PASSWORD: ${DB_PASSWORD}
      N8N_BASIC_AUTH_ACTIVE: 'true'
      N8N_BASIC_AUTH_USER: ${N8N_USER}
      N8N_BASIC_AUTH_PASSWORD: ${N8N_PASSWORD}
      N8N_HOST: ${N8N_HOST}
      N8N_PROTOCOL: https
      WEBHOOK_URL: https://${N8N_HOST}/
      GENERIC_TIMEZONE: America/New_York
      N8N_ENCRYPTION_KEY: ${ENCRYPTION_KEY}
    volumes:
      - n8ndata:/home/node/.n8n
    ports:
      - '5678:5678'
    depends_on:
      postgres:
        condition: service_healthy
volumes:
  pgdata:
  n8ndata:
EOF
# Create environment file
cat > .env << 'EOF'
DB_PASSWORD=<generate-strong-password-here>
N8N_USER=mspadmin
N8N_PASSWORD=<generate-strong-password-here>
N8N_HOST=comp-agent.yourmsp.com
ENCRYPTION_KEY=<generate-32-char-random-string>
EOF
# Secure the env file
chmod 600 .env
Generate strong passwords with: openssl rand -base64 24. The ENCRYPTION_KEY is critical — n8n uses it to encrypt stored credentials. Back it up securely; losing it means re-entering all API credentials. For production, place an Nginx reverse proxy with Let's Encrypt SSL in front of n8n rather than exposing port 5678 directly.
Step 3: Initialize the Compensation Database Schema
Create the PostgreSQL tables needed to store employee compensation records, market benchmark data, drift analysis results, and agent run history. This schema supports historical tracking for trend analysis and compliance audit trails.
cat > /opt/comp-agent/init-db.sql << 'EOSQL'
-- Compensation Agent Database Schema
CREATE TABLE IF NOT EXISTS roles (
  id SERIAL PRIMARY KEY,
  external_hris_id VARCHAR(255) UNIQUE,
  job_title VARCHAR(500) NOT NULL,
  normalized_title VARCHAR(500),
  department VARCHAR(255),
  location_city VARCHAR(255),
  location_state VARCHAR(100),
  location_country VARCHAR(100) DEFAULT 'US',
  employment_type VARCHAR(50) DEFAULT 'full-time',
  experience_level VARCHAR(50),
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE IF NOT EXISTS compensation_snapshots (
  id SERIAL PRIMARY KEY,
  role_id INTEGER REFERENCES roles(id),
  snapshot_date DATE NOT NULL,
  current_base_salary NUMERIC(12,2),
  current_total_comp NUMERIC(12,2),
  salary_currency VARCHAR(10) DEFAULT 'USD',
  source VARCHAR(100) DEFAULT 'hris',
  created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE IF NOT EXISTS market_benchmarks (
  id SERIAL PRIMARY KEY,
  role_id INTEGER REFERENCES roles(id),
  benchmark_date DATE NOT NULL,
  market_p25 NUMERIC(12,2),
  market_p50 NUMERIC(12,2),
  market_p75 NUMERIC(12,2),
  market_p90 NUMERIC(12,2),
  data_source VARCHAR(100),
  geography VARCHAR(255),
  industry VARCHAR(255),
  sample_size INTEGER,
  created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE IF NOT EXISTS drift_analyses (
  id SERIAL PRIMARY KEY,
  role_id INTEGER REFERENCES roles(id),
  analysis_date DATE NOT NULL,
  current_salary NUMERIC(12,2),
  market_p50 NUMERIC(12,2),
  drift_percentage NUMERIC(6,2),
  drift_severity VARCHAR(20),
  target_percentile VARCHAR(10) DEFAULT 'p50',
  recommendation TEXT,
  llm_reasoning TEXT,
  alert_sent BOOLEAN DEFAULT FALSE,
  alert_sent_at TIMESTAMP,
  created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE IF NOT EXISTS agent_runs (
  id SERIAL PRIMARY KEY,
  run_date TIMESTAMP DEFAULT NOW(),
  run_type VARCHAR(50),
  roles_analyzed INTEGER,
  alerts_generated INTEGER,
  errors TEXT,
  duration_seconds INTEGER,
  status VARCHAR(50),
  created_at TIMESTAMP DEFAULT NOW()
);
CREATE TABLE IF NOT EXISTS role_mappings (
  id SERIAL PRIMARY KEY,
  internal_title VARCHAR(500),
  benchmark_title VARCHAR(500),
  benchmark_code VARCHAR(100),
  confidence_score NUMERIC(4,2),
  mapping_method VARCHAR(50),
  verified_by_hr BOOLEAN DEFAULT FALSE,
  created_at TIMESTAMP DEFAULT NOW(),
  updated_at TIMESTAMP DEFAULT NOW()
);
CREATE INDEX idx_drift_date ON drift_analyses(analysis_date);
CREATE INDEX idx_drift_severity ON drift_analyses(drift_severity);
CREATE INDEX idx_comp_snapshot_date ON compensation_snapshots(snapshot_date);
CREATE INDEX idx_benchmark_date ON market_benchmarks(benchmark_date);
EOSQL
# Start the stack and verify
cd /opt/comp-agent && docker compose up -d
docker compose ps
docker exec -it comp-agent-db psql -U compagent -d comp_benchmarks -c '\dt'
The schema includes a role_mappings table specifically for the critical job title normalization challenge. The verified_by_hr column allows the client's HR contact to confirm or correct LLM-generated title mappings. Always keep drift_analyses records for compliance — never DELETE; the recommended retention period is 7 years.
Step 4: Configure HRIS Integration via Unified.to
Set up the unified HRIS API connector to pull employee compensation data from the client's HR system. Unified.to provides a single API that works across BambooHR, ADP, Gusto, Rippling, and 40+ other HRIS platforms, eliminating per-client custom integration work.
curl -X GET 'https://api.unified.to/hris/employees?connection_id=YOUR_CONNECTION_ID&fields=compensations,title,department,location' \
-H 'Authorization: Bearer YOUR_UNIFIED_API_KEY' \
-H 'Content-Type: application/json' | python3 -m json.tool
# Example response:
{
  "results": [
    {
      "id": "emp_123",
      "name": "Jane Doe",
      "title": "Software Engineer II",
      "department": "Engineering",
      "location": { "city": "Austin", "state": "TX", "country": "US" },
      "compensations": [
        { "type": "salary", "amount": 125000, "currency": "USD", "frequency": "annual" }
      ]
    }
  ]
}
The client's HR admin will need to authorize the Unified.to connection to their HRIS. For BambooHR this is an API key; for ADP it is OAuth 2.0, and Unified.to handles the OAuth flow. Walk the client through the authorization in a screen-share session. For clients whose HRIS is not supported by Unified.to, fall back to direct API integration or CSV import as a temporary measure.
Step 5: Configure Compensation Data Provider API
Set up the connection to Salary.com CompAnalyst (or Pave) for market benchmark data retrieval. This is the source of truth for what the market pays for each role.
curl -X GET 'https://api.salary.com/v1/companalyst/market-data?job_title=Software+Engineer+II&location=Austin,TX&industry=Technology' \
-H 'Authorization: Bearer YOUR_COMPANALYST_API_KEY' \
-H 'Accept: application/json' | python3 -m json.tool
If CompAnalyst does not offer a direct API, use the Lightcast (formerly Burning Glass) API for programmatic access. Sign up at https://lightcast.io/contact to request API access. Lightcast provides wage data by SOC code with geographic granularity.
Salary.com CompAnalyst's API availability depends on the subscription tier — confirm API access during procurement. If API access is not available, the fallback approach is: (1) export benchmark data as CSV monthly from CompAnalyst's web interface, (2) upload to the agent's PostgreSQL database via a simple n8n file-processing workflow. For Pave free-tier clients, Pave may not offer a formal API — use their dashboard exports or HRIS-integrated benchmark views. Lightcast is the most API-friendly option for programmatic access.
Step 6: Configure OpenAI API Credentials and Test LLM Connectivity
Set up the OpenAI API key (or Azure OpenAI endpoint) that the agent will use for job title normalization, drift analysis reasoning, and natural-language report generation.
curl https://api.openai.com/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer YOUR_OPENAI_API_KEY' \
-d '{
"model": "gpt-5.4-mini",
"messages": [{
"role": "system",
"content": "You are a compensation analyst. Given a job title, normalize it to a standard market title and suggest a SOC code."
}, {
"role": "user",
"content": "Normalize this job title: Sr. Full Stack Dev (Backend Heavy) in Austin, TX"
}],
"temperature": 0.2
}'
Expected response: something like { "normalized_title": "Senior Software Engineer", "soc_code": "15-1252" }
Set a hard monthly spend limit of $100 on the OpenAI account to prevent unexpected costs during development. For production, $50/month is typical for this workload. If the client requires Azure OpenAI, provision an Azure OpenAI resource in the same region as the agent VM, deploy GPT-5.4 mini model, and use the Azure-specific endpoint URL and API key format. Temperature of 0.2 is recommended for consistent, deterministic outputs in compensation analysis.
Step 7: Set Up Notification Channels
Configure the alert delivery mechanisms. The agent will send notifications when roles are flagged as drifting below market thresholds. Set up Microsoft Teams webhook, Slack webhook, and/or SMTP email as appropriate for the client.
curl -X POST 'YOUR_TEAMS_WEBHOOK_URL' \
-H 'Content-Type: application/json' \
-d '{
"@type": "MessageCard",
"summary": "Test Alert",
"themeColor": "FF0000",
"title": "⚠️ Compensation Drift Alert — TEST",
"sections": [{
"facts": [
{ "name": "Role", "value": "Software Engineer II" },
{ "name": "Current Salary", "value": "$105,000" },
{ "name": "Market P50", "value": "$125,000" },
{ "name": "Drift", "value": "-16.0%" }
],
"text": "This role is 16% below the 50th percentile market rate for Austin, TX."
}]
}'
Microsoft Teams is transitioning from Office 365 Connectors to Power Automate Workflows for incoming webhooks; for new setups in 2025, consider using a Teams Workflow webhook instead. Store all webhook URLs in n8n credentials, never hardcode them in workflows. Set up a dedicated 'Compensation Alerts' channel in the client's Teams/Slack so alerts stay organized and accessible only to HR leadership.
Step 8: Build the Core n8n Agent Workflow — HRIS Data Pull
Create the first n8n workflow that runs on a weekly schedule to pull current employee compensation data from the client's HRIS via the Unified.to API, normalize it, and store it in PostgreSQL as compensation snapshots.
The initial run may take longer if the client has 500+ employees. The Unified.to API supports pagination — implement pagination handling in the Code node for large organizations. Run the workflow manually the first time to verify data quality before activating the weekly schedule. Review the extracted compensation data with the client's HR contact to confirm accuracy.
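The pagination handling can be sketched as a small loop in the Code node. The offset/limit parameter names and the fetchPage helper below are assumptions standing in for the actual HTTP call — verify the exact pagination scheme against Unified.to's API documentation:

```javascript
// Offset-based pagination loop. `fetchPage(offset, limit)` is a stand-in for
// the real HTTP request (e.g. n8n's HTTP helpers); the offset/limit parameter
// names are assumptions — confirm them against Unified.to's API docs.
async function fetchAllEmployees(fetchPage, limit = 100) {
  const all = [];
  let offset = 0;
  while (true) {
    const page = await fetchPage(offset, limit);
    all.push(...page);
    if (page.length < limit) break; // short page means last page
    offset += limit;
  }
  return all;
}

// Example with a stubbed API returning 250 records:
const data = Array.from({ length: 250 }, (_, i) => ({ id: `emp_${i}` }));
const stub = async (offset, limit) => data.slice(offset, offset + limit);
fetchAllEmployees(stub).then(r => console.log(r.length)); // 250
```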
Step 9: Build the Market Benchmark Retrieval Workflow
Create a second n8n workflow that retrieves current market compensation data for each monitored role from the compensation data provider and stores the benchmarks in PostgreSQL. This runs after the HRIS data pull or can be triggered independently.
If the compensation data provider does not have a direct API, replace Node 4 with a File Read node that imports a monthly CSV export. The MSP can schedule a monthly task to upload the latest CompAnalyst export to the agent VM at /opt/comp-agent/data/benchmarks/. Rate limit API calls to avoid being throttled — the Split In Batches node with 5 items and a 2-second wait node between batches is recommended.
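If the CSV-import fallback is used, the file-processing workflow's Code node mainly has to coerce exported rows into market_benchmarks rows. A sketch, with hypothetical CSV column names that must be remapped to the provider's actual export headers:

```javascript
// Convert one parsed CSV row (hypothetical column names) into the shape of a
// market_benchmarks insert. Currency symbols and thousands separators are
// stripped; rows without a usable P50 are rejected since drift needs it.
function toBenchmarkRecord(row, roleId, benchmarkDate) {
  const num = v => {
    const n = parseFloat(String(v).replace(/[$,]/g, ''));
    return Number.isFinite(n) ? n : null;
  };
  const p50 = num(row['Median (P50)']);
  if (p50 === null || p50 <= 0) return null; // unusable without a P50
  return {
    role_id: roleId,
    benchmark_date: benchmarkDate,
    market_p25: num(row['P25']),
    market_p50: p50,
    market_p75: num(row['P75']),
    market_p90: num(row['P90']),
    data_source: 'companalyst_csv',
    geography: row['Location'] || null,
  };
}

const rec = toBenchmarkRecord(
  { 'Median (P50)': '$125,000', P25: '110000', P75: '142,500', P90: '160000', Location: 'Austin, TX' },
  42, '2025-06-01'
);
console.log(rec.market_p50); // 125000
```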
Step 10: Build the Core Drift Analysis Agent Workflow
Create the primary autonomous agent workflow that compares current compensation against market benchmarks, uses GPT-5.4 mini to assess drift severity and generate recommendations, stores results, and triggers alerts for roles below threshold.
The three workflows (HRIS Pull → Market Benchmarks → Drift Analyzer) are chained by schedule timing: 6 AM → 7 AM → 8 AM. For more reliable chaining, use n8n's Execute Workflow node to call them sequentially from a master workflow. The LLM call in Node 5 should include context about the client's compensation philosophy, location-based adjustments, and any known planned raises. Keep prompts specific and structured — see the custom_ai_components section for exact prompts.
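A deterministic pre-classification of drift severity keeps the LLM's job to explanation and recommendations rather than arithmetic. The band boundaries below are illustrative defaults to calibrate with the client's compensation philosophy, not values fixed by this guide:

```javascript
// First-pass severity classification from the drift percentage
// (negative = below market). Band boundaries are illustrative defaults —
// calibrate them with the client before go-live.
function classifySeverity(driftPct) {
  if (driftPct >= -5) return 'none';
  if (driftPct >= -10) return 'watch';
  if (driftPct >= -20) return 'moderate';
  return 'severe';
}

console.log(classifySeverity(-3));  // "none"
console.log(classifySeverity(-16)); // "moderate"
console.log(classifySeverity(-25)); // "severe"
```

The resulting label fits the drift_severity VARCHAR(20) column, and the LLM can be asked to confirm or override it with reasoning stored in llm_reasoning.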
Step 11: Implement Job Title Normalization with LLM
Build the job title normalization component — the most technically challenging part of the implementation. Internal job titles (e.g., 'Sr. Full Stack Dev (Backend Heavy)') must be mapped to standardized market benchmark titles (e.g., 'Senior Software Engineer') for accurate comparison. Use GPT-5.4 mini for initial mapping, then store verified mappings for reuse.
Job title normalization is the highest-risk step for accuracy. Always require human verification (verified_by_hr = true) before using a mapping for drift analysis. Start with high-confidence mappings (>0.85) running automatically, and route lower-confidence mappings to HR for review. Over time, the role_mappings table becomes a reusable knowledge base that reduces LLM dependency. For staffing agencies with Bullhorn, job titles are often more standardized — normalization may be simpler.
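The confidence-based routing described above can live in a small Code node. A sketch, where the action labels are hypothetical names for the workflow's output branches and the 0.85 threshold matches this step's recommendation:

```javascript
// Route an LLM-generated title mapping: reuse a verified mapping when one
// exists, auto-accept high-confidence mappings (>= 0.85), and send the rest
// to HR review. Action labels are hypothetical branch names.
function routeMapping(mapping, verifiedMappings) {
  const verified = verifiedMappings.find(
    m => m.internal_title === mapping.internal_title && m.verified_by_hr
  );
  if (verified) return { action: 'reuse', benchmark_title: verified.benchmark_title };
  if (mapping.confidence_score >= 0.85) {
    return { action: 'auto_accept', benchmark_title: mapping.benchmark_title };
  }
  return { action: 'hr_review', benchmark_title: mapping.benchmark_title };
}

const known = [{ internal_title: 'People Ops Ninja', benchmark_title: 'Human Resources Generalist', verified_by_hr: true }];
console.log(routeMapping({ internal_title: 'People Ops Ninja', confidence_score: 0.72, benchmark_title: 'HR Coordinator' }, known).action);          // "reuse"
console.log(routeMapping({ internal_title: 'Sr. Full Stack Dev', confidence_score: 0.92, benchmark_title: 'Senior Software Engineer' }, known).action); // "auto_accept"
```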
Step 12: Set Up Nginx Reverse Proxy with SSL
Secure the n8n web interface with HTTPS using Nginx as a reverse proxy and Let's Encrypt for free SSL certificates. This is critical because n8n will contain sensitive compensation data and API credentials.
# Install Nginx and Certbot
sudo apt install -y nginx certbot python3-certbot-nginx
# Create Nginx config
sudo tee /etc/nginx/sites-available/comp-agent > /dev/null << 'EOF'
server {
  listen 80;
  server_name comp-agent.yourmsp.com;
  location / {
    proxy_pass http://127.0.0.1:5678;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection 'upgrade';
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    proxy_set_header X-Forwarded-Proto $scheme;
    proxy_read_timeout 300s;
    chunked_transfer_encoding off;
    proxy_buffering off;
  }
}
EOF
# Enable the site
sudo ln -s /etc/nginx/sites-available/comp-agent /etc/nginx/sites-enabled/
sudo rm -f /etc/nginx/sites-enabled/default
sudo nginx -t && sudo systemctl reload nginx
# Get SSL certificate
sudo certbot --nginx -d comp-agent.yourmsp.com --non-interactive --agree-tos -m admin@yourmsp.com
# Verify auto-renewal
sudo certbot renew --dry-run
# Restrict n8n to only accept connections from localhost (via Nginx)
# Edit docker-compose.yml: change n8n ports from '5678:5678' to '127.0.0.1:5678:5678'
cd /opt/comp-agent
sed -i "s/'5678:5678'/'127.0.0.1:5678:5678'/" docker-compose.yml
docker compose up -d
The DNS A record for comp-agent.yourmsp.com must point to the VM's public IP before running certbot. For MSPs managing multiple clients, use a subdomain pattern: clientname-comp.yourmsp.com. Consider adding IP-based access control in Nginx (allow the MSP office IP and client HR team IPs; deny all others) for additional security. Certbot auto-renewal runs via a systemd timer — the dry run above verifies that it works.
Step 13: Configure Automated Backups
Set up daily automated backups of the PostgreSQL database containing compensation data, drift analyses, and role mappings. This data is critical for compliance audit trails and must be retained per company policy (typically 7 years for compensation records).
# Create backup script
cat > /opt/comp-agent/backup.sh << 'EOF'
#!/bin/bash
set -e
BACKUP_DIR=/opt/comp-agent/backups
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
RETENTION_DAYS=90
mkdir -p $BACKUP_DIR
# Dump PostgreSQL
docker exec comp-agent-db pg_dump -U compagent -Fc comp_benchmarks > $BACKUP_DIR/comp_benchmarks_$TIMESTAMP.dump
# Compress
gzip $BACKUP_DIR/comp_benchmarks_$TIMESTAMP.dump
# Upload to Azure Blob Storage (or AWS S3)
# az storage blob upload --account-name compagentbackups --container-name backups \
# --file $BACKUP_DIR/comp_benchmarks_$TIMESTAMP.dump.gz \
# --name $TIMESTAMP/comp_benchmarks.dump.gz
# Clean up local backups older than retention
find $BACKUP_DIR -name '*.dump.gz' -mtime +$RETENTION_DAYS -delete
echo "Backup completed: comp_benchmarks_$TIMESTAMP.dump.gz"
EOF
chmod +x /opt/comp-agent/backup.sh
# Schedule daily backup at 2 AM
(crontab -l 2>/dev/null; echo '0 2 * * * /opt/comp-agent/backup.sh >> /opt/comp-agent/backups/backup.log 2>&1') | crontab -
# Test backup
/opt/comp-agent/backup.sh
Uncomment the Azure Blob (or AWS S3) upload lines for off-site backup. For compliance, ensure backups are encrypted: use Azure Blob encryption at rest (enabled by default) or add gpg encryption before upload. Test the restore procedure: gunzip -c backup.dump.gz | docker exec -i comp-agent-db pg_restore -U compagent -d comp_benchmarks --clean. Store a copy of the backup script in the MSP's documentation system.
Step 14: Deploy Optional Client-Facing Dashboard
Deploy a Retool or Streamlit dashboard that gives the client's HR team a visual interface to view drift analysis results, historical trends, verify job title mappings, and configure alert thresholds — without needing direct database or n8n access.
Option A: Retool (Recommended for non-technical HR teams)
docker exec -it comp-agent-db psql -U compagent -d comp_benchmarks -c "
CREATE USER retool_readonly WITH PASSWORD 'strong-password-here';
GRANT CONNECT ON DATABASE comp_benchmarks TO retool_readonly;
GRANT USAGE ON SCHEMA public TO retool_readonly;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO retool_readonly;
ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT ON TABLES TO retool_readonly;
"Option B: Streamlit (self-hosted, fully controlled)
pip install streamlit psycopg2-binary plotly pandasSee custom_ai_components for Streamlit dashboard code.
If using Retool Cloud, the PostgreSQL database must be accessible from Retool's IP ranges. The safest approach is to use Retool's SSH tunnel feature with a dedicated SSH key on the agent VM. NEVER expose PostgreSQL port 5432 to the public internet. For the Streamlit option, deploy it as an additional Docker container in the compose stack on port 8501 behind the same Nginx reverse proxy. Retool is faster to build but adds a SaaS dependency; Streamlit is free and self-hosted but requires more development effort.
Custom AI Components
HRIS Data Transformer
Type: integration
n8n Code node that transforms raw HRIS employee data from Unified.to's API response into the normalized format expected by the PostgreSQL roles and compensation_snapshots tables. Handles edge cases like missing compensation data, multiple compensation records per employee, hourly-to-annual conversion, and non-USD currencies.
Implementation
// n8n Code Node — JavaScript
// Place this in Node 3 of the 'Comp Agent — HRIS Data Pull' workflow
const employees = $input.all();
const today = new Date().toISOString().split('T')[0];
const results = [];
for (const item of employees) {
  const emp = item.json;
  // Skip employees without compensation data
  if (!emp.compensations || emp.compensations.length === 0) continue;
  // Get the primary compensation (most recent or highest)
  const primaryComp = emp.compensations
    .filter(c => c.type === 'salary' || c.type === 'base')
    .sort((a, b) => (b.amount || 0) - (a.amount || 0))[0];
  if (!primaryComp || !primaryComp.amount) continue;
  // Normalize to annual salary
  let annualSalary = primaryComp.amount;
  const freq = (primaryComp.frequency || 'annual').toLowerCase();
  if (freq === 'hourly') annualSalary = primaryComp.amount * 2080;
  else if (freq === 'monthly') annualSalary = primaryComp.amount * 12;
  else if (freq === 'biweekly') annualSalary = primaryComp.amount * 26;
  else if (freq === 'weekly') annualSalary = primaryComp.amount * 52;
  // Extract location
  const location = emp.location || {};
  const city = location.city || 'Unknown';
  const state = location.state || location.region || 'Unknown';
  const country = location.country || 'US';
  // Determine experience level from title heuristics
  const title = (emp.title || '').toLowerCase();
  let experienceLevel = 'mid';
  if (title.includes('junior') || title.includes('jr') || title.includes('entry') || title.includes('associate')) experienceLevel = 'junior';
  else if (title.includes('senior') || title.includes('sr') || title.includes('lead') || title.includes('principal') || title.includes('staff')) experienceLevel = 'senior';
  else if (title.includes('director') || title.includes('vp') || title.includes('chief') || title.includes('head of')) experienceLevel = 'executive';
  else if (title.includes('manager') || title.includes('mgr')) experienceLevel = 'manager';
  results.push({
    json: {
      external_hris_id: emp.id,
      job_title: emp.title || 'Unknown Title',
      department: emp.department || 'Unknown',
      location_city: city,
      location_state: state,
      location_country: country,
      employment_type: emp.employment_type || 'full-time',
      experience_level: experienceLevel,
      snapshot_date: today,
      current_base_salary: Math.round(annualSalary * 100) / 100,
      salary_currency: primaryComp.currency || 'USD'
    }
  });
}
return results;
Job Title Normalization Prompt
Type: prompt
System and user prompt templates for the LLM-powered job title normalization step. This is the most critical AI component — it maps internal job titles (which are often non-standard, abbreviated, or company-specific) to standardized market benchmark titles that compensation data providers recognize. Uses structured JSON output for reliable parsing.
Implementation
System Prompt
User Prompt Template
// Example 1
// Input: Internal Title: Sr. Full Stack Dev (Backend Heavy), Department: Engineering, Location: Austin, TX, Experience Level: senior
{
"normalized_title": "Senior Software Engineer",
"soc_code": "15-1252",
"confidence_score": 0.92,
"reasoning": "Full Stack Dev with backend emphasis maps to Senior Software Engineer. The 'Sr.' prefix confirms senior level. SOC 15-1252 covers Software Developers and Programmers.",
"needs_human_review": false,
"alternative_titles": ["Senior Backend Developer", "Senior Full Stack Developer"]
}
// Example 2
// Input: Internal Title: People Ops Ninja, Department: Human Resources, Location: San Francisco, CA, Experience Level: mid
{
"normalized_title": "Human Resources Generalist",
"soc_code": "13-1071",
"confidence_score": 0.72,
"reasoning": "'People Ops' maps to HR operations. 'Ninja' is a non-standard modifier that doesn't indicate seniority. Most likely an HR Generalist role. Low confidence due to ambiguous title.",
"needs_human_review": true,
"alternative_titles": ["HR Coordinator", "HR Business Partner", "People Operations Specialist"]
}
Compensation Drift Analysis Agent
Type: agent
The core autonomous agent that performs weekly compensation drift analysis. Implemented as an n8n workflow with an embedded AI Agent node. It fetches the latest compensation snapshots and market benchmarks from PostgreSQL, calculates drift percentages, classifies severity, uses GPT-5.4 mini to generate contextual analysis and recommendations, and routes alerts to the appropriate channels based on severity.
Implementation
NODE 1 — TRIGGER: Cron schedule — 0 8 * * 1 (Monday 8 AM)
NODE 2 — POSTGRES: Drift Comparison Query
SELECT
r.id AS role_id,
r.job_title,
r.normalized_title,
r.department,
r.location_city,
r.location_state,
r.experience_level,
cs.current_base_salary,
cs.snapshot_date,
mb.market_p25,
mb.market_p50,
mb.market_p75,
mb.market_p90,
mb.data_source AS benchmark_source,
mb.sample_size,
-- Previous drift for trend
prev.drift_percentage AS previous_drift
FROM roles r
INNER JOIN compensation_snapshots cs ON r.id = cs.role_id
AND cs.snapshot_date = (SELECT MAX(snapshot_date) FROM compensation_snapshots WHERE role_id = r.id)
INNER JOIN market_benchmarks mb ON r.id = mb.role_id
AND mb.benchmark_date = (SELECT MAX(benchmark_date) FROM market_benchmarks WHERE role_id = r.id)
INNER JOIN role_mappings rm ON r.job_title = rm.internal_title
AND (rm.verified_by_hr = true OR rm.confidence_score >= 0.85)
LEFT JOIN LATERAL (
SELECT drift_percentage
FROM drift_analyses
WHERE role_id = r.id
ORDER BY analysis_date DESC
LIMIT 1
) prev ON true
WHERE cs.current_base_salary IS NOT NULL
AND mb.market_p50 IS NOT NULL
AND mb.market_p50 > 0;
NODE 3 — CODE: Calculate Drift Metrics (configure node with 2 outputs)
const roles = $input.all();
const today = new Date().toISOString().split('T')[0];
const flaggedRoles = [];
const allRoles = [];
for (const item of roles) {
const r = item.json;
const salary = parseFloat(r.current_base_salary);
const p50 = parseFloat(r.market_p50);
if (!salary || !p50 || p50 === 0) continue;
const driftPct = ((salary - p50) / p50) * 100;
const roundedDrift = Math.round(driftPct * 100) / 100;
let severity = 'OK';
if (roundedDrift <= -15) severity = 'CRITICAL';
else if (roundedDrift <= -10) severity = 'WARNING';
else if (roundedDrift <= -5) severity = 'WATCH';
// Detect worsening trend
const prevDrift = r.previous_drift ? parseFloat(r.previous_drift) : null;
const trendDirection = prevDrift !== null ?
(roundedDrift < prevDrift ? 'worsening' : roundedDrift > prevDrift ? 'improving' : 'stable') : 'new';
const roleData = {
...r,
drift_percentage: roundedDrift,
drift_severity: severity,
trend_direction: trendDirection,
analysis_date: today,
salary_vs_p25: Math.round(((salary - parseFloat(r.market_p25)) / parseFloat(r.market_p25)) * 10000) / 100,
salary_vs_p75: Math.round(((salary - parseFloat(r.market_p75)) / parseFloat(r.market_p75)) * 10000) / 100
};
allRoles.push({ json: roleData });
if (severity === 'CRITICAL' || severity === 'WARNING') {
flaggedRoles.push({ json: roleData });
}
}
// Output 0: All roles (for logging)
// Output 1: Flagged roles only (for LLM analysis + alerts)
return [allRoles, flaggedRoles];
NODE 4 — SPLIT IN BATCHES: Process flagged roles 3 at a time.
- Model: gpt-5.4-mini
- Temperature: 0.2
- Max Tokens: 600
NODE 5 — OPENAI: System Prompt
NODE 5 — OPENAI: User Prompt
NODE 6 — CODE: Parse LLM Response
const items = $input.all();
const results = [];
for (const item of items) {
let llmResponse;
try {
const content = item.json.message?.content || item.json.text || JSON.stringify(item.json);
llmResponse = JSON.parse(content);
} catch (e) {
llmResponse = {
risk_level: 'UNKNOWN',
analysis: item.json.message?.content || 'LLM response parsing failed',
recommendation: 'Manual review required',
urgency: 'Next Quarter'
};
}
results.push({ json: { ...item.json, llm_analysis: llmResponse } });
}
return results;
NODE 7 — POSTGRES: Store Drift Analysis — Insert enriched drift records into the drift_analyses table using the columns and expressions listed below.
- Operation: Insert
- Table: drift_analyses
- role_id: {{$json.role_id}}
- analysis_date: {{$json.analysis_date}}
- current_salary: {{$json.current_base_salary}}
- market_p50: {{$json.market_p50}}
- drift_percentage: {{$json.drift_percentage}}
- drift_severity: {{$json.drift_severity}}
- target_percentile: 'p50'
- recommendation: {{$json.llm_analysis.recommendation}}
- llm_reasoning: {{JSON.stringify($json.llm_analysis)}}
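Outside n8n, the same insert can be exercised from Python for testing. This is a sketch only: it assumes the drift_analyses columns listed above, uses psycopg2-style `%s` placeholders, and leaves the actual `execute` call commented out because it needs a live database connection.

```python
import json

# Column order assumed from the node configuration above.
INSERT_SQL = """
    INSERT INTO drift_analyses
        (role_id, analysis_date, current_salary, market_p50,
         drift_percentage, drift_severity, target_percentile,
         recommendation, llm_reasoning)
    VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
"""

def insert_params(role: dict) -> tuple:
    """Build the parameter tuple for one enriched drift record.

    llm_reasoning stores the full LLM JSON, which the compliance
    audit test later checks is always populated.
    """
    llm = role["llm_analysis"]
    return (
        role["role_id"], role["analysis_date"], role["current_base_salary"],
        role["market_p50"], role["drift_percentage"], role["drift_severity"],
        "p50", llm.get("recommendation", "Manual review required"),
        json.dumps(llm),
    )

params = insert_params({
    "role_id": 42, "analysis_date": "2025-01-06",
    "current_base_salary": 100000.0, "market_p50": 120000.0,
    "drift_percentage": -16.67, "drift_severity": "CRITICAL",
    "llm_analysis": {"recommendation": "Adjust salary band"},
})
# cur.execute(INSERT_SQL, params)  # against the live database
```

Parameterized values (rather than string interpolation) keep LLM-generated recommendation text from breaking or injecting into the SQL.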
NODE 8 — IF: Check Severity — Routes critical alerts immediately; all other flagged roles proceed to weekly summary aggregation.
- Condition: Any items with drift_severity === 'CRITICAL'
- True → Node 9A (Immediate Alert)
- False → Node 9B (Weekly Summary Only)
NODE 9A — HTTP REQUEST: Send Critical Alert to Teams/Slack — Constructs an adaptive card (Teams) or Slack blocks payload containing role details and the LLM-generated recommendation, then POSTs to the configured webhook URL.
NODE 9B — AGGREGATE: Compile Weekly Summary — Collects all flagged roles into a single aggregated payload for downstream email reporting.
NODE 10 — SEND EMAIL: Weekly Drift Report — Sends the compiled HTML summary table to HR leadership on the configured distribution list.
- To: hr-leadership@client.com
- Subject: Weekly Compensation Drift Report — {{$json.analysis_date}}
- Body: HTML table of all flagged roles with severity, drift %, and recommendations
NODE 11 — POSTGRES: Log Agent Run — Records a completion audit entry in agent_runs capturing run type, volume metrics, and final status.
- Insert into agent_runs table
- run_type: 'weekly_analysis'
- Fields: roles_analyzed, alerts_generated, status = 'completed'
Weekly Drift Report Email Template
Type: prompt
HTML email template for the weekly compensation drift summary report sent to HR leadership. Includes a color-coded summary table, trend indicators, and actionable recommendations from the LLM analysis.
Implementation:
EMAIL SUBJECT: ⚠️ Weekly Compensation Drift Report — {{analysis_date}} | {{total_flagged}} Roles Below Market
<!DOCTYPE html>
<html>
<head><style>
body { font-family: 'Segoe UI', Arial, sans-serif; color: #333; max-width: 800px; margin: 0 auto; }
.header { background: #1a365d; color: white; padding: 20px; border-radius: 8px 8px 0 0; }
.header h1 { margin: 0; font-size: 20px; }
.header p { margin: 5px 0 0; opacity: 0.8; font-size: 14px; }
.summary-cards { display: flex; gap: 12px; padding: 16px; background: #f7fafc; }
.card { flex: 1; background: white; border-radius: 8px; padding: 16px; text-align: center; box-shadow: 0 1px 3px rgba(0,0,0,0.1); }
.card .number { font-size: 32px; font-weight: bold; }
.card .label { font-size: 12px; color: #718096; text-transform: uppercase; }
.critical .number { color: #e53e3e; }
.warning .number { color: #dd6b20; }
.watch .number { color: #d69e2e; }
table { width: 100%; border-collapse: collapse; margin: 16px 0; }
th { background: #edf2f7; padding: 10px 12px; text-align: left; font-size: 12px; text-transform: uppercase; color: #4a5568; }
td { padding: 10px 12px; border-bottom: 1px solid #e2e8f0; font-size: 14px; }
.severity-critical { background: #fff5f5; color: #c53030; font-weight: bold; }
.severity-warning { background: #fffaf0; color: #c05621; font-weight: bold; }
.trend-worsening { color: #e53e3e; }
.trend-improving { color: #38a169; }
.trend-stable { color: #718096; }
.recommendation { background: #ebf8ff; border-left: 4px solid #3182ce; padding: 12px; margin: 8px 0; border-radius: 0 4px 4px 0; font-size: 13px; }
.footer { padding: 16px; font-size: 12px; color: #a0aec0; border-top: 1px solid #e2e8f0; }
</style></head>
<body>
<div class="header">
<h1>📊 Compensation Drift Report</h1>
<p>Week of {{analysis_date}} | Generated by Compensation Monitor Agent</p>
</div>
<div class="summary-cards">
<div class="card critical"><div class="number">{{critical_count}}</div><div class="label">Critical (≤-15%)</div></div>
<div class="card warning"><div class="number">{{warning_count}}</div><div class="label">Warning (-10 to -15%)</div></div>
<div class="card watch"><div class="number">{{watch_count}}</div><div class="label">Watch (-5 to -10%)</div></div>
</div>
<table>
<thead><tr>
<th>Role</th><th>Department</th><th>Location</th><th>Current Salary</th><th>Market P50</th><th>Drift</th><th>Trend</th><th>Urgency</th>
</tr></thead>
<tbody>
{{#each flagged_roles}}
<tr class="severity-{{lowercase drift_severity}}">
<td><strong>{{job_title}}</strong><br><small>{{normalized_title}}</small></td>
<td>{{department}}</td>
<td>{{location_city}}, {{location_state}}</td>
<td>${{formatNumber current_base_salary}}</td>
<td>${{formatNumber market_p50}}</td>
<td><strong>{{drift_percentage}}%</strong></td>
<td class="trend-{{trend_direction}}">{{trend_icon}} {{trend_direction}}</td>
<td>{{llm_analysis.urgency}}</td>
</tr>
<tr><td colspan="8"><div class="recommendation">💡 {{llm_analysis.recommendation}}</div></td></tr>
{{/each}}
</tbody>
</table>
<div class="footer">
<p>This report was generated automatically by the Compensation Monitoring Agent. Data sources: {{benchmark_source}} (market data), {{hris_source}} (employee data). For questions, contact your MSP account manager.</p>
<p>⚙️ Alert thresholds: Critical ≤ -15% | Warning -10% to -15% | Watch -5% to -10% | Target: {{target_percentile}} percentile</p>
</div>
</body>
</html>
In n8n, use a Code node to build the HTML by iterating over flagged roles and replacing template variables. Use n8n's Send Email node with HTML body mode.
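As a sketch of that Code node's row-building loop (shown here in Python for readability; the n8n node itself is JavaScript), the `{{#each flagged_roles}}` block expands roughly as below. Field names come from the template; the template's `formatNumber` and `lowercase` helpers are approximated with Python formatting, and `trend_icon` is omitted.

```python
from html import escape

ROW_TMPL = (
    '<tr class="severity-{sev}">'
    "<td><strong>{title}</strong><br><small>{norm}</small></td>"
    "<td>{dept}</td><td>{city}, {state}</td>"
    "<td>${salary:,.0f}</td><td>${p50:,.0f}</td>"
    "<td><strong>{drift}%</strong></td>"
    '<td class="trend-{trend}">{trend}</td><td>{urgency}</td></tr>'
)

def render_rows(flagged_roles: list) -> str:
    """Expand one <tr> per flagged role, escaping free-text fields
    and lower-casing severity for the CSS class."""
    rows = []
    for r in flagged_roles:
        rows.append(ROW_TMPL.format(
            sev=r["drift_severity"].lower(),
            title=escape(r["job_title"]),
            norm=escape(r["normalized_title"]),
            dept=escape(r["department"]),
            city=escape(r["location_city"]), state=escape(r["location_state"]),
            salary=r["current_base_salary"], p50=r["market_p50"],
            drift=r["drift_percentage"],
            trend=r["trend_direction"],
            urgency=escape(r["llm_analysis"].get("urgency", "")),
        ))
    return "\n".join(rows)

html_rows = render_rows([{
    "drift_severity": "CRITICAL", "job_title": "Sr. Full Stack Dev",
    "normalized_title": "Senior Software Engineer", "department": "Engineering",
    "location_city": "Austin", "location_state": "TX",
    "current_base_salary": 100000, "market_p50": 120000,
    "drift_percentage": -16.67, "trend_direction": "worsening",
    "llm_analysis": {"urgency": "Immediate"},
}])
```

Escaping job titles and LLM text matters here: both are untrusted strings that would otherwise be injected verbatim into the HTML email.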
Critical Alert Teams Adaptive Card
Type: integration
Microsoft Teams message card JSON template (legacy MessageCard connector format) for immediate critical drift alerts. Sent via a Teams webhook when a role is detected at CRITICAL severity (≤-15% below market P50). Includes role details, the LLM recommendation, and an action button that links HR to the full drift report.
Implementation
# send via HTTP POST to Teams webhook URL
{
"@type": "MessageCard",
"@context": "http://schema.org/extensions",
"themeColor": "E53E3E",
"summary": "🚨 Critical Compensation Drift: {{job_title}}",
"sections": [
{
"activityTitle": "🚨 CRITICAL Compensation Drift Detected",
"activitySubtitle": "{{job_title}} — {{department}}",
"activityImage": "",
"facts": [
{ "name": "Role", "value": "{{job_title}} ({{normalized_title}})" },
{ "name": "Location", "value": "{{location_city}}, {{location_state}}" },
{ "name": "Current Salary", "value": "${{current_base_salary}}" },
{ "name": "Market P50", "value": "${{market_p50}}" },
{ "name": "Market P75", "value": "${{market_p75}}" },
{ "name": "Drift", "value": "**{{drift_percentage}}%** below market median" },
{ "name": "Trend", "value": "{{trend_direction}} (prev: {{previous_drift}}%)" },
{ "name": "Risk Level", "value": "{{llm_analysis.risk_level}}" },
{ "name": "Retention Risk", "value": "{{llm_analysis.retention_risk}}" }
],
"markdown": true
},
{
"title": "💡 AI Recommendation",
"text": "{{llm_analysis.recommendation}}\n\n**Suggested Range:** ${{llm_analysis.suggested_salary_range.min}} — ${{llm_analysis.suggested_salary_range.max}}\n\n**Urgency:** {{llm_analysis.urgency}}"
}
],
"potentialAction": [
{
"@type": "OpenUri",
"name": "View Full Report",
"targets": [{ "os": "default", "uri": "https://comp-dashboard.yourmsp.com/drift/{{role_id}}" }]
}
]
}
n8n Implementation
Use an HTTP Request node with the following configuration:
- Method: POST
- URL: {{$credentials.teamsWebhookUrl}}
- Body Type: JSON
- Body: Build the above JSON using expressions that reference the current item's fields
- Headers: Content-Type: application/json
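For testing outside n8n, the card payload can be assembled and inspected in Python before wiring up the webhook. This sketch fills only a subset of the facts from the template above; the dashboard domain is a placeholder, and the actual POST (e.g. via the `requests` library already used by the seed script) is left commented out.

```python
import json

def build_critical_card(role: dict, dashboard_base: str) -> dict:
    """Fill the MessageCard skeleton for one CRITICAL role.

    dashboard_base is a placeholder; substitute the MSP's real
    drift-dashboard URL.
    """
    llm = role["llm_analysis"]
    return {
        "@type": "MessageCard",
        "@context": "http://schema.org/extensions",
        "themeColor": "E53E3E",
        "summary": f"Critical Compensation Drift: {role['job_title']}",
        "sections": [{
            "activityTitle": "CRITICAL Compensation Drift Detected",
            "activitySubtitle": f"{role['job_title']} — {role['department']}",
            "facts": [
                {"name": "Drift",
                 "value": f"**{role['drift_percentage']}%** below market median"},
                {"name": "Risk Level", "value": llm.get("risk_level", "UNKNOWN")},
            ],
            "markdown": True,
        }],
        "potentialAction": [{
            "@type": "OpenUri",
            "name": "View Full Report",
            "targets": [{"os": "default",
                         "uri": f"{dashboard_base}/drift/{role['role_id']}"}],
        }],
    }

card = build_critical_card(
    {"role_id": 42, "job_title": "Sr. Full Stack Dev",
     "department": "Engineering", "drift_percentage": -16.67,
     "llm_analysis": {"risk_level": "HIGH"}},
    "https://comp-dashboard.example.com",
)
payload = json.dumps(card)
# requests.post(webhook_url, data=payload,
#               headers={"Content-Type": "application/json"})
```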
Drift Comparison SQL Query
Type: skill
Optimized PostgreSQL query used by the drift analysis agent to fetch the latest compensation snapshot and market benchmark for each monitored role, along with the previous drift analysis for trend detection. This query is the data foundation for all drift calculations.
Implementation:
-- Core Drift Comparison Query
-- Used by: Comp Agent — Drift Analyzer (Node 2)
-- Performance: Executes in <100ms for up to 1,000 roles with proper indexing
WITH latest_comp AS (
SELECT DISTINCT ON (role_id)
role_id,
current_base_salary,
snapshot_date
FROM compensation_snapshots
ORDER BY role_id, snapshot_date DESC
),
latest_benchmark AS (
SELECT DISTINCT ON (role_id)
role_id,
market_p25,
market_p50,
market_p75,
market_p90,
data_source,
geography,
sample_size,
benchmark_date
FROM market_benchmarks
ORDER BY role_id, benchmark_date DESC
),
previous_drift AS (
SELECT DISTINCT ON (role_id)
role_id,
drift_percentage,
drift_severity,
analysis_date AS previous_analysis_date
FROM drift_analyses
ORDER BY role_id, analysis_date DESC
)
SELECT
r.id AS role_id,
r.job_title,
COALESCE(rm.benchmark_title, r.normalized_title, r.job_title) AS normalized_title,
r.department,
r.location_city,
r.location_state,
r.experience_level,
lc.current_base_salary,
lc.snapshot_date,
lb.market_p25,
lb.market_p50,
lb.market_p75,
lb.market_p90,
lb.data_source AS benchmark_source,
lb.sample_size,
lb.benchmark_date,
pd.drift_percentage AS previous_drift,
pd.drift_severity AS previous_severity,
pd.previous_analysis_date,
-- Staleness check: flag if benchmark data is older than 90 days
CASE WHEN lb.benchmark_date < CURRENT_DATE - INTERVAL '90 days'
THEN true ELSE false END AS stale_benchmark
FROM roles r
INNER JOIN latest_comp lc ON r.id = lc.role_id
INNER JOIN latest_benchmark lb ON r.id = lb.role_id
INNER JOIN role_mappings rm ON r.job_title = rm.internal_title
AND (rm.verified_by_hr = true OR rm.confidence_score >= 0.85)
LEFT JOIN previous_drift pd ON r.id = pd.role_id
WHERE lc.current_base_salary > 0
AND lb.market_p50 > 0
ORDER BY r.department, r.job_title;
Supporting indexes:
CREATE INDEX IF NOT EXISTS idx_comp_role_date ON compensation_snapshots(role_id, snapshot_date DESC);
CREATE INDEX IF NOT EXISTS idx_bench_role_date ON market_benchmarks(role_id, benchmark_date DESC);
CREATE INDEX IF NOT EXISTS idx_drift_role_date ON drift_analyses(role_id, analysis_date DESC);
CREATE INDEX IF NOT EXISTS idx_mapping_title ON role_mappings(internal_title) WHERE verified_by_hr = true OR confidence_score >= 0.85;
Initial Role Mapping Seed Script
Type: workflow
Python script that performs the initial bulk job title normalization for all roles in the client's organization when first setting up the agent. Processes titles through GPT-5.4 mini, stores results in the role_mappings table, and generates a review spreadsheet for the client's HR contact to verify mappings.
Implementation
#!/usr/bin/env python3
"""initial_role_mapping.py — Bulk job title normalization for initial setup.
Usage: python3 initial_role_mapping.py --db-url postgresql://compagent:PASSWORD@localhost:5432/comp_benchmarks --openai-key sk-...
This script:
1. Fetches all unmapped roles from PostgreSQL
2. Sends each title through GPT-5.4 mini for normalization
3. Stores mappings in role_mappings table
4. Exports a CSV for HR review
"""
import argparse
import csv
import json
import time
import sys
from datetime import datetime
import psycopg2
import psycopg2.extras
import requests
SYSTEM_PROMPT = """You are an expert compensation analyst specializing in job title normalization.
Map the given internal job title to a standardized market benchmark title.
Consider the department and location context.
Respond ONLY with valid JSON:
{
"normalized_title": "Standard Market Title",
"soc_code": "XX-XXXX",
"confidence_score": 0.95,
"reasoning": "Brief explanation",
"needs_human_review": false,
"alternative_titles": ["Alt 1", "Alt 2"]
}"""
def normalize_title(title: str, department: str, location: str, api_key: str) -> dict:
"""Call GPT-5.4 mini to normalize a single job title."""
user_msg = f"""Normalize this internal job title to a standard market benchmark title:
Internal Title: {title}
Department: {department}
Location: {location}"""
resp = requests.post(
"https://api.openai.com/v1/chat/completions",
headers={
"Authorization": f"Bearer {api_key}",
"Content-Type": "application/json"
},
json={
"model": "gpt-5.4-mini",
"messages": [
{"role": "system", "content": SYSTEM_PROMPT},
{"role": "user", "content": user_msg}
],
"temperature": 0.1,
"max_tokens": 300,
"response_format": {"type": "json_object"}
},
timeout=30
)
resp.raise_for_status()
content = resp.json()["choices"][0]["message"]["content"]
return json.loads(content)
def main():
parser = argparse.ArgumentParser()
parser.add_argument("--db-url", required=True)
parser.add_argument("--openai-key", required=True)
parser.add_argument("--output-csv", default=f"role_mappings_review_{datetime.now().strftime('%Y%m%d')}.csv")
parser.add_argument("--batch-size", type=int, default=5)
parser.add_argument("--delay", type=float, default=1.0, help="Seconds between API calls")
args = parser.parse_args()
conn = psycopg2.connect(args.db_url)
cur = conn.cursor(cursor_factory=psycopg2.extras.RealDictCursor)
# Fetch unmapped roles
cur.execute("""
SELECT r.id, r.job_title, r.department, r.location_city, r.location_state
FROM roles r
LEFT JOIN role_mappings rm ON r.job_title = rm.internal_title
WHERE rm.id IS NULL
ORDER BY r.department, r.job_title
""")
roles = cur.fetchall()
print(f"Found {len(roles)} unmapped roles")
results = []
for i, role in enumerate(roles):
location = f"{role['location_city']}, {role['location_state']}"
print(f"[{i+1}/{len(roles)}] Normalizing: {role['job_title']} ({role['department']})")
try:
mapping = normalize_title(
role['job_title'], role['department'], location, args.openai_key
)
# Insert into database
cur.execute("""
INSERT INTO role_mappings
(internal_title, benchmark_title, benchmark_code, confidence_score,
mapping_method, verified_by_hr)
VALUES (%s, %s, %s, %s, %s, %s)
ON CONFLICT (internal_title) DO UPDATE SET
benchmark_title = EXCLUDED.benchmark_title,
benchmark_code = EXCLUDED.benchmark_code,
confidence_score = EXCLUDED.confidence_score,
updated_at = NOW()
""", (
role['job_title'],
mapping['normalized_title'],
mapping.get('soc_code', ''),
mapping.get('confidence_score', 0.5),
'gpt-5.4-mini-initial',
False # Always false until HR reviews
))
results.append({
'internal_title': role['job_title'],
'department': role['department'],
'location': location,
'normalized_title': mapping['normalized_title'],
'soc_code': mapping.get('soc_code', ''),
'confidence': mapping.get('confidence_score', 0),
'needs_review': mapping.get('needs_human_review', True),
'reasoning': mapping.get('reasoning', ''),
'alternatives': ', '.join(mapping.get('alternative_titles', []))
})
print(f" -> {mapping['normalized_title']} (confidence: {mapping.get('confidence_score', '?')})")
except Exception as e:
print(f" ERROR: {e}")
results.append({
'internal_title': role['job_title'],
'department': role['department'],
'location': location,
'normalized_title': 'ERROR — MANUAL MAPPING REQUIRED',
'soc_code': '',
'confidence': 0,
'needs_review': True,
'reasoning': str(e),
'alternatives': ''
})
time.sleep(args.delay) # Rate limiting
conn.commit()
# Export CSV for HR review
with open(args.output_csv, 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=[
'internal_title', 'department', 'location', 'normalized_title',
'soc_code', 'confidence', 'needs_review', 'reasoning', 'alternatives'
])
writer.writeheader()
writer.writerows(results)
print(f"\nCompleted! {len(results)} mappings processed.")
print(f"Review CSV exported to: {args.output_csv}")
print(f"High confidence (>=0.85): {sum(1 for r in results if r['confidence'] >= 0.85)}")
print(f"Needs HR review: {sum(1 for r in results if r['needs_review'])}")
cur.close()
conn.close()
if __name__ == '__main__':
main()
Dependencies
pip install psycopg2-binary requests
Usage
python3 initial_role_mapping.py \
--db-url 'postgresql://compagent:PASSWORD@localhost:5432/comp_benchmarks' \
--openai-key 'sk-...' \
--output-csv role_mappings_review.csv
Send the exported CSV to the client HR contact for review. After HR approves, mark high-confidence mappings as verified:
psql -U compagent -d comp_benchmarks -c \
"UPDATE role_mappings SET verified_by_hr = true WHERE confidence_score >= 0.85;"Add a UNIQUE constraint on role_mappings.internal_title before running the script: ALTER TABLE role_mappings ADD CONSTRAINT unique_internal_title UNIQUE (internal_title);
Testing & Validation
- HRIS Data Pull Validation: Manually trigger the 'Comp Agent — HRIS Data Pull' workflow and verify that (1) the correct number of employees appears in the roles table matching the client's headcount, (2) compensation_snapshots has a record for each role with today's date, (3) salary values match a spot-check of 5 randomly selected employees against the HRIS source system. Expected: 100% match on salary values, <5% of employees skipped due to missing data.
- Job Title Normalization Accuracy Test: Run the initial_role_mapping.py script on all client roles and review the output CSV. Verify that (1) at least 70% of mappings have confidence_score >= 0.85, (2) no more than 10% of mappings are flagged as needs_human_review = true with confidence below 0.5, (3) spot-check 10 mappings with the client's HR contact and confirm at least 8/10 are correct or acceptable. Document any systematic errors (e.g., all 'Manager' titles being mapped incorrectly).
- Market Benchmark Data Retrieval: Manually trigger the 'Comp Agent — Market Benchmarks' workflow and verify that (1) market_benchmarks table has records for all mapped roles, (2) P25 < P50 < P75 < P90 for all records (logical ordering), (3) benchmark values are within reasonable ranges for the role and location (e.g., a Software Engineer in San Francisco should show P50 of $140K–$180K, not $50K), (4) the data_source field correctly identifies the compensation provider.
- Drift Calculation Accuracy Test: Insert known test data into compensation_snapshots (salary: $100,000) and market_benchmarks (P50: $120,000) for a test role. Run the Drift Analyzer workflow and verify that (1) drift_percentage is calculated as -16.67%, (2) drift_severity is classified as 'CRITICAL', (3) the LLM generates a meaningful recommendation (not generic filler), (4) the drift_analyses record is properly stored in PostgreSQL with all fields populated.
- Alert Delivery End-to-End Test: Create a test role with CRITICAL drift and run the full Drift Analyzer workflow. Verify that (1) a Microsoft Teams message appears in the designated channel within 60 seconds, (2) the Teams card displays correct role name, salary, market P50, drift percentage, and LLM recommendation, (3) a weekly summary email is received at the configured address with the correct HTML formatting and color-coded severity table, (4) the alert_sent and alert_sent_at fields are updated in the drift_analyses table.
- Trend Detection Test: Run the Drift Analyzer two consecutive weeks. In week 2, reduce the test role's salary in compensation_snapshots (simulating no raise while market moves up). Verify that (1) the trend_direction shows 'worsening', (2) the LLM analysis references the worsening trend in its recommendation, (3) the weekly report shows the correct previous drift alongside current drift.
- Database Backup and Restore Test: Create a non-critical test table (CREATE TABLE test_backup_verify (id int);), run the backup.sh script manually, then drop the table (DROP TABLE test_backup_verify;). Restore from the backup dump and verify the table is recovered. Then verify that all drift_analyses and role_mappings data is intact after the restore. Document the restore procedure and time.
- Security and Access Control Test: (1) Attempt to access the n8n UI without credentials — verify HTTP 401 is returned. (2) Attempt to connect to PostgreSQL from outside the VM — verify connection refused. (3) Verify all API keys stored in n8n are encrypted (check n8n credentials store). (4) Verify the retool_readonly database user cannot INSERT, UPDATE, or DELETE any records. (5) Verify SSL certificate is valid on the Nginx reverse proxy (curl -v https://comp-agent.yourmsp.com).
- Load and Scale Test: Import 500 synthetic roles (using a script to generate realistic titles across departments) and run the full pipeline: HRIS Pull → Benchmarks → Drift Analysis. Measure (1) total execution time (target: <30 minutes for 500 roles), (2) LLM API cost for the run (target: <$5), (3) no timeout errors or rate limit failures, (4) all 500 roles have drift_analyses records. This validates the system can handle mid-market clients.
- Compliance Audit Trail Test: Verify that (1) every drift_analyses record has a populated llm_reasoning field containing the full LLM response, (2) every agent_runs record logs the run date, roles analyzed, alerts generated, and status, (3) no compensation data has been deleted (run SELECT COUNT(*) FROM compensation_snapshots grouped by month to verify retention), (4) the role_mappings table tracks who verified each mapping and when (verified_by_hr + updated_at fields).
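The drift arithmetic exercised by the Drift Calculation and Trend Detection tests above can be checked with a standalone sketch; the thresholds, rounding, and trend logic mirror the Calculate Drift Metrics code node, rewritten here in Python purely for test scaffolding.

```python
def classify_drift(salary: float, market_p50: float, previous_drift=None) -> dict:
    """Mirror the agent's drift percentage, severity, and trend logic."""
    drift = round((salary - market_p50) / market_p50 * 100, 2)
    if drift <= -15:
        severity = "CRITICAL"
    elif drift <= -10:
        severity = "WARNING"
    elif drift <= -5:
        severity = "WATCH"
    else:
        severity = "OK"
    if previous_drift is None:
        trend = "new"
    elif drift < previous_drift:
        trend = "worsening"
    elif drift > previous_drift:
        trend = "improving"
    else:
        trend = "stable"
    return {"drift_percentage": drift, "drift_severity": severity,
            "trend_direction": trend}

# Drift Calculation test fixture: $100,000 salary vs. $120,000 market P50.
week1 = classify_drift(100_000, 120_000)
# Trend Detection test: salary unchanged while market P50 moves to $125,000.
week2 = classify_drift(100_000, 125_000,
                       previous_drift=week1["drift_percentage"])
```

Running this before the end-to-end workflow test gives an expected-value baseline (-16.67%, CRITICAL) to compare against what lands in the drift_analyses table.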
Client Handoff
The client handoff session should be a 90-minute meeting with the client's HR leadership and designated compensation stakeholders, covering the following topics:
All agent recommendations require human review before any action is taken. The agent does not automatically change salaries and cannot account for individual performance, equity factors, or compensation elements not present in the HRIS.
Documentation to Leave Behind
- Weekly Report Interpretation Guide (1-page reference card)
- Alert Response Playbook (3-5 pages)
- Job Title Mapping Review SOP with screenshots
- Architecture diagram with data flow
- MSP contact information and SLA terms
- Disaster recovery procedure (who to call if the agent stops running)
- List of all monitored roles with their current mappings and thresholds
Success Criteria to Review Together
Maintenance
Weekly Monitoring (15 min/week)
- Verify Monday morning workflow executions completed successfully via n8n execution log
- Check agent_runs table for any status != 'completed' and investigate errors
- Monitor OpenAI API spend via dashboard (should be $15–50/month; investigate if >$75)
- Review Docker container health: docker compose ps (all containers should show 'Up')
- Spot-check 1 random drift analysis for reasonableness
Monthly Maintenance (1-2 hours/month)
- Review new unmapped roles (roles added since last month) and ensure title normalization ran
- Send monthly role_mappings review CSV to client HR contact for verification of new mappings
- Check compensation data provider API for any deprecation notices or schema changes
- Check that the n8n admin password rotation is on schedule (rotate quarterly, or use SSO if available)
- Verify automated backups ran successfully every day (check backup.log file)
- Update n8n Docker image: docker compose pull && docker compose up -d (test in staging first)
- Review SSL certificate expiration (certbot handles auto-renewal, but verify)
Quarterly Maintenance (2-4 hours/quarter)
- Recalibrate drift thresholds with client HR — has their compensation philosophy changed?
- Review drift trend data: are specific departments or locations consistently below market?
- Generate a quarterly compensation position report for the client's executive team
- Update compensation data subscription if the provider releases new data cuts or features
- Review and update system prompts if LLM recommendations are becoming generic or unhelpful
- Security review: verify all API keys are current, rotate any that are >6 months old
- Conduct a backup restore test to verify recovery procedure works
Annual Maintenance (4-8 hours/year)
- Major version upgrade review for n8n, PostgreSQL, and Docker
- Review compliance requirements: check for new pay transparency laws in client's jurisdictions
- Renew compensation data provider subscription and negotiate pricing
- Full system performance review: analyze agent_runs table for execution time trends
- Archive old drift_analyses data (older than current retention policy) to cold storage if needed
- Re-run the full initial_role_mapping.py if client has undergone significant org restructuring
Trigger-Based Maintenance
- HRIS system change (e.g., client migrates from BambooHR to Rippling): Reconfigure Unified.to connection, test data pull, verify field mapping. Estimated 4-8 hours.
- Compensation data provider API change: Update API endpoints, response parsing, and test. Estimated 2-4 hours.
- OpenAI model deprecation: Update model parameter in n8n workflows and test output quality. Estimated 1-2 hours.
- New state pay transparency law affecting client: Update compliance documentation and potentially add location-specific thresholds. Estimated 2-3 hours.
SLA Recommendations
- Response time for agent outage (not running on Monday): 4 business hours
- Response time for data accuracy issue reported by client: 1 business day
- Response time for critical security issue: 2 hours
- Scheduled maintenance window: Saturdays 6-10 AM (does not affect Monday runs)
- Uptime target: 99% (allows ~3.6 days downtime/year, reasonable for a weekly batch system)
Escalation Path
Alternatives
Fully Manual Spreadsheet-Based Benchmarking
Instead of an autonomous AI agent, the MSP provides a quarterly manual service where a compensation analyst exports HRIS data and CompAnalyst benchmarks into a pre-built Excel template. The template uses formulas to calculate drift and generate a static report. No AI, no automation, no ongoing cloud infrastructure.
Tradeoffs
- Cost: Much lower — no cloud VM, no LLM API costs, no software subscriptions beyond CompAnalyst (~$3,850/year). One-time template setup of $3,000–5,000 plus $1,500–3,000/quarter for manual analysis.
- Complexity: Very low — any MSP can deliver this without development skills.
- Capability: Significantly weaker — quarterly cadence means drift can go undetected for months; no real-time alerts; no trend tracking; no AI-generated recommendations; heavy manual labor per cycle.
- Recommend for: Very small clients (<50 employees) or as a Phase 0 pilot before investing in automation.
CrewAI Open-Source Multi-Agent Framework (Instead of n8n)
Replace n8n with CrewAI's open-source Python framework for agent orchestration. Build dedicated agents for each task: a Data Collection Agent (pulls HRIS data), a Benchmark Agent (fetches market data), an Analysis Agent (calculates drift and generates insights), and a Reporting Agent (formats and sends alerts). CrewAI's multi-agent architecture allows agents to collaborate and delegate tasks.
Tradeoffs
- Cost: Similar operating cost since CrewAI open-source is free, but higher development cost (estimated 40-60% more development hours than n8n).
- Complexity: Higher — requires a Python developer comfortable with the CrewAI framework, custom code for all integrations (no visual drag-and-drop), and more DevOps for deployment and monitoring.
- Capability: Potentially stronger — multi-agent architecture can handle more complex reasoning chains, tool use, and autonomous decision-making. CrewAI agents can self-correct and retry failed tasks.
- Recommend for: MSPs with strong Python development teams who want maximum flexibility, or for enterprise clients with complex multi-entity compensation structures that benefit from agentic reasoning.
Lattice or HiBob Native Compensation Module
Instead of building a custom agent, recommend the client adopt Lattice Compensation ($6/seat/month) or HiBob's compensation module with Mercer Comptryx integration. These HRIS-native tools provide built-in benchmarking dashboards and some alerting without custom development.
Tradeoffs
- Cost: Lower total cost for small organizations — Lattice at $6/seat/month for 100 employees is $7,200/year with no implementation cost. HiBob similarly bundles benchmarking into its HRIS subscription.
- Complexity: Very low — no custom development, no MSP infrastructure to manage.
- Capability: Limited compared to the custom agent — no autonomous monitoring (user must log in to check), no customizable AI analysis, no staffing-agency-specific features (Bullhorn integration), and limited alerting flexibility. The MSP loses the managed service revenue opportunity.
- Recommend for: Clients who already use Lattice or HiBob and want a quick win, or clients who are not willing to invest in a custom solution. The MSP can offer to layer the autonomous agent on top of these platforms later as an upsell.
Make.com Low-Code Automation (Instead of n8n)
Use Make.com (formerly Integromat) as the workflow automation platform instead of self-hosted n8n. Make.com provides a visual workflow builder with 1,500+ pre-built integrations, including native BambooHR, ADP, Slack, Teams, and OpenAI connectors. No self-hosting required.
Tradeoffs
- Cost: Operation-based pricing — the Pro plan is $9/month for 10,000 operations, or the Teams plan $16/month for more. Cheaper than a dedicated VM at low volumes, though per-operation costs grow with client count and run frequency. No infrastructure management cost.
- Complexity: Lower — no Docker, no VM management, no Nginx configuration. The visual builder is more intuitive for non-developers.
- Capability: Comparable for this use case, but less control over data residency (Make.com is cloud-only), and compensation data flows through Make.com's servers (potential compliance concern). Rate limiting and execution time limits may affect large client implementations.
- Recommend for: MSPs without DevOps capabilities who want a simpler deployment, or for initial pilot phases where speed-to-value is prioritized over long-term control. Migrate to self-hosted n8n once the solution is proven.
Azure OpenAI with Full Microsoft Stack
Replace the open-source stack with a fully Microsoft-native implementation: Azure OpenAI for LLM, Azure Logic Apps for workflow orchestration, Azure SQL Database for storage, Power BI for dashboards, and Power Automate for alerting. Integrates natively with Microsoft 365 and Teams.
Tradeoffs
- Cost: Higher — Azure Logic Apps consumption plan costs ~$0.000025/action (adds up for complex workflows), Azure SQL starts at ~$5/month but scales quickly, Power BI Pro is $10/user/month. Total likely $200–500/month vs. $100–200 for the open-source stack.
- Complexity: Lower for MSPs already deep in the Microsoft ecosystem — no Docker, no Linux administration, all managed services. Higher for MSPs unfamiliar with Azure Logic Apps' visual designer.
- Capability: Strong enterprise compliance story — SOC 2, HIPAA, FedRAMP, data residency guarantees. Native Teams integration is seamless. However, Logic Apps is less flexible than n8n for complex AI agent patterns, and debugging is harder.
- Recommend for: Enterprise clients with strict Microsoft-only policies, government or highly regulated clients requiring FedRAMP compliance, or MSPs whose core competency is the Microsoft stack rather than open-source tooling.
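As a back-of-envelope check on that $200–500/month estimate, the unit prices above can be combined with hypothetical volumes (actions per run, run frequency, user count, and LLM spend are all assumptions, not measured figures):

```python
# Rough monthly-cost estimate for the Microsoft-native stack, using the unit
# prices cited above. All volume figures are illustrative assumptions.
logic_apps_actions = 50 * 24 * 30          # assume 50 actions/run, hourly runs
logic_apps_cost = logic_apps_actions * 0.000025  # ~$0.000025/action

azure_sql_cost = 30.0                      # assumed mid-tier; entry tier ~$5/mo
power_bi_cost = 10.0 * 5                   # Power BI Pro at $10/user/mo, 5 users
azure_openai_cost = 150.0                  # assumed token spend for the analyses

total = logic_apps_cost + azure_sql_cost + power_bi_cost + azure_openai_cost
print(f"~${total:,.0f}/month")             # lands inside the $200-500 range
```

Note that Logic Apps action charges are negligible at this volume; the managed database, BI seats, and Azure OpenAI usage dominate the total.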