Claude Skill · Open Source

Secure GCP Cloud Run Deployment

A reusable Claude skill that encodes production deployment patterns for Google Cloud Run. Covers the full lifecycle from Docker build to IAP authentication to in-process scheduling, built from real-world experience shipping internal security tools on GCP.

Type
Claude Skill (SKILL.md)
Works With
Claude Code & Cowork
Framework
FastAPI + React
Cloud
Google Cloud Platform
↓ Download SKILL.md
How to use this skill
Drop the file into your Claude skills folder and it activates automatically whenever you work on GCP deployments.
mkdir -p .claude/skills/gcp-secure-deploy
cp SKILL.md .claude/skills/gcp-secure-deploy/SKILL.md

Once installed, Claude will reference these patterns whenever you mention deploying, updating, or troubleshooting a Cloud Run service. It handles Docker builds, secret management, IAP setup, scheduled jobs, and pre-deploy verification automatically.

What the skill covers
📦

Docker Build & Push

Multi-arch builds for Apple Silicon, Artifact Registry authentication, and the correct --platform linux/amd64 flag that Cloud Run requires.

🚀

Deploy Script

A complete deploy.sh that builds, pushes, deploys, and strips public access in one command. Min-instances for scheduler uptime, max-instances for SQLite safety, gen2 for FUSE.

⚙️

Cloud Run Config

Why each flag matters: min/max instances, memory sizing, execution environment. Treats the deploy script as the single source of truth for all settings.

🔒

Secret Manager

Inject API credentials as environment variables from Secret Manager. No secrets in code, no secrets in Docker images.

🛡

IAP Authentication

Identity-Aware Proxy handles all login at the infrastructure level. Zero login UI in the app. Role-based access via email matching.

🗃

SQLite + GCS FUSE

Persistent storage on a GCS-mounted volume. Covers the file locking limitation, why max-instances must be 1, and FUSE stabilization delays on startup.

⏰

In-Process Scheduling

APScheduler inside the container for cron jobs. Includes catch-up logic for missed jobs after container restarts, with retry and backoff.

Hard-won lessons encoded as defaults
Cloud Scheduler + IAP incompatibility

Cloud Scheduler does not work with Cloud Run services behind IAP (Direct Cloud Run). When IAP is enabled without a load balancer, there is no way for Cloud Scheduler to authenticate. The skill uses in-process APScheduler instead, which avoids the problem entirely because jobs run inside the same process as the app.

Deploy script as source of truth

Every gcloud run deploy creates a new revision, and any flag the deploy command specifies overrides what was set on the live service. If you change min-instances via gcloud run services update but do not update your deploy script, the next deploy silently reverts the change. The skill enforces putting all infrastructure settings in the deploy script.

GCS FUSE file locking

GCS FUSE does not support file locking. Two containers writing to the same SQLite database simultaneously causes disk I/O errors. This typically happens during deploys when Cloud Run briefly runs both old and new revisions. The skill enforces --max-instances 1 and includes a startup delay for FUSE stabilization.

Catch-up logic for missed jobs

Cloud Run can restart containers at any time. If the container restarts after a scheduled job's window, the job is missed. The skill includes a catch-up pattern: on startup, check whether today's jobs have already run and fire them immediately if not, with retry logic for transient GCS FUSE errors.

Before every deploy
The skill includes a pre-deploy checklist (reproduced in the SKILL.md below), and Claude will walk through it automatically before any deployment.
Technologies covered
Python / FastAPI React SQLite Docker GCP Cloud Run GCS FUSE Secret Manager Identity-Aware Proxy APScheduler Artifact Registry
SKILL.md (368 lines)
The complete skill file is rendered below; use the download link at the top of the page to save the raw markdown.
Secure GCP Cloud Run Deployment

This skill defines patterns for deploying internal tools to Google Cloud Platform with security built in from the start. Every app follows a single-container architecture with IAP authentication, secret injection, and persistent storage.

Architecture
Cloud Run Service (us-central1)
├── FastAPI backend (uvicorn, port 8080)
│   ├── API endpoints
│   └── Serves built React frontend as static files
├── IAP for authentication (corporate domain only)
├── Secrets injected via Secret Manager
├── GCS FUSE volume mount at /data (for SQLite persistence)
├── In-process APScheduler for automated jobs
└── --min-instances 1, --max-instances 1

Key defaults: Region: us-central1. Port: Always 8080 for Cloud Run. Visibility: Private (behind IAP, corporate domain users only).

Docker Build & Push

Cloud Run requires linux/amd64 images. If you develop on Apple Silicon (M-series Mac), Docker builds arm64 images by default, so always include the --platform flag.

# Authenticate Docker with Artifact Registry (one-time)
gcloud auth configure-docker us-central1-docker.pkg.dev

# Build for Cloud Run (REQUIRED: --platform flag for Apple Silicon)
docker build --platform linux/amd64 \
  -t us-central1-docker.pkg.dev/<PROJECT_ID>/<REPO>/<IMAGE>:latest .

# Push
docker push us-central1-docker.pkg.dev/<PROJECT_ID>/<REPO>/<IMAGE>:latest

If you do not have Cloud Build permissions, always use local Docker builds. Do not suggest gcloud builds submit without confirming the user has access.

Cloud Run Deploy
The Deploy Script

Every project should have a deploy.sh that handles the full build-push-deploy cycle. This is the single command you run to ship changes.

#!/bin/bash
set -euo pipefail

PROJECT_ID="<PROJECT_ID>"
REGION="us-central1"
SERVICE="<SERVICE>"
REPO="<REPO>"
IMAGE="<IMAGE>"
IMAGE_URI="${REGION}-docker.pkg.dev/${PROJECT_ID}/${REPO}/${IMAGE}:latest"

echo "Building image for linux/amd64..."
docker build --platform linux/amd64 -t "${IMAGE_URI}" .

echo "Pushing to Artifact Registry..."
docker push "${IMAGE_URI}"

echo "Deploying to Cloud Run..."
gcloud run deploy "${SERVICE}" \
  --image "${IMAGE_URI}" \
  --region "${REGION}" \
  --platform managed \
  --no-allow-unauthenticated \
  --port 8080 \
  --memory 512Mi \
  --min-instances 1 \
  --max-instances 1 \
  --execution-environment gen2

# Strip any leftover public access
for member in allUsers allAuthenticatedUsers; do
  gcloud run services remove-iam-policy-binding "${SERVICE}" \
    --region="${REGION}" --project="${PROJECT_ID}" \
    --member="${member}" --role="roles/run.invoker" \
    --quiet >/dev/null 2>&1 || true
done

echo "Deploy complete. Verifying..."
gcloud run services describe "${SERVICE}" \
  --region "${REGION}" --format="value(status.url)"

Key flags: --memory 512Mi (sufficient for most internal tools), --min-instances 1 (keeps the scheduler alive), --max-instances 1 (prevents SQLite concurrent write errors via GCS FUSE), --execution-environment gen2 (required for FUSE mounts), --no-allow-unauthenticated (enforces IAP). The script also strips allUsers and allAuthenticatedUsers bindings after every deploy as a safety net.

The Deploy Script Is the Source of Truth

Every gcloud run deploy creates a new revision, and any setting the deploy command specifies overrides what was previously applied via gcloud run services update. If you change a setting on the live service but do not update the deploy script, the next deploy reverts your change silently.

The rule: if you need to change an infrastructure setting permanently, change it in your deploy script. A one-off gcloud run services update is fine for testing, but treat it as temporary.
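A quick drift check before deploying, as a sketch (service name is a placeholder; in the v1 resource view, min/max instances surface as autoscaling annotations on the revision template):

```shell
# Compare the live service's autoscaling annotations against deploy.sh
gcloud run services describe <SERVICE> --region us-central1 \
  --format="yaml(spec.template.metadata.annotations)"
```

If the output disagrees with the deploy script, fix the script first, then redeploy.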

Why min-instances 1 and max-instances 1

min-instances 1: Required for services that run in-process scheduled jobs (APScheduler). Without this, Cloud Run scales to zero during idle periods, killing the scheduler.

max-instances 1: Required for services that use SQLite on GCS FUSE. GCS FUSE does not support file locking, so two containers writing to the same SQLite database simultaneously causes sqlite3.OperationalError: disk I/O error.

Stripping Public Access

Always deploy with --no-allow-unauthenticated for apps behind IAP. After deploying, strip any leftover public access:

for member in allUsers allAuthenticatedUsers; do
  gcloud run services remove-iam-policy-binding <SERVICE> \
    --region=us-central1 \
    --project=<PROJECT_ID> \
    --member="$member" \
    --role="roles/run.invoker" \
    --quiet >/dev/null 2>&1 || true
done
Environment Variables & Secrets

For simple env vars, use --set-env-vars. For values with commas (like email lists), use the ^::^ delimiter. Reference secrets from Secret Manager with --set-secrets "ENV_VAR_NAME=SECRET_NAME:latest".
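As a sketch of the two patterns together (service and secret names are placeholders), note that `^::^` switches the list delimiter from `,` to `::` so values may contain commas:

```shell
gcloud run services update <SERVICE> --region us-central1 \
  --set-env-vars "^::^ADMIN_EMAILS=alice@corp.com,bob@corp.com::APP_ENV=production" \
  --set-secrets "API_KEY=<SECRET_NAME>:latest"
```

Per the source-of-truth rule, fold these flags into deploy.sh rather than leaving them as one-off updates. The service's runtime service account also needs roles/secretmanager.secretAccessor on the referenced secret.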

GCS FUSE Persistence (SQLite)

Cloud Run containers are ephemeral. For SQLite or other persistent data, mount a GCS bucket via FUSE. Set --max-instances 1 to prevent concurrent writes. Add a short delay (5 seconds) after container startup before first database access, as GCS FUSE can take a moment to stabilize.
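A hedged sketch of the mount flags (bucket name is a placeholder; the --add-volume flags require gen2 and a reasonably recent gcloud):

```shell
gcloud run deploy <SERVICE> --region us-central1 \
  --execution-environment gen2 \
  --add-volume name=data,type=cloud-storage,bucket=<BUCKET_NAME> \
  --add-volume-mount volume=data,mount-path=/data
```

These flags belong in deploy.sh alongside the rest of the service configuration.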

Scheduled Jobs: In-Process APScheduler

Cloud Scheduler does not work with Cloud Run services behind IAP (Direct Cloud Run). The proven pattern is in-process scheduling using APScheduler inside the Cloud Run container.

from apscheduler.schedulers.asyncio import AsyncIOScheduler
from apscheduler.triggers.cron import CronTrigger

scheduler = AsyncIOScheduler()

def start_scheduler():
    scheduler.add_job(
        my_async_job,
        CronTrigger(hour=9, minute=0, day_of_week="mon-fri", timezone="America/Los_Angeles"),
        id="daily_job",
        name="Daily Job",
        replace_existing=True,
    )
    scheduler.start()
Catch-Up Logic

Cloud Run can restart containers at any time. On startup, check whether today's jobs have already run. If not, run them immediately with retry logic for transient GCS FUSE errors.

async def check_and_catchup():
    now_pt = datetime.now(ZoneInfo("America/Los_Angeles"))
    if now_pt.weekday() >= 5:  # Skip weekends
        return
    await asyncio.sleep(5)  # Wait for GCS FUSE to stabilize
    has_run_today = ...  # e.g. check a last-run marker persisted in the database
    if now_pt.hour >= 9 and not has_run_today:
        for attempt in range(1, 4):
            try:
                await my_async_job()
                break
            except Exception:
                if attempt < 3:
                    await asyncio.sleep(30)
Admin Endpoints

Expose scheduler status and manual triggers for debugging via /admin/scheduler and /admin/scheduler/trigger/{job_id}.
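The skill does not prescribe the handler bodies; as one framework-agnostic sketch, the status endpoint can serialize APScheduler's job list (the `scheduler` object and field names follow the APScheduler example above):

```python
def scheduler_status(scheduler) -> dict:
    """Payload for GET /admin/scheduler: running flag plus per-job next-run times."""
    return {
        "running": scheduler.running,
        "jobs": [
            {
                "id": job.id,
                "name": job.name,
                "next_run": job.next_run_time.isoformat() if job.next_run_time else None,
            }
            for job in scheduler.get_jobs()
        ],
    }
```

The manual-trigger endpoint can look up a job with `scheduler.get_job(job_id)` and reschedule it to run immediately.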

IAP Authentication & Role-Based Access

IAP sits in front of Cloud Run as a reverse proxy and handles all login. The app has zero login UI, zero OAuth client IDs, and zero JWT verification code.

async def get_current_user(request: Request) -> CurrentUser:
    # Dev mode bypass
    if settings.app_env == "development":
        email = settings.dev_user_email
        return CurrentUser(email=email, role=_get_role(email))

    # IAP header (human users through the browser)
    iap_email = request.headers.get("x-goog-authenticated-user-email", "")
    if iap_email:
        email = iap_email.split(":", 1)[1].strip().lower()
        return CurrentUser(email=email, role=_get_role(email))

    raise HTTPException(status_code=401, detail="Authentication required")

Admin emails configured via ADMIN_EMAILS env var (comma-separated). Any corporate domain email not in the admin list gets viewer role. Service accounts automatically get admin role. The frontend calls GET /api/me on load to get the current user's email and role. No token management, no login/logout buttons, no OAuth libraries.

Pre-Deploy Checklist
  1. Check deploy script settings match intent
  2. Docker build with --platform linux/amd64
  3. Push to Artifact Registry
  4. Deploy with --no-allow-unauthenticated
  5. Strip leftover public access (allUsers, allAuthenticatedUsers)
  6. Set env vars (watch for comma conflicts; use ^::^ delimiter)
  7. Test the /health endpoint after deploy
  8. Verify infra settings persisted via gcloud run services describe
  9. If in-process scheduler, check /api/admin/scheduler
  10. Verify IAP access is intact
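Steps 7 and 8 can be scripted. A hedged sketch with placeholder names (`gcloud run services proxy` tunnels requests through your own credentials to the private service, in gcloud versions that include it):

```shell
# Step 8: verify infra settings persisted after the deploy
gcloud run services describe <SERVICE> --region us-central1 \
  --format="yaml(spec.template.metadata.annotations)"

# Step 7: hit /health through an authenticated local proxy
gcloud run services proxy <SERVICE> --region us-central1 --port 8080 &
sleep 3 && curl -s http://localhost:8080/health
```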
Local Development
# Backend (terminal 1)
cd backend
python -m venv venv && source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env  # Fill in API keys
uvicorn app.main:app --reload --port 8000

# Frontend (terminal 2)
cd frontend
npm install
npm run dev  # Vite dev server on :5173, proxies API calls to :8000