PathQL server: safe SQL to JSON

In an earlier post I introduced PathQL: write a SQL query, get nested JSON back, with the structure inferred from table aliases and foreign keys. That post ended with a one-liner: “You may even want to set up a PathQL server.” This is the follow-up about that server, because the moment you turn PathQL into an HTTP endpoint you run into an uncomfortable question.

A PathQL server accepts arbitrary SQL. That is the whole point: the caller sends SQL, the server runs it and shapes the result. But if you put that on a network with a single shared connection and no guards, you have handed the world a SQL console. It turned out that shaping SQL results in nested JSON was the easy part. The hard part was properly limiting the SQL you are allowed to execute.

Make the database the boundary

Years ago I wrote that the right place for API authorization is the database itself. That idea is the foundation here. The application should not be the last line of defense. The database should be, because it is the thing that actually holds the data and it has a mature, battle-tested permission system.

Our pathql-server has two identity models, selected in the config. The default, none, is a single shared connection with no per-caller isolation. It is the development and single-tenant on-ramp: simple, but every authenticated caller runs as the same database role. The hardened model is login_role, where the server connects to PostgreSQL as the caller’s own database role and runs the query inside a read-only transaction on that connection.

This matters because of how row-level security then works. Your RLS policies key on current_user. The role is fixed by authentication, the PostgreSQL role system enforces membership, and no CTE or function can change current_user mid-statement. A caller cannot forge another tenant’s identity even inside a single crafted query. That is an unforgeable isolation boundary, and it lives in the database, not in the Go code.

Each user with id N maps to a login role like pathql_r_N. The per-role connection password is derived as HMAC(password_secret, role), so nothing per-role is stored and the password can be re-derived at connect time. The server never holds CREATEROLE and never runs role DDL itself. Instead GET /admin/roles/sync emits the exact CREATE ROLE and GRANT and DROP ROLE statements needed to make the database match the users table, and a cron job or an operator applies them. The server that answers untrusted queries has no business creating roles.

Defense in depth on top

The database is the boundary, but a good boundary deserves layers in front of it. None of these replace least privilege and RLS; they each reject a class of request before it gets that far.

Authentication is an ordered list of methods: API key (only a SHA-256 hash is stored, looked up by a non-secret prefix), HTTP Basic against a bcrypt hash, and JWT with a configured algorithm allowlist so algorithm-confusion attacks are refused at the edge. A failed attempt returns a generic 401 that never reveals which field was wrong.

The SQL gate is an optional pre-execution validator. With it on, a query is rejected unless it is a single statement, read-only (it must start with SELECT, WITH, TABLE or VALUES), and free of system catalogs (no information_schema, no pg_ names). The check is content-aware, so a pg_ that appears inside a string literal or a comment does not trigger a rejection. This closes off catalog enumeration that RLS does not protect.

The cost ceiling stops expensive queries before they run. With a limit set, the server runs EXPLAIN (FORMAT JSON) first, in the same read-only transaction with the same bound parameters, and rejects the query if the planner’s estimate of cost or rows is over the limit. Plain EXPLAIN never executes the query, so the check has no side effects. It catches a sequential scan over a huge table or an accidental cross join before it ties up a connection.

On top of that there are the usual abuse limits: a per-IP rate limit, global and per-user concurrency caps, a brute-force lockout on repeated auth failures, an optional allow/deny IP firewall in front of every route, a strict CORS policy that never emits a wildcard, double-submit-cookie XSRF protection for cookie-based browser clients, and optional TLS with HSTS. Each of these is off or permissive by default and you turn on what your deployment needs.

Trust, but verify the role

Almost every guarantee above depends on the database role being set up correctly, off-server. So at startup pathql-server runs a hardening self-check using read-only catalog queries and reports what it finds. Is the connected role a superuser, or does it have BYPASSRLS? Can it write to tables outside of auth tables? Are there readable tables with no row-level security at all, which would be a silent full-table exposure? In warn mode it logs the findings and keeps running; in enforce mode it refuses to start on a critical finding. It is the difference between hoping the grants are right and knowing they are.

Writes, carefully

The server is read-only by default: every query runs in a READ ONLY transaction and the role is granted only SELECT. You can opt into writes, but it cannot be combined with the read-only setting, and the server refuses to start on that contradiction rather than silently picking one.

With writes on, each request is classified by its leading keyword. A read still runs read-only exactly as before, a write runs in a read-write transaction, and anything else (DDL, COPY, stacked statements, catalog access) is rejected. The important subtlety is tenant isolation: an RLS USING clause only filters which rows a write can see, not which it can create. Without a WITH CHECK clause a caller could insert a row attributed to another tenant. Under login_role with enforcement on, the server refuses to start when a writable table has no WITH CHECK policy, so that cross-tenant write path is caught at boot rather than in production. A blast-radius cap rolls back any write that touches more rows than configured.

The interface stayed small

Here is what I find satisfying about the result. Despite all of the above, the product is still the one sentence from the first post: send SQL, get nested JSON. A client that does not need hardening sets a DSN and a listen address and is done. Everything else is optional and layered on without changing that core.

In A Philosophy of Software Design, John Ousterhout argues for deep modules: simple interfaces that hide significant complexity. A server that safely runs untrusted SQL is genuinely complex, but the interface it presents did not need to grow to match. The complexity went into the database boundary and the layers in front of it, not into the request a caller has to understand.

Disclaimer: I built this to explore the idea. Test it thoroughly, and review the example RLS policy and grants, before pointing anything untrusted at it.

See: https://github.com/mevdschee/pathql-server

Enjoy!

Make the database the boundary#

Defense in depth on top#

Trust, but verify the role#

Writes, carefully#

The interface stayed small#

Links#

Make the database the boundary

Defense in depth on top

Trust, but verify the role

Writes, carefully

The interface stayed small

Links