of Ding.

In search of better PeopleSoft effective date queries, Part 2

In my previous post, I made the argument that this:

SELECT ps_acad_prog.*
FROM ps_acad_prog
SELECT ps_acad_prog.emplid, ps_acad_prog.acad_career, ps_acad_prog.stdnt_car_nbr, ps_acad_prog.effdt, max(ps_acad_prog.effseq) as effseq
FROM ps_acad_prog
SELECT emplid, acad_career, stdnt_car_nbr, max(effdt) as effdt
FROM ps_acad_prog
WHERE effdt <= sysdate
GROUP BY emplid, acad_career, stdnt_car_nbr
) max_effdt_ps_acad_prog
ON ps_acad_prog.emplid = max_effdt_ps_acad_prog.emplid
AND ps_acad_prog.acad_career = max_effdt_ps_acad_prog.acad_career
AND ps_acad_prog.stdnt_car_nbr = max_effdt_ps_acad_prog.stdnt_car_nbr
AND ps_acad_prog.effdt = max_effdt_ps_acad_prog.effdt
GROUP BY ps_acad_prog.emplid, ps_acad_prog.acad_career, ps_acad_prog.stdnt_car_nbr, ps_acad_prog.effdt
) effdt_and_effseq_acad_prog
ON ps_acad_prog.emplid = effdt_and_effseq_acad_prog.emplid
AND ps_acad_prog.acad_career = effdt_and_effseq_acad_prog.acad_career
AND ps_acad_prog.stdnt_car_nbr = effdt_and_effseq_acad_prog.stdnt_car_nbr
AND ps_acad_prog.effdt = effdt_and_effseq_acad_prog.effdt
AND ps_acad_prog.effseq = effdt_and_effseq_acad_prog.effseq

is better than this:

FROM ps_acad_prog p
WHERE effdt = (
SELECT max(effdt)
FROM ps_acad_prog
WHERE emplid = p.emplid
AND acad_career = p.acad_career
AND stdnt_car_nbr = p.stdnt_car_nbr
AND effdt <= sysdate
AND effseq = (
SELECT max(effseq)
FROM ps_acad_prog
WHERE emplid = p.emplid
AND acad_career = p.acad_career
AND stdnt_car_nbr = p.stdnt_car_nbr
AND effdt = p.effdt

But that first query is way bigger. Other than a speed boost, it may not clear why I think the first query is better, so I thought I’d dig into it a bit more. I’d like to explain more about why I think this is better (not just faster) and offer up some alternative notation.


I think the biggest reason to avoid correlated subqueries is not the performance benefit, it’s the cognitive load that it forces upon us. A big, gnarly query with many correlated subqueries makes it difficult to reason about.

When you encounter a big query that’s doing more than you can take in at once, what’s the first thing you do? You break it down. Start executing pieces of the query to get an idea of the relations that are at play in your mind. Correlated subqueries make that difficult because you cannot select a piece of the query and run it individually. Especially in the case of PeopleSoft, these subqueries are wired to other relations outside of the immediate scope.

Effective date subqueries also often get thrown into the WHERE clause alongside conditions that represent real business logic, clouding the intention of the query. The goal should be to shove all of this accidental complexity aside, leaving the minimal amount of code to represent the core of what the query is trying to accomplish.

The following query speaks for itself, doesn’t it? This should be our goal. Any effective dated nonsense that this requires should take a back seat to this distilled statement.

SELECT * FROM effective_acad_prog_tbl WHERE acad_prog='123ASDF'

Destructuring with WITH

Since we’re specifically talking PeopleSoft/Oracle here, we have the often neglected WITH clause at our disposal. We can use WITH to destructure the ugly peripheral bits into meaningful relation names.

In Clojure and other lisps, destructuring is done with the let form:

(let [acad-prog-eff-keys (...)
effective-acad-prog (...)]
(do-something effective-acad-prog))

Oracle’s WITH clause is very similar:

WITH acad_prog_tbl_eff_keys AS (
SELECT institution, acad_prog, max(effdt) as effdt
FROM ps_acad_prog_tbl
WHERE effdt <= sysdate
GROUP BY institution, acad_prog
effective_acad_prog_tbl AS (
SELECT ps_acad_prog_tbl.*
FROM ps_acad_prog_tbl
JOIN acad_prog_tbl_eff_keys
ON ps_acad_prog_tbl.institution = ps_acad_prog_tbl_eff_keys.institution
AND ps_acad_prog_tbl.acad_prog = ps_acad_prog_tbl_eff_keys.acad_prog
AND ps_acad_prog_tbl.effdt = ps_acad_prog_tbl_eff_keys.effdt
SELECT * FROM effective_acad_prog_tbl WHERE acad_prog = '123ASDF'

WITH lets us nicely break up (and push aside) the non-core queries and clauses, leaving the business logic to speak for itself.


So my point here is really composability. Regardless of whether you’re using WITH clauses, building queries by composing discrete meaningfully-named relations will help build clearer, faster, and more maintainable queries.

comments powered by Disqus