I was reading SQL antipatterns and got into the chapter about recursive queries and how there are different ways to approach this. I wanted to experiment with this and found that postgresql supports recursive queries. I decided to make a simple commenting database and see how it would work.
Fibonnacci with CTEs
Adopted from https://wiki.postgresql.org/wiki/Fibonacci_Numbers
WITH RECURSIVE fib(a, b) AS ( -- non recursive part -- values (0, 1) UNION ALL -- recursive part -- select b, a + b from fib ) SELECT a from fib limit 20;
Here's how this would work:
There are three tables, intermediate, working and result table. The non-recursive part is first calculated, setting the working and result table to values (0, 1)
The recursive part of the query select b, a + b from fib is then run on. fib here is the working table, which contains (0, 1). This sets the intermediate table to (1, 1). This gets appended to the result table, and replaces the working table.
In the next run, the working table is (1, 1), resulting in (1, 2) in the intermediate table, which gets appended to the result table, and replaces the working table. This goes on and on.
The recursion should stop when the intermediate table is empty but since fibonacci is infinite this does not happen. Adding a where clause to the recursive part would cause this to occur. For example:
WITH RECURSIVE fib(a, b) AS ( -- non recursive part -- values (0, 1) UNION ALL -- recursive part -- select b, a + b from fib where b < 100 ) SELECT a from fib;
The last select statement: SELECT a from fib limit 20; picks the information from the result table. Limiting this too can cause the infinite recursion to stop.
Detailed explanation of how this works can be found here: CTE Read me
To introduce a cycle in this query we just have to do:
UPDATE comments SET parent_id=8 WHERE comment_id=1;
To prevent this while running our query, we have to keep a state of all the parents we've visited and filter these out in the recursive bit. In this case, we maintain an array of visited parents and ignore all children comments that have that id.
WITH RECURSIVE child_comments AS ( SELECT *, array[comment_id] as visited_parents FROM comments WHERE comment_id = 1 UNION ALL SELECT c.*, cc.visited_parents || c.comment_id as visited_parents FROM comments c INNER JOIN child_comments cc ON c.parent_id = cc.comment_id WHERE NOT c.comment_id = ANY (cc.visited_parents) ) SELECT * FROM child_comments LIMIT 10;
But how do we prevent cycle creation in the query itself? One method is to have a trigger that gets all parents of a child comment, and doesn't update if the update would cause a cycle.
CREATE OR REPLACE FUNCTION cycle_prevention() RETURNS trigger AS $cycle_prevention$ DECLARE parents_not_allowed int; BEGIN -- Check that parent id doesn't cause a cycle IF NEW.parent_id IS NOT NULL THEN raise notice 'parent id: %, comment_id %', NEW.parent_id, NEW.comment_id; WITH RECURSIVE parents AS ( SELECT parent_id from comments where comment_id = NEW.parent_id UNION ALL select c.parent_id from comments c INNER JOIN parents p on p.parent_id=c.comment_id ) SELECT ARRAY(SELECT parent_id::int FROM parents LIMIT 10) INTO parents_not_allowed; raise notice 'Value: %', parents_not_allowed; IF NEW.comment_id = ANY(parents_not_allowed) THEN RAISE EXCEPTION 'cycle found in query'; END IF; END IF; RETURN NEW; END; $cycle_prevention$ LANGUAGE plpgsql; CREATE TRIGGER cycle_prevention BEFORE INSERT OR UPDATE ON comments FOR EACH ROW EXECUTE PROCEDURE cycle_prevention();
Now the update fails with ERROR: cycle found in query.