Strange data type transformations
When your function argument types are loosely changed
This article results from a code review I did for a customer.
Our customer created a pg_dump --schema-only
of the target database to provide
me with the plpgsql code and database object structures to review. So far
so good.
I started to read the code and then became puzzled. The code looks like this:
CREATE FUNCTION xxx( p_id character, p_info character varying )
RETURNS integer
LANGUAGE plpgsql
AS $$
DECLARE
BEGIN
...
INSERT INTO t1
SELECT * FROM t2 WHERE t2.id = p_id;
...
END;
$$
;
Maybe you saw nothing wrong with the function. Perhaps knowing the table
definition will help:
CREATE TABLE t2 (
id VARCHAR(130) NOT NULL
...
PRIMARY KEY (id)
);
t2.id is always 130 characters long (in practice) and there are 400 million tuples.
So as you may have guessed, it seems odd to have the p_id CHARACTER matching id VARCHAR(130).
Moreover CHARACTER is the same as CHAR(1).
Our customer had not seen any issues with the code for years. Nevertheless, our customer told me that the function definition he wrote was not like that: it was meant to be p_id CHARACTER(130) - not CHARACTER.
So what went wrong? Let’s test around because it’s fun.
CREATE FUNCTION test( c character, d character varying )
RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
RAISE NOTICE 'c=%, d=%', c,d;
END;
$$;
SELECT test( '123465789', '987654321' );
NOTICE: c=123465789, d=987654321
test
------
(1 row)
We have an interesting result here: no casting to CHAR(1) has been done.
Let’s see more details:
EXPLAIN (COSTS OFF,ANALYZE,VERBOSE)
SELECT test( '123465789', '987654321' );
NOTICE: c=123465789, d=987654321
QUERY PLAN
---------------------------------------------------------------------
Result (actual time=0.040..0.041 rows=1 loops=1)
Output: test('123465789'::bpchar, '987654321'::character varying)
Planning Time: 0.023 ms
Execution Time: 0.053 ms
(4 rows)
We can see there was a cast to BPCHAR. As a reminder, BPCHAR is an alias of CHARACTER
and it can represent a string up to 10,485,760 characters.
Now let’s make another test:
CREATE FUNCTION test(c character(4))
RETURNS character(4)
LANGUAGE sql
AS $$
select c;
$$;
As you can see, the language changed to SQL and the argument type and the return
type are CHAR(4). How does it execute?
SELECT test('123456789');
test
-----------
123456789
(1 row)
EXPLAIN VERBOSE SELECT test('123456789');
QUERY PLAN
-------------------------------------------
Result (cost=0.00..0.01 rows=1 width=32)
Output: '123456789'::bpchar
(2 rows)
As you can see, even though you expect to process CHAR(4) data, you end up processing arbitrary length strings instead!!
However, do not rush to PostgreSQL mailing list to complain YET!
As a matter of fact, this behaviour is not a bug. The documentation states:
“The full SQL type syntax is allowed for declaring a function’s arguments and return value. However, parenthesized type modifiers (e.g., the precision field for type numeric) are discarded by CREATE FUNCTION. Thus for example CREATE FUNCTION foo (varchar(10)) … is exactly the same as CREATE FUNCTION foo (varchar) …”
This explains that CHARACTER(x) became CHARACTER aliased as BPCHAR. And as we saw, BPCHAR is not actually CHAR(1) but more like VARCHAR(10485760). This fully explains the behaviour.
Wait, wait , WAIT ! The original intention was to deal with CHAR(4) string - not any arbituary length strings.
Isn’t there any hope? No, sorry… (kidding.)
Reading the same documentation page, we see that “argtype” and “rettype” can be base, composite, or domain types, or can reference the type of a table column.
The trick is to create either a composite type or a domain to use as argtype or rettype.
Here are some examples:
-- Works in simple case trick
SELECT test( '12345789'::char(4) );
-- Domain trick
CREATE DOMAIN c4 AS char(4);
CREATE FUNCTION test(param c4)
RETURNS c4
AS $$
BEGIN
RAISE NOTICE 'param=%', param;
RETURN param;
END;
$$ LANGUAGE plpgsql;
SELECT test( '123456789' );
ERROR: value too long for type character(4)
SELECT test( '123456789'::char(4) );
NOTICE: param=1234
test
------
1234
(1 row)
SELECT test( '123456789'::c4);
NOTICE: param=1234
test
------
1234
(1 row)
SELECT pg_typeof( test( '123456789'::char(4) ) );
NOTICE: param=1234
pg_typeof
-----------
c4
(1 row)
Now you should be happy with the result.
What? Not yet? Ok here is an additional trick.
-- Map a table structure
CREATE TABLE qq ( c char(4));
CREATE FUNCTION test(IN c qq, OUT d qq)
LANGUAGE sql
AS $$
SELECT c;
$$;
SELECT * FROM test(ROW('12345'));
ERROR: value too long for type character(4)
SELECT * from test(ROW('1234'));
c
------
1234
Hmm, OK, but how is this is different from the domain trick?
-- Easy Type Alteration
ALTER TABLE qq ALTER c TYPE char(5);
SELECT * FROM test( ROW('12345') );
c
-------
12345
Try to ALTER a domain - you will see how (not) easy it is.
The table definition trick allows for some flexibility as follows:
ALTER TABLE qq ADD ee int;
SELECT test( ROW('12345', 4) );
test
----------
(12345,4)
SELECT * FROM test( ROW('12345', 4) );
c | ee
-------+----
12345 | 4
We hope you enjoyed this article and that you learnt something new and interesting!