One of the first issues when starting to use PrestoSQL distributed query engine is related to missing columns of certain data types, especially numeric and all variants of date.

This issue is usually because of missing precision at the data source, which is not only one of the most common, but also one of the worst mistakes developers and database designers can make.

In this post I’ll demonstrate the issue and show two possible solutions.

Let’s start with creating a table in PostgresSQL database (similar is with other databases, only data type is vendor specific) and loading a few records.

 

create table sdc.ttest
(
id numeric,
text VARCHAR(100)
);


insert into sdc.ttest values (1, 'one');
insert into sdc.ttest values (2, 'two');
commit;

 

Note that I’ve created the column “id” as NUMERIC data type, without specifying precision.

Here is how the same table will look like from the PrestoSQL.

 

 

Column “id” is missing, and only “text” column which is defined as varchar(100) is visible.

There are two possible solutions for this problem.

 

1. Solution #1

 

alter table ttest alter column id type numeric(5);

 

With the DDL command I’ve fixed the issue / bug introduced by developers and database designers, as you can see on the next picture.

 

 

Although this solution will solve two issues in one shot, it is often not applicable in practice because of production interruption due to the table level lock.

 

 

2. Solution #2

 

Second solution to this problem is based on creating a VIEW, where you can fix all the issues related to wrong/imprecise by using the CAST as in the following example:

 

create view ttest_vw
as select
cast(id as integer) as id,
text
from
ttest;

 

After the view is created, you can see all the columns from the data source.

 

 

Since the second solution is less invasive than the first one, it is applicable in all cases.

 

Wrap up:

This is a classical example when mistakes/bugs created in one application is transferred to another (PrestoSQL in this case).

First solution to the problem with missing columns is better, because it will force you to fix the problem at the source.

Alternative approach (solution #2) is just a workaround when, for some reasons (data source is already in production, you don’t have control or can’t get a downtime to implement changes etc.), you are not able to fix the problem at the source.



Get notified when a new post is published!

Loading

Comments

There are no comments yet. Why not start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.