Tuesday 26 February 2013

Migrating from Oracle to Postgres (part 2)

As I said some days ago, I have an Oracle dump from a database that I want to migrate to Postgres for a project. I set up a 100GB VM, installed Oracle 11gR2, and compiled PostgreSQL 9.2.3 on the same VM. I restored the dump to Oracle, and then I had to come up with a migration strategy.
I had some ideas, like getting the schema from the dump with expdp (sqlfile=) and manually translating it to Postgres, but then I remembered a project I saw some time ago called ora2pg. This cool thing is basically a Perl program that needs, apart from Perl itself (duh), DBI and DBD for Oracle and Postgres. I set it up and started experimenting, first on the DDLs. The output was actually very good! The only things I had to change were some function-based indexes, and after that it was parseable by Postgres without problems. Next was the data. No problems there either: it created a 20GB output file that all I had to do was throw at Postgres. But because I wasn't very proactive with my VM's disk space, I faced some problems... 20GB from the export, plus the Oracle data (36664764 /opt/oradata/ according to du), plus the Oracle software, plus other things: that was about 65% of my disk space, and I also had to maintain a second live copy of the data and indexes in Postgres. So I gzipped the whole output file and used "zcat | psql". Worked like a charm; actually it's still running as I write this, because...
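
For the record, the pipe trick is nothing fancy; something like this (file and database names here are made up for illustration):

gzip ora2pg_output.sql                     # reclaim the 20GB the plain dump takes on disk
zcat ora2pg_output.sql.gz | psql mydb      # restore without ever re-inflating the file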
And here are some things to take into consideration before starting!
The dump file is 20GB; what happens if something goes wrong?
You drop the database, fix the DDL script and reapply the data.
-- SUPER MEGA LAME --
Do it one object at a time, not like I did, or at least separate the small tables from the big ones and do the big tables one at a time. Otherwise, something like:
ERROR: invalid input syntax for integer: "26.19"
CONTEXT: COPY twitter_tmp, line 1, column network_score: "26.19"

might end up fucking you up!
ora2pg translated a NUMBER column as integer, something that came back to bite me in the ass 2 hours into the import run...
I changed the DDL script and reran it; if it fails again I have to start over. SO, don't be like me, act smart, do it one part at a time :)
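
The actual fix was making the column numeric in the generated DDL before loading; if the table already exists, the same thing can be done in place (a sketch, using the column from the error above):

-- ora2pg generated: network_score integer
-- what the data actually needs:
ALTER TABLE twitter_tmp ALTER COLUMN network_score TYPE numeric;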
In all fairness, I still have no idea what tables are in that database, and ora2pg has put my job on easy mode. So I can't blame it; I can only blame myself for not MAKING A MIGRATION STRATEGY after all.
I will also show you the parameters I changed in order to make the import faster; the difference is actually HUGE:

shared_buffers = 2GB          # give the server a decent slice of RAM
synchronous_commit = off      # don't wait for WAL flush at commit; fine for a throwaway import
wal_buffers = 16MB            # bigger WAL buffers for heavy write bursts
wal_writer_delay = 1000ms     # let the WAL writer batch its work
checkpoint_segments = 64      # fewer, bigger checkpoints during bulk load

Anyway, so far I think I'm good with the table definitions (including constraints) and the data (hopefully), but there is a lot more to be done: views, code etc...

Thanks for reading
to be continued...
Vasilis


Thursday 21 February 2013

PostgreSQL Partitioning

Partitions are very useful when it comes to big tables; a common rule of thumb is to start thinking about partitioning when a table grows past 10GB or so (the documentation's own suggestion is when the table size exceeds the physical memory of the server). In Postgres there are 2 kinds of partitioning:
  • Range
  • List
The implementation just needs the following steps:
  1. Enable constraint exclusion in the config file
  2. Create a master table
  3. Create child tables WITHOUT overlapping check constraints
  4. Create indexes and PKs
  5. Create a function and trigger to insert data into the child tables
The first thing to notice here is that partitioning is basically built on table inheritance. But enough with how things work in theory; let's create one.

First, check that postgresql.conf has this parameter set:

constraint_exclusion = partition 

Now, let's create the master and child tables:

CREATE TABLE sales (
    sales_id serial NOT NULL,
    sales_date DATE NOT NULL DEFAULT CURRENT_DATE,
    description text
);

CREATE TABLE sales_2013_p1 (
    CHECK ( sales_date >= DATE '2013-01-01' AND sales_date < DATE '2013-05-01' )
) INHERITS (sales);

CREATE TABLE sales_2013_p2 (
    CHECK ( sales_date >= DATE '2013-05-01' AND sales_date < DATE '2013-09-01' )
) INHERITS (sales);

CREATE TABLE sales_2013_p3 (
    CHECK ( sales_date >= DATE '2013-09-01' AND sales_date < DATE '2014-01-01' )
) INHERITS (sales);


Notice the keyword INHERITS here :)
Next, PKs and indexes on the child tables:

ALTER TABLE sales_2013_p1 ADD CONSTRAINT sales_2013_p1_pkey PRIMARY KEY (sales_id, sales_date);
ALTER TABLE sales_2013_p2 ADD CONSTRAINT sales_2013_p2_pkey PRIMARY KEY (sales_id, sales_date);
ALTER TABLE sales_2013_p3 ADD CONSTRAINT sales_2013_p3_pkey PRIMARY KEY (sales_id, sales_date);


CREATE INDEX idx_2013_p1 ON sales_2013_p1 (sales_date);
CREATE INDEX idx_2013_p2 ON sales_2013_p2 (sales_date);
CREATE INDEX idx_2013_p3 ON sales_2013_p3 (sales_date);


And finally, a function that returns trigger, and the on-insert trigger itself:

CREATE OR REPLACE FUNCTION sales_trig_func()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.sales_date >= DATE '2013-01-01' AND NEW.sales_date < DATE '2013-05-01' ) THEN
        INSERT INTO sales_2013_p1 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-05-01' AND NEW.sales_date < DATE '2013-09-01' ) THEN
        INSERT INTO sales_2013_p2 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-09-01' AND NEW.sales_date < DATE '2014-01-01' ) THEN
        INSERT INTO sales_2013_p3 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;

CREATE TRIGGER insert_on_sales
    BEFORE INSERT ON sales
    FOR EACH ROW EXECUTE PROCEDURE sales_trig_func();
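
A quick sanity check that the trigger routes rows where they should go (assuming the objects above are in place):

INSERT INTO sales (sales_date, description) VALUES (DATE '2013-06-15', 'test row');
SELECT * FROM sales_2013_p2 WHERE description = 'test row';  -- the row lands here
SELECT count(*) FROM ONLY sales;                              -- 0, the master itself stays empty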


Now that we have a table with a basic partitioning scheme, let's assume we want to add more partitions for 2014. We create a new child table; for example's sake I will just create one partition covering all of 2014.


CREATE TABLE sales_2014 (
CHECK ( sales_date >= DATE '2014-01-01' AND sales_date < DATE '2015-01-01' )
 ) INHERITS (sales);


ALTER TABLE sales_2014 ADD CONSTRAINT sales_2014_pkey PRIMARY KEY (sales_id, sales_date);

CREATE INDEX idx_2014 ON sales_2014 (sales_date);

CREATE OR REPLACE FUNCTION sales_trig_func()
RETURNS TRIGGER AS $$
BEGIN
    IF ( NEW.sales_date >= DATE '2013-01-01' AND NEW.sales_date < DATE '2013-05-01' ) THEN
        INSERT INTO sales_2013_p1 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-05-01' AND NEW.sales_date < DATE '2013-09-01' ) THEN
        INSERT INTO sales_2013_p2 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2013-09-01' AND NEW.sales_date < DATE '2014-01-01' ) THEN
        INSERT INTO sales_2013_p3 VALUES (NEW.*);
    ELSIF ( NEW.sales_date >= DATE '2014-01-01' AND NEW.sales_date < DATE '2015-01-01' ) THEN
        INSERT INTO sales_2014 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'Date out of range!';
    END IF;
    RETURN NULL;
END;
$$
LANGUAGE plpgsql;


and we are done!



Now let's say that the 2013_p1 data is obsolete and we want to move it to a historical database: drop the table, correct and replace the trigger function, and you are done.
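
Concretely, that is just the following (a sketch; move the data to the historical database first, with pg_dump or whatever fits):

-- archive sales_2013_p1 somewhere safe, then:
DROP TABLE sales_2013_p1;
-- and CREATE OR REPLACE sales_trig_func() without the p1 branch, so inserts
-- in that range now raise the 'Date out of range' exception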
This is how the master table would look after these operations:

partition=# \d+ sales
                                         Table "public.sales"
   Column    |  Type   |                         Modifiers                         | Storage  | Stats target | Description
-------------+---------+-----------------------------------------------------------+----------+--------------+-------------
 sales_id    | integer | not null default nextval('sales_sales_id_seq'::regclass) | plain    |              |
 sales_date  | date    | not null default ('now'::text)::date                      | plain    |              |
 description | text    |                                                           | extended |              |
Triggers:
    insert_on_sales BEFORE INSERT ON sales FOR EACH ROW EXECUTE PROCEDURE sales_trig_func()
Child tables: sales_2013_p2,
              sales_2013_p3,
              sales_2014
Has OIDs: no



The good thing about this approach is that partitions are generally easy to maintain and administer: child tables can have different indexes from each other, and you can delete large portions of data that are no longer needed simply by dropping a partition. Performance is VERY good on insert and select, and maintenance work like REINDEX is faster; REINDEX in particular won't lock the whole master table for writes, since each child is reindexed on its own.
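
You can actually watch constraint exclusion at work: EXPLAIN a query that filters on the partition key, and with constraint_exclusion = partition only the matching child (plus the always-empty master) shows up in the plan:

EXPLAIN SELECT * FROM sales WHERE sales_date = DATE '2013-06-15';
-- the plan should scan only sales and sales_2013_p2; the other children are excluded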



Thanks for reading
-Vasilis

Tuesday 19 February 2013

Oracle 11gR2 to Postgres 9.2.3 (part 1)

A friend of mine gave me an export (expdp) of an Oracle 10g database. It's almost 4 gigs compressed, so I reckon it's about 12-14GB; the job is to migrate it to Postgres... The plan is to create a VM running 11gR2 on Oracle Linux, compile Postgres on the same VM and start migrating. Currently I have a couple of ideas on how to do it, but the whole thing depends on what code (functions, stored procedures) the export contains; from the export log I saw a quite complicated schema...

to be continued...

proper installation

Recently at work I got assigned to upgrade some (about 10)
Postgres 8.1.x servers to 9. I have always liked compiling, basically because of the flexibility it offers, and that's what I proposed to the guys in charge of the project. They gave me a test system (VM) to play with; in all fairness they were a bit skeptical about the idea of compiling the RDBMS, mostly for reliability reasons (don't ask me why). I explained that upgrading from source would be much easier later on, and that over the last year the PostgreSQL developers have been doing so much work that it cannot be ignored (pg_basebackup for example). Since the latest CentOS package was 9.1.x, they agreed and I started working.
PostgreSQL is so easy to compile: no strange dependencies, not many libraries needed. And about those reliability concerns? The developers are so neat that I'll just quote from
http://www.postgresql.org/support/versioning

"While upgrades always have some risk, PostgreSQL minor releases fix only frequently-encountered security and data corruption bugs to reduce the risk of upgrading. The community considers not upgrading to be riskier than upgrading."
 


You should never find an unexpected change that breaks an application in a minor PostgreSQL upgrade. Bug, security, and corruption fixes are always done in a way that minimizes the odds of introducing an externally visible behavior change, and if that's not possible, the reason why and the suggested workarounds will be detailed in the release notes. What you will find is that some subtle problems, resulting from resolved bugs, can clear up even after a minor version update. It's not uncommon to discover a report of a problem to one of the PostgreSQL mailing lists is resolved in the latest minor version update compatible with that installation, and upgrading to that version is all that's needed to make the issue go away.

So don't be afraid to upgrade, DO IT.

If you want to compile Postgres you will need (apart from the basic development tools) the following 2 libraries:

zlib1g-dev (compression library, needed if you want to compress directly from Postgres; pg_dump -Z for example)

libreadline6-dev (used by the psql client to support command history and line editing)

You could call both of these libraries optional, and you actually can compile Postgres without them, but meh, don't do that...

Other than that it's pretty much straightforward:
untar
./configure
make
make install (check the documentation for extra flags, change of prefix etc.)
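
For example, for a 9.2.3 tarball the whole dance might look like this (prefix shown explicitly, even though /usr/local/pgsql is the default anyway):

tar xzf postgresql-9.2.3.tar.gz
cd postgresql-9.2.3
./configure --prefix=/usr/local/pgsql
make
make install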

Postgres will install the binaries by default in /usr/local/pgsql/.
Packages, on the other hand, will install in /usr/bin (MEGA SUPER LAME)

SO! Let's look at the current situation. I inherited a PostgreSQL server with a package installation: binaries in /usr/bin and the postgres home in /var/lib.
If it had been compiled I could go to /usr/local/pgsql, rename it to /usr/local/pgsql8, remove it from the PATH and let it be. But I couldn't do that, so I fixed the PATH, putting /usr/local/pgsql before /usr/bin so that the first postgres binaries in path order were the 9's, and I was OK. But still! Packages tend to integrate deep into the OS, making any other option (like running 2 postgres servers at the same time) more difficult than it should be.

I like adding some of the contrib modules to Postgres for logging, monitoring and tuning purposes; pg_buffercache, pgbench, pg_stat_statements and pg_freespacemap are some examples.
In a compiled environment there is nothing easier: nothing extra to download, not much to do, just compile the contrib module and add it to the database with CREATE EXTENSION.
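
For example, pg_buffercache straight from the source tree (the database name here is just an example):

cd postgresql-9.2.3/contrib/pg_buffercache
make
make install
psql -d mydb -c "CREATE EXTENSION pg_buffercache;"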
Now let's say that after a while a new release of Postgres comes out and I want to upgrade, just to stay current, or because a new wow feature was added or a bug was fixed. All I have to do is compile the new version (assuming it's a minor release), replace the binaries and restart the server. A package-installed Postgres server would have to wait until the distribution shipped a package of the new version, usually a few versions behind current. Which means what, a month after the release date? Maybe more!?

Basically the only thing I can imagine being easier with a package installation is the init scripts, and OK, I don't think that's a huge pro compared to a compiled version.

SO my suggestion is: compile the damn thing... COMPILE!


Thanks for reading
-Vasilis

Monday 18 February 2013

binary replication in practice

A couple of days ago I started putting together a short howto about streaming replication in PostgreSQL 9.2. Most of this is well documented, but in this howto I will also experiment with switchovers and switchbacks. It aims to show how easy it is to set up right out of the box.

Streaming replication PostgreSQL 9.2

For my example I will use 2 Debian VMs: 192.168.0.100 (pglab1) and 192.168.0.101 (pglab2)


- not mandatory -

Exchange ssh keys for passwordless ssh; might be needed if we want to scp scripts, rsync or whatever.

If you don't know how to do it, follow these steps:
http://www.debian-administration.org/articles/152

- ON MASTER (PGLAB1) -
After you create a postgres cluster using initdb,
edit the master's postgresql.conf and change the following:
listen_addresses = '*'      # listen on all interfaces
wal_level = hot_standby     # (could be archive too, but hot_standby is needed for a live standby)
max_wal_senders = 5         # max concurrent replication connections
hot_standby = on            # only matters on the standby, but this file will be copied there anyway


Create a replication user:
create user repuser replication password 'passwd';

Edit pg_hba.conf and add:

# trust is convenient for a lab setup; don't do this in production
host all all 192.168.0.0/24 trust
host replication repuser 192.168.0.0/24 md5
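
Note that the listen_addresses and max_wal_senders changes need a restart (pg_hba.conf changes only need a reload), so bounce the master once:

pg_ctl -D $PGDATA restart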


- ON SLAVE -

Now ssh to the slave and run:

pg_basebackup -D /opt/data/ -v -h 192.168.0.100 -U repuser

Enter the password; this will transfer a full copy of your cluster from the master (-D is the target data directory, which must be empty). Check the documentation of pg_basebackup for compression and other available options.

In $PGDATA create a file called recovery.conf containing:

standby_mode = on
primary_conninfo = 'host=192.168.0.100 port=5432 user=repuser password=passwd'


With the master up, start the slave; it should say:
LOG: database system is ready to accept read only connections


At this point you will have 2 nodes running, your master accepting read/write operations and your slave accepting read-only ones (reporting goes here maybe?). Now, let's say the master crashes and you need to fail over and promote the slave to be the new master.
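
Before playing with failover, you can verify that streaming actually works; on the master, pg_stat_replication shows one row per connected standby (just a sanity check, not part of the setup):

select client_addr, state, sent_location, replay_location from pg_stat_replication;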

-FAILOVER-

Shut down the master.
On the slave, execute:
pg_ctl promote


That's it: your slave (pglab2) is now a master accepting all kinds of connections.

Now let's say that the ex-master (pglab1) is fixed and ready to come up again.

- Switch back to original Master -


on the current master (pglab2) :

echo "select pg_start_backup('clone',true);" |psql pgbench
rsync -av --exclude postgresql.pid /opt/data/* 192.168.0.100:/opt/data/
echo "select pg_stop_backup();"|psql pgbench

This will sync all data from the current master (pglab2) to the current, soon-to-be slave (pglab1), which should be down at this point.

Edit recovery.done and fix the IP so it points to the current master,
rename recovery.done to recovery.conf,
start the ex-master (pglab1), now as a slave, and promote it with the slave (pglab2) down,
then recreate the slave (pglab2) with rsync, edit its recovery.conf and start it again;
the servers now have their original roles.
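
In commands, the recovery.conf dance on pglab1 looks roughly like this (a sketch; the promote and recreate steps then mirror the failover section above):

cd $PGDATA
mv recovery.done recovery.conf
# edit primary_conninfo so host= points to 192.168.0.101 (pglab2)
pg_ctl -D $PGDATA start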


Note that you can end up with 2 master databases; this might (and probably will) create a mess, so be sure you bring down the correct server before promoting.

If you want to check whether a server is acting as a master or a slave, run:

select pg_is_in_recovery();
If it returns true, you're on a slave.


Thanks for reading.

Here we are

Hello, my name is Vasilis. I have been a database admin for 10 years now, and in this blog I will try to share my database "adventures", post howtos and review database products, news, releases etc.
This blog will mainly be about PostgreSQL, my (by far) favorite RDBMS,
but I might as well write about Oracle and MS SQL Server.


-Welcome