Feed aggregator

Using DbVisualizer to work with #Oracle, #PostgreSQL and #Exasol

The Oracle Instructor - Tue, 2019-07-02 09:01

As a Database Developer or Database Administrator, it becomes increasingly unlikely that you will work with only one platform.

It’s quite useful to have one single tool to handle multiple different database platforms. And that’s exactly the ambition of DbVisualizer.

As a hypothetical scenario, let’s assume you are a database administrator working on a project to migrate from Oracle to EDB Postgres and Exasol.

The goal might be to replace the corporate Oracle database landscape, moving the OLTP part to EDB Postgres and the DWH / Analytics part to Exasol.

Instead of having to switch constantly between say SQL Developer, psql and EXAplus, a more efficient approach would be using DbVisualizer for all three.

For my demo, I created one connection for each of the three databases. Now let’s see if statements I run in Oracle also work in EDB Postgres and in Exasol:

Oracle

EDB

Exasol

Works the same for all three! The convenient thing here is that I just had to select the database connection from the pull-down menu while leaving the statement as it is; there was no need even to copy & paste.
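For illustration, a statement along these lines (my own sketch, not necessarily the one used in the demo) runs unchanged against all three connections: DUAL exists natively in Oracle and Exasol, and EDB Postgres provides it for Oracle compatibility.

-- identical statement text for all three database connections
select current_date from dual;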

What about schemas and tables?

Oracle

In EDB, I need to create a schema accordingly:

EDB

 

In Exasol, schema and table can be created in the same way:

Exasol

Notice that the data types got silently translated into the proper Exasol data types:

Exasol
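To make the translation concrete, DDL along these lines (my own sketch, not the statements from the screenshots) is accepted by Exasol as-is; the comments indicate the kind of mapping applied, with the exact target types being my assumption:

create schema demo_schema;

create table demo_schema.emp_copy (
        empno     number(4),      -- stored as DECIMAL(4,0) in Exasol
        ename     varchar2(10),   -- stored as VARCHAR(10)
        hiredate  date            -- stays DATE
);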

There is no DBA_TABLES in Exasol, though:

Exasol
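The closest equivalents in Exasol are the EXA_* system views; assuming I have the names right, something like:

-- Exasol's counterparts to DBA_TABLES / ALL_TABLES
select * from exa_dba_tables;
select * from exa_all_tables;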

Of course, there’s much more to check and test during a migration, but I think you get an idea of how a universal SQL client like DbVisualizer might help with such a project.

 

Categories: DBA Blogs

opt_estimate 4

Jonathan Lewis - Mon, 2019-07-01 07:18

In the previous article in this series on the opt_estimate() hint I mentioned the “query_block” option for the hint. If you can identify a specific query block that becomes an “outline_leaf” in an execution plan (perhaps because you’ve deliberately given a query block name to an inline subquery and applied the no_merge() hint to it) then you can use the opt_estimate() hint to tell the optimizer how many rows will be produced by that query block (each time it starts). The syntax of the hint is very simple:


opt_estimate(@{query block name}  query_block  rows={number of rows})

As with other options for the hint, you can use scale_rows=, min=, max= as alternatives (the last seems to be used in the code generated by Oracle for materialized view refreshes) but the simple “rows=N” is likely to be the most popular. In effect it does the same as the “non-specific” version of the cardinality() hint – which I’ve suggested from time to time as a way of telling the optimizer the size of a data set in a materialized CTE (“with” subquery), e.g.


set serveroutput off

with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                        cardinality(@mat_cte 11)
--                      opt_estimate(@mat_cte query_block rows=11)
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  * from demo
;

select * from table(dbms_xplan.display_cursor);
    

Regardless of whether you use the opt_estimate() or cardinality() hint above, the materialized temporary table will be reported with 11 rows. (Note that in this case, where the hint is inside the query block it applies to, the “@mat_cte” reference isn’t strictly necessary.)
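For comparison, the hint can equally be placed in the main query block, in which case the @mat_cte reference is what ties it to the CTE. Here is a sketch following the same pattern (and the placement used later in this series):


with demo as (
        select  /*+
                        qb_name(mat_cte)
                        materialize
                */
                distinct trunc(created)    date_list
        from    all_objects
)
select  /*+ opt_estimate(@mat_cte query_block rows=11) */
        *
from    demo
;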

In the previous article I generated some data with a script called opt_est_gby.sql to show you the effects of the group_by and having options of the opt_estimate() hint and pointed out that there were cases where you might also want to include the query_block option as well. Here’s a final example query showing the effect of the scale_rows feature, after creating a table t2 as a copy of t1 (but with pctfree 75 to make a tablescan more expensive) and creating an index on t2(id):


create table t2 pctfree 75 as select * from t1;
create index t2_i1 on t2(id);

select
        t2.n1, t1ct
from
        t2,
        (
        select  /*+
                        qb_name(main)
                        opt_estimate(@main group_by scale_rows=4)
                        opt_estimate(@main having scale_rows=0.4)
                        opt_estimate(@main query_block scale_rows=0.5)
                */
                mod(n1,10), count(*) t1ct
        from    t1
        group by
                mod(n1,10)
        having
                count(*) > 100
        ) v1
where
        t2.id = v1.t1ct
;

--------------------------------------------------------------------------------------
| Id  | Operation                    | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |       |     8 |   168 |    27   (8)| 00:00:01 |
|   1 |  NESTED LOOPS                |       |     8 |   168 |    27   (8)| 00:00:01 |
|   2 |   NESTED LOOPS               |       |     8 |   168 |    27   (8)| 00:00:01 |
|   3 |    VIEW                      |       |     8 |   104 |    10  (10)| 00:00:01 |
|*  4 |     FILTER                   |       |       |       |            |          |
|   5 |      HASH GROUP BY           |       |     8 |    32 |    10  (10)| 00:00:01 |
|   6 |       TABLE ACCESS FULL      | T1    |  3000 | 12000 |     9   (0)| 00:00:01 |
|*  7 |    INDEX RANGE SCAN          | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
|   8 |   TABLE ACCESS BY INDEX ROWID| T2    |     1 |     8 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter(COUNT(*)>100)
   7 - access("T2"."ID"="V1"."T1CT")


I’ve inlined the last query (with the two opt_estimate() hints) that I used in the previous article, and added a third opt_estimate() hint to that inline view. In this case I didn’t have to add a no_merge() hint because the numbers worked in my favour but to be safe in a production environment that’s a hint that I should have included.

You may recall that the hash group by on its own resulted in a prediction of 200 rows, and with the having clause the prediction dropped to 10 rows (standard 5%). With my three opt_estimate() hints in place I should see the effects of the following arithmetic:


group by      200       * 4   = 800
having        5% of 800 * 0.4 =  16
query block   16        * 0.5 =   8

As you can see, the cardinality prediction for the VIEW operation is, indeed, 8 – so the combination of hints has worked. It’s just a shame that we can’t see the three individual steps in the arithmetic as we walk the plan.

A Warning

As always I can only repeat – hinting is not easy; and “not easy” usually translates to “not stable / not safe” (and, thanks to a Freudian slip while typing, “not sage”). You probably don’t know how to do it properly, except in the very simplest cases, and we don’t really know how Oracle is interpreting the hints (particularly the undocumented ones). Here’s an example of how puzzling even the opt_estimate(query_block) hint can be – as usual starting with some data:

rem
rem     Script:         opt_estimate_2.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select * from all_objects;

create table t2
as
select * from all_objects;

As you can see, I’ve been a bit lazy with this example (which I wrote a couple of years ago) and it uses all_objects as a convenient source of data. Unfortunately this means you won’t necessarily be able to reproduce exactly the results I’m about to show you, which I did on a small instance of 12.2.0.1. I’m going to examine four versions of a simple query which

  • restricts the rows from t1,
  • finds the unique set of object_types in that subset of t1
  • then joins to t2 by object_type

select
        /*+ 
                qb_name(main)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;


select
        /*+ 
                qb_name(main)
                merge(@inline)
                opt_estimate(@inline query_block rows=14)
        */
        t2.object_id, t2.object_name, created
from    (
        select  /*+ qb_name(inline) */
                distinct object_type
        from    t1 
        where 
                created >= date'2017-03-01' 
        )       v1,
        t2
where
        t2.object_type = v1.object_type
;

The first version is my unhinted baseline (where, in my case, Oracle doesn’t use complex view merging), the second forces complex view merging of the inline aggregate view, then queries 3 and 4 repeat queries 1 and 2 but tell the optimizer that the number of distinct object_type values is 14 (roughly half the actual number in my case). But there is an oddity in the last query – I’ve told the optimizer how many rows it should estimate for the inline view but I’ve also told it to get rid of the inline view and merge it into the outer query block; so what effect is that going to have? My hope would be that the hint would have to be ignored because it’s going to apply to a query block that doesn’t exist in the final plan and that makes it irrelevant and unusable. Here are the four execution plans:


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 61776 |  4464K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    27 |   351 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    27 |   486 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 | 61776 |  5308K|       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           | 61776 |  5489K|  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


-----------------------------------------------------------------------------
| Id  | Operation            | Name | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|*  1 |  HASH JOIN           |      | 32032 |  2314K|   338   (7)| 00:00:01 |
|   2 |   VIEW               |      |    14 |   182 |   173   (9)| 00:00:01 |
|   3 |    HASH UNIQUE       |      |    14 |   252 |   173   (9)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL| T1   | 59458 |  1045K|   164   (4)| 00:00:01 |
|   5 |   TABLE ACCESS FULL  | T2   | 61776 |  3680K|   163   (4)| 00:00:01 |
-----------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."OBJECT_TYPE"="V1"."OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))


--------------------------------------------------------------------------------------------
| Id  | Operation              | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |           |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   1 |  VIEW                  | VM_NWVW_1 |    14 |  1232 |       |  1492   (2)| 00:00:01 |
|   2 |   HASH UNIQUE          |           |    14 |  1274 |  6112K|  1492   (2)| 00:00:01 |
|*  3 |    HASH JOIN RIGHT SEMI|           | 61776 |  5489K|       |   330   (5)| 00:00:01 |
|*  4 |     TABLE ACCESS FULL  | T1        | 59458 |  1045K|       |   164   (4)| 00:00:01 |
|   5 |     TABLE ACCESS FULL  | T2        | 61776 |  4403K|       |   163   (4)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T2"."OBJECT_TYPE"="OBJECT_TYPE")
   4 - filter("CREATED">=TO_DATE(' 2017-03-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

The first plan tells us that most of the rows in t1 have created > 1st March 2017 and there are (estimated) 27 distinct values for object_type; and there are 61,776 rows in t2 (which is basically the same as t1), and none of them are eliminated by the join on object_type from the inline view.

The second plan (with the forced complex view merging) shows Oracle changing the view with “distinct” into a (right) semi-join between t2 and t1 with the internal view name of VM_NWVW_1 – and the cardinality is correct.

The third plan shows that my hint telling the optimizer to assume the original inline view produces 14 rows has been accepted and, not surprisingly, when we claim that we have roughly half the number of object_type values the final estimate of rows in the join is roughly halved.

So what happens in the fourth plan when our hint applies to a view that no longer exists? I think the optimizer should have discarded the hint as irrelevant the moment it merged the view. Unfortunately it seems to have carried the hint up into the merged view and used it to produce a wildly inaccurate estimate for the final cardinality. If this had been a three-table join this is the sort of error that could make a sensible hash join into a third table become an unbelievably stupid nested loop join. If you had thought you were doing something incredibly clever with (just) the one opt_estimate() hint, the day might come when a small change in the statistics resulted in the optimizer using a view merge strategy you’d never seen before and producing a catastrophic execution plan in (say) an overnight batch that then ran “forever”.

Hinting is hard: you really have to be extremely thorough in your hints and make sure you cover all the options that might appear. And then you might still run into something that looks (as this does) like a bug.

Footnote

Here’s a closing thought: even if you manage to tell the optimizer exactly how many rows will come out of a query block to be joined to the next table in the query, you may still get a very bad plan unless you can also tell the optimizer how many distinct values of the join column(s) there are in that data set. Which means you may also have to learn all about the (even more undocumented) column_stats() hint.
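That hint deserves an article of its own, but for reference its general shape, as it appears in the SQL profiles Oracle generates, looks roughly like this. This is a sketch only: the hint is undocumented, so treat the parameter list as an assumption that may vary by version.


column_stats({schema}.{table name}, {column name}, scale, length={avg column length} distinct={num distinct} nulls={num nulls})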

 

Modifying pg_hba.conf from inside PostgreSQL

Yann Neuhaus - Sat, 2019-06-29 07:14

During one of the sessions from the last Swiss PGDay there was a question which could not be answered during the talk: Is it possible to modify pg_hba.conf from inside PostgreSQL without having access to the operating system? What everybody agreed on is that there is currently no built-in function for doing this.

When you are on a recent version of PostgreSQL there is a view you can use to display the rules in pg_hba.conf:

postgres=# select * from pg_hba_file_rules ;
 line_number | type  |   database    | user_name |  address  |                 netmask                 | auth_method | options | error 
-------------+-------+---------------+-----------+-----------+-----------------------------------------+-------------+---------+-------
          84 | local | {all}         | {all}     |           |                                         | trust       |         | 
          86 | host  | {all}         | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          88 | host  | {all}         | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          91 | local | {replication} | {all}     |           |                                         | trust       |         | 
          92 | host  | {replication} | {all}     | 127.0.0.1 | 255.255.255.255                         | trust       |         | 
          93 | host  | {replication} | {all}     | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
          94 | host  | {all}         | {mydb}    | ::1       | ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff | trust       |         | 
(7 rows)

But there is nothing which allows you to directly modify that. When you are lucky and have enough permissions, there is a way to do it, though. First, let’s check where pg_hba.conf is located:

postgres=# select setting from pg_settings where name like '%hba%';
           setting           
-----------------------------
 /u02/pgdata/DEV/pg_hba.conf

Having that information, we can load the file into a table:

postgres=# create table hba ( lines text ); 
CREATE TABLE
postgres=# copy hba from '/u02/pgdata/DEV/pg_hba.conf';
COPY 93

Once it is loaded we have the whole content in our table (skipping the comments and empty lines here):

postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
(6 rows)

As this is a normal table we can of course add a row:

postgres=# insert into hba (lines) values ('host  all mydb  ::1/128                 trust');
INSERT 0 1
postgres=# select * from hba where lines !~ '^#' and lines !~ '^$';
                                 lines                                 
-----------------------------------------------------------------------
 local   all             all                                     trust
 host    all             all             127.0.0.1/32            trust
 host    all             all             ::1/128                 trust
 local   replication     all                                     trust
 host    replication     all             127.0.0.1/32            trust
 host    replication     all             ::1/128                 trust
 host  all mydb  ::1/128                 trust
(7 rows)

And now we can write it back:

postgres=# copy hba to '/u02/pgdata/DEV/pg_hba.conf';
COPY 94

Reading the whole file confirms that our new rule is there:

postgres=# select pg_read_file('pg_hba.conf');
                               pg_read_file                               
--------------------------------------------------------------------------
 # PostgreSQL Client Authentication Configuration File                   +
 # ===================================================                   +
 #                                                                       +
 # Refer to the "Client Authentication" section in the PostgreSQL        +
 # documentation for a complete description of this file.  A short       +
 # synopsis follows.                                                     +
 #                                                                       +
 # This file controls: which hosts are allowed to connect, how clients   +
 # are authenticated, which PostgreSQL user names they can use, which    +
 # databases they can access.  Records take one of these forms:          +
 #                                                                       +
 # local      DATABASE  USER  METHOD  [OPTIONS]                          +
 # host       DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostssl    DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 # hostnossl  DATABASE  USER  ADDRESS  METHOD  [OPTIONS]                 +
 #                                                                       +
 # (The uppercase items must be replaced by actual values.)              +
 #                                                                       +
 # The first field is the connection type: "local" is a Unix-domain      +
 # socket, "host" is either a plain or SSL-encrypted TCP/IP socket,      +
 # "hostssl" is an SSL-encrypted TCP/IP socket, and "hostnossl" is a     +
 # plain TCP/IP socket.                                                  +
 #                                                                       +
 # DATABASE can be "all", "sameuser", "samerole", "replication", a       +
 # database name, or a comma-separated list thereof. The "all"           +
 # keyword does not match "replication". Access to replication           +
 # must be enabled in a separate record (see example below).             +
 #                                                                       +
 # USER can be "all", a user name, a group name prefixed with "+", or a  +
 # comma-separated list thereof.  In both the DATABASE and USER fields   +
 # you can also write a file name prefixed with "@" to include names     +
 # from a separate file.                                                 +
 #                                                                       +
 # ADDRESS specifies the set of hosts the record matches.  It can be a   +
 # host name, or it is made up of an IP address and a CIDR mask that is  +
 # an integer (between 0 and 32 (IPv4) or 128 (IPv6) inclusive) that     +
 # specifies the number of significant bits in the mask.  A host name    +
 # that starts with a dot (.) matches a suffix of the actual host name.  +
 # Alternatively, you can write an IP address and netmask in separate    +
 # columns to specify the set of hosts.  Instead of a CIDR-address, you  +
 # can write "samehost" to match any of the server's own IP addresses,   +
 # or "samenet" to match any address in any subnet that the server is    +
 # directly connected to.                                                +
 #                                                                       +
 # METHOD can be "trust", "reject", "md5", "password", "scram-sha-256",  +
 # "gss", "sspi", "ident", "peer", "pam", "ldap", "radius" or "cert".    +
 # Note that "password" sends passwords in clear text; "md5" or          +
 # "scram-sha-256" are preferred since they send encrypted passwords.    +
 #                                                                       +
 # OPTIONS are a set of options for the authentication in the format     +
 # NAME=VALUE.  The available options depend on the different            +
 # authentication methods -- refer to the "Client Authentication"        +
 # section in the documentation for a list of which options are          +
 # available for which authentication methods.                           +
 #                                                                       +
 # Database and user names containing spaces, commas, quotes and other   +
 # special characters must be quoted.  Quoting one of the keywords       +
 # "all", "sameuser", "samerole" or "replication" makes the name lose    +
 # its special character, and just match a database or username with     +
 # that name.                                                            +
 #                                                                       +
 # This file is read on server startup and when the server receives a    +
 # SIGHUP signal.  If you edit the file on a running system, you have to +
 # SIGHUP the server for the changes to take effect, run "pg_ctl reload",+
 # or execute "SELECT pg_reload_conf()".                                 +
 #                                                                       +
 # Put your actual configuration here                                    +
 # ----------------------------------                                    +
 #                                                                       +
 # If you want to allow non-local connections, you need to add more      +
 # "host" records.  In that case you will also need to make PostgreSQL   +
 # listen on a non-local interface via the listen_addresses              +
 # configuration parameter, or via the -i or -h command line switches.   +
                                                                         +
 # CAUTION: Configuring the system for local "trust" authentication      +
 # allows any local user to connect as any PostgreSQL user, including    +
 # the database superuser.  If you do not trust all your local users,    +
 # use another authentication method.                                    +
                                                                         +
                                                                         +
 # TYPE  DATABASE        USER            ADDRESS                 METHOD  +
                                                                         +
 # "local" is for Unix domain socket connections only                    +
 local   all             all                                     trust   +
 # IPv4 local connections:                                               +
 host    all             all             127.0.0.1/32            trust   +
 # IPv6 local connections:                                               +
 host    all             all             ::1/128                 trust   +
 # Allow replication connections from localhost, by a user with the      +
 # replication privilege.                                                +
 local   replication     all                                     trust   +
 host    replication     all             127.0.0.1/32            trust   +
 host    replication     all             ::1/128                 trust   +
 host  all mydb  ::1/128                 trust                           +
(1 row)

All you need to do from now on is to reload the configuration and you’re done:

postgres=# select pg_reload_conf();
 pg_reload_conf 
----------------
 t
(1 row)

Of course: Use with caution!
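One additional caution on the technique itself: a plain heap table gives no guarantee that COPY ... TO writes the lines back in their original order. A safer variant (just a sketch, requiring PostgreSQL 10 or later for the identity column) adds a line-number column and writes the file back through the COPY (query) TO form with an explicit ORDER BY:

-- same approach, but with a deterministic line order
create table hba ( line_no bigint generated always as identity, lines text );

copy hba (lines) from '/u02/pgdata/DEV/pg_hba.conf';

insert into hba (lines) values ('host  all mydb  ::1/128                 trust');

-- COPY (query) TO honours the ORDER BY, so the file keeps its original line order
copy ( select lines from hba order by line_no ) to '/u02/pgdata/DEV/pg_hba.conf';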

The article Modifying pg_hba.conf from inside PostgreSQL appeared first on Blog dbi services.

opt_estimate 3

Jonathan Lewis - Fri, 2019-06-28 07:12

This is just a quick note to throw out a couple of the lesser-known options for the opt_estimate() hint – and they may be the variants that are most useful, since they address a problem where the optimizer can produce consistently bad cardinality estimates. The first is the “group by” option – a hint that I once would have called a “strategic” hint but which more properly ought to be called a “query block” hint. Here’s the simplest possible example (tested under 12.2, 18.3 and 19.2):


rem
rem     Script:         opt_est_gby.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

create table t1
as
select
        rownum                  id,
        mod(rownum,200)         n1,
        lpad(rownum,10,'0')     v1,
        rpad('x',100)           padding
from
        dual
connect by
        level <= 3000
;

set autotrace on explain

prompt  =============================
prompt  Baseline cardinality estimate
prompt  (correct cardinality is 10)
prompt  Estimate will be 200
prompt  =============================

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

I’ve generated a table of 3,000 rows with a column n1 holding 15 rows each of 200 distinct values. The query then aggregates on mod(n1,10) so it has to return 10 rows, but the optimizer doesn’t have a mechanism for inferring this and produces the following plan – the Rows value from the HASH GROUP BY at operation 1 is the only thing we’re really interested in here:


---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |   200 |   800 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |   200 |   800 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------
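As a quick sanity check (a sketch that assumes column statistics are in place, e.g. from 12c’s online statistics gathering during the CTAS) we can compare the real number of groups with the num_distinct recorded for n1; the first query should return 10, the second 200:


select  count(distinct mod(n1,10))      actual_groups
from    t1
;

select  num_distinct
from    user_tab_cols
where   table_name  = 'T1'
and     column_name = 'N1'
;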

It looks as if the optimizer’s default position is to use num_distinct from the underlying column as the estimate for the aggregate. We can work around this in the usual two ways with an opt_estimate() hint. First, let’s tell the optimizer that it’s going to over-estimate the cardinality by a factor of 10:


select  /*+
                qb_name(main)
                opt_estimate(@main group_by, scale_rows = 0.1)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    20 |    80 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    20 |    80 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

The hint uses group_by as the critical option parameter, and then I’ve used the standard scale_rows=nnn to set a scaling factor that should be used to adjust the result of the default calculation. At 10% (0.1) this gives us an estimate of 20 rows.

Alternatively, we could simply tell the optimizer how many rows we want it to believe will be generated for the aggregate – let’s just tell it that the result will be 10 rows.

select  /*+
                qb_name(main)
                opt_estimate(@main group_by, rows = 10)
        */
        mod(n1,10), count(*) 
from    t1 
group by 
        mod(n1,10)
;

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |    40 |    10  (10)| 00:00:01 |
|   1 |  HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |
|   2 |   TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
---------------------------------------------------------------------------

We use the same group_by as the critical parameter, with rows=nnn.

Next steps

After an aggregation there’s often a “having” clause so you might consider using the group_by option to fix up the cardinality of the having clause if you know what the normal effect of the having clause should be. For example: “having count(*) > NNN” will use the optimizer’s standard 5% “guess” and “having count(*) = NNN” will use the standard 1% guess. However, having seen the group_by options I took a guess that there might be a having option to the opt_estimate() hint as well, so I tried it – with autotrace enabled here are three queries, first the unhinted baseline (which uses the standard 5% on my having clause) then a couple of others with hints to tweak the cardinality:

select  /*+
                qb_name(main)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main having scale_rows=0.4)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

select  /*+
                qb_name(main)
                opt_estimate(@main group_by scale_rows=2)
                opt_estimate(@main having scale_rows=0.3)
        */
        mod(n1,10), count(*)
from    t1
group by
        mod(n1,10)
having
        count(*) > 100
;

The first query gives us the baseline cardinality of 10 (5% of 200). The second query scales the having cardinality down by a factor of 0.4 (which means an estimate of 4). The final query first doubles the group by cardinality (to 400), then scales the having cardinality (which would have become 20) down by a factor of 0.3, with the net effect of producing a cardinality of 6. Here are the plans:

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |    10 |    40 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |   --  10
|   2 |   HASH GROUP BY     |      |    10 |    40 |    10  (10)| 00:00:01 |   -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     4 |    16 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   4
|   2 |   HASH GROUP BY     |      |     4 |    16 |    10  (10)| 00:00:01 |    -- 200
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

----------------------------------------------------------------------------
| Id  | Operation           | Name | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------
|   0 | SELECT STATEMENT    |      |     6 |    24 |    10  (10)| 00:00:01 |
|*  1 |  FILTER             |      |       |       |            |          |    --   6
|   2 |   HASH GROUP BY     |      |     6 |    24 |    10  (10)| 00:00:01 |    -- 400
|   3 |    TABLE ACCESS FULL| T1   |  3000 | 12000 |     9   (0)| 00:00:01 |
----------------------------------------------------------------------------

It’s a little sad that the FILTER operation shows no estimate while the HASH GROUP BY operation shows the estimate after the application of the having clause. It would be nice to see the plan reporting the figures which I’ve added at the end of line for operations 1 and 2.

You may wonder why one would want to increase the estimate for the group by then reduce it for the having. While I’m not going to go to the trouble of creating a worked example it shouldn’t be too hard to appreciate the idea that the optimizer might use complex view merging to postpone a group by until after a join – so increasing the estimate for a group by might be necessary to ensure that that particular transformation doesn’t happen, while following this up with a reduction to the having might then ensure that the next join is a nested loop rather than a hash join. Of course, if you don’t need to be this subtle you might simply take advantage of yet another option to the opt_estimate() hint, the query_block option – but that will (probably) appear in the next article in this series.

 

Should the Oracle APEX Community Care About Autonomous Database?

Joel Kallman - Fri, 2019-06-28 06:14


This past week, Oracle announced the availability of Oracle APEX, SQL Developer Web and Oracle REST Data Services on Oracle Autonomous Database.  If you're in the APEX community, should you care?  I say "absolutely yes!", but not for the reasons you might suspect.

Autonomous Database is strategic to Oracle.  Just read the transcript from the recent Oracle quarterly earnings conference call and it will be obvious to you.  Autonomous is an advancement in technology that has significant investment from Oracle and very real benefits for customers.  It's a clear market differentiator - I do truly believe this, it's not merely my marketing spin.  And now, with the addition of Oracle APEX & SQL Developer Web & Oracle REST Data Services, I think this combination of technologies provides even more capabilities to this platform and even greater differentiation.  What other service provides elastic, autonomous capabilities, application design and proven low code application development, out-of-the-box?  Did I mention that this also happens to include the world's most popular database, Oracle Database?

The benefits of low code application development are real.  And Low Code + Autonomous Database is the ideal combination.  Low code is about reducing costs, delivering faster, with greater consistency, and being usable by a broader range of skill sets.  Some of the benefits of Autonomous Database are equivalent - less cost, instant availability, usable by others who may not be world-class experts.  It has been a long multi-year confluence of events that has brought us together here.

The APEX community is the envy of others at Oracle.  Even people who aren't APEX fans recognize the APEX community's passion.  But where did this come from?  Do people really get excited about a tool?  No.  They get excited about what they can do with a tool - how it helps them deliver a solution, and be successful.  A carpenter doesn't get passionate about his dual-slide compound miter saw because it's a cool tool.  He gets satisfaction about what he can actually do with that tool versus a hand saw.  When you get a pay raise or praise or a promotion because of what you've been able to deliver with APEX and ORDS and Oracle Database, that's a reason to get excited!  And I think that is ultimately the real story behind the enviable, tangible energy in the APEX community.  Countless people have had many great successes with this combination of technologies, and success begets success.

Let's say you're in the APEX community, you saw this announcement about APEX on Autonomous, but you're not interested in cloud.  Or, as Andre de Souza so eloquently stated on Twitter, "I know it’s big news, just does not affect 99,9% of current #orclapex developers I’m guessing."  Should you care?  I say yes, and here's why.  The great APEX community that I mention above, which has been so successful with APEX & ORDS & Oracle Database over the years, has become very large across the globe, and with not a lot of help from Oracle.  Make no mistake - Oracle does invest in APEX, millions of dollars every year.  But I still come across Oracle Database customers who have simply never heard of APEX.  This is because there has not been much promotion from Oracle marketing or public relations or even sales.  All of this is about to change.  Why?  Because APEX is on Autonomous Database, and Autonomous Database is strategic to Oracle.  You will probably see more communication and discussion from Oracle about APEX than probably the last 20 years combined.  Low code resonates with customers, APEX is proven, and everyone has application development needs.

How does this benefit someone in the APEX community?  Simple:

  1. Awareness and interest will rise by people who have never heard about APEX before, both existing on-premises customers and net new customers.
  2. There will be greater demand for APEX and database development talent.  If you have experience with APEX, with a proven track record of delivering solutions with APEX, you're a very attractive person.  Perhaps the rate you charge has now gotten a bit higher.  You'll certainly gain upward mobility.
  3. You'll no longer have to introduce someone to APEX for the very first time, or counter the claim that "it's not strategic."
  4. As our friends from Explorer UK say, with APEX, they "develop cloud ready applications".  And you've been doing this for years.  Don't be afraid to make this claim.  When and if you're ready for cloud, you're already out of the gate.  The same APEX apps you developed on-premises run and look and feel exactly the same in the cloud.  Who has been developing cloud-ready apps for years?  You!

So.  Even if you're not into "cloud" but into APEX, this announcement and these capabilities on Autonomous Database have a material impact on you and everyone else in the APEX community.  Your skills and experience will become more valued, and we should expect the market and interest and demand to grow.

Everything is not perfect, and we on the APEX team still have a lot of very hard work ahead of us.  But these are exciting times and it's what we've labored on for the past 20 years, to get to this point.  For those who have been with the APEX community for so many years, congratulations!  You've bet on the right horse.  Just fasten your seat belt.

Windocks and K8s support

Yann Neuhaus - Fri, 2019-06-28 00:33

I recently got the 4.08 update from the Windocks team and was very excited to evaluate some of the new features. The first cool one I want to present in this blog concerns the Kubernetes support for deploying Windocks containers, which will definitely make my application deployments easier. Let’s say you want to deploy an application that is tied to a Windocks container for SQL Server. In a previous blog post I explained why we are using Windocks in our context. With previous versions of Windocks, we had to write custom scripts to deploy applications on K8s that are tied to a Windocks container. With the new version 4.08, this process may be simplified because both the applications and their related Windocks containers are directly deployable on K8s by using a YAML deployment file.

In fact, the new way consists of deploying a Windocks SQL Server proxy on K8s that works in conjunction with a Windocks server. Once the SQL Server proxy is deployed, a corresponding Windocks container spins up with its specific parameters, as shown in the picture below:

 

First of all, in order to secure access between K8s and the Windocks server, authentication is required and we need to provide credential information that will be stored in the proxy-secrets secret in K8s. The SA password is also included in this secret and will be used to set up the SA account when the Windocks container spins up.

$ kubectl create secret generic proxy-secrets --from-literal=WINDOCKS_REQUIRED_USERNAME='clustadmin' --from-literal=WINDOCKS_REQUIRED_PASSWORD='StrongPassword' --from-literal=WINDOCKS_REQUIRED_CONTAINER_SAPASSWORD='sa_password'

 

The next step consists of deploying the Windocks SQL proxy with the specific environment variables, including WINDOCKS_REQUIRED_HOSTNAME (the Windocks server name or IP address), WINDOCKS_REQUIRED_IMAGE_NAME (the Windocks base image used for the container) and WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT (optional).

  • The Windocks SQL Proxy YAML file
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: windocks-sql-proxy-secure 
  labels:
    app: sqlproxy-secure 
spec:
  replicas: 1 
  template:
    metadata:
      labels:
        app: sqlproxy-secure 
        tier: frontend
    spec:
      containers:
      - name: sqlproxy-secure-app 
        image: windocks/windocks-sql-server-proxy 
        imagePullPolicy: Always
        ports:
        - name: tcp-proxy
          containerPort: 3087
        - name: tls-proxy
          containerPort: 3088
        envFrom:
          - secretRef:
              name: proxy-secrets
        env:
          - name: PROJECT_ID
            value: project_id_for_GKE_deployment_optional
          - name: WINDOCKS_REQUIRED_HOSTNAME
            value: xx.xxx.xxx.xxx
          - name: WINDOCKS_REQUIRED_IMAGE_NAME
            value: 2012_ci
          - name: WINDOCKS_SQL_PROXY_OPTIONAL_LISTENING_PORT
            value: "3087"

 

If we want to make the SQL proxy pod accessible from outside, a service is needed, but this is not mandatory depending on the context. Note that you may also use a TLS connection to secure the network between K8s and the Windocks server.

  • The Windocks service YAML file
apiVersion: v1
kind: Service
metadata:
  name: windocks-sql-proxy-secure
  labels:
    app: sqlproxy-secure
    tier: frontend
spec:
  sessionAffinity: ClientIP
  type: LoadBalancer
  ports:
  - port: 3087
    name: tcp-proxy-secure-service
    targetPort: 3087
  - port: 3088
    name: tls-proxy-secure-service
    targetPort: 3088
  selector:
    app: sqlproxy-secure
    tier: frontend

 

Let’s give it a try on my Azure infrastructure, which includes an AKS cluster and a Windocks server installed in an Azure VM. I also took the opportunity to create my own Helm chart from the YAML files provided by the Windocks team. It will make my deployment easier for sure. Here is the command I used to deploy my Windocks Helm chart on my AKS cluster:

$ helm install --name windocks2012 --namespace dmk --set Windocks.Image=2012_ci --set Windocks.Port=3089 --set Windocks.PortSSL=3090 .

 

The deployment is performed in a specific namespace named dmk, and the 2012_ci image is used as the base image for my Windocks container. I will be able to connect to my Windocks container on port 3089 through the SQL proxy deployed on K8s. After a few seconds the following resources were deployed within my dmk namespace, including a Windocks SQL proxy pod and the Windocks SQL proxy service.

$ kubectl get all -n dmk
NAME                                                                  READY   STATUS    RESTARTS   AGE
pod/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fb8694m   1/1     Running   0          13m

NAME                                                            TYPE           CLUSTER-IP     EXTERNAL-IP     PORT(S)
                 AGE
service/backend                                                 ClusterIP      10.0.126.154   <none>          80/TCP
                 8d
service/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   LoadBalancer   10.0.252.235   xx.xx.xxx.xxx   3089:30382/TCP,3090:30677/TCP   44m

NAME                                                                    DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure   1         1         1            1           44m

NAME                                                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/windocks2012-sqlproxy-securewindocks-sql-proxy-secure-56fbdb5c96   1         1         1       44m

 

Once deployed, the SQL proxy will redirect all connections from port 3089 to the container port after spinning up the corresponding Windocks container on the Windocks server. We may get some details by taking a look at the SQL proxy logs on K8s. As a reminder, the container port is allocated dynamically by default by the Windocks server, and the SQL proxy gets it automatically for connection redirection.

…
Valid response for creating Windocks container
Container id is b1201aaaba3b4cd047953b624e541e26500024e42e6381936fc7b526b5596a99
Container port is 10001
Setting up tcp server
redirecting connections from 127.0.0.1:3089 to xx.xxx.xxx.xxx:10001 
…

 

Let’s try to connect by using mssql-cli with the external IP of the SQL proxy service and port 3089. The connection redirect is effective and I can interact with my Windocks container on local port 10001:

master> SELECT top 1 c.local_net_address, c.local_tcp_port
....... FROM sys.dm_exec_connections as c; 
+---------------------+------------------+
| local_net_address   | local_tcp_port   |
|---------------------+------------------|
| 172.18.0.5          | 10001            |
+---------------------+------------------+

 

The Windocks container for SQL Server came up with my 3 test databases as expected:

master> \ld+
+-------------------+-------------------------+-----------------------+------------------------------+
| name              | create_date             | compatibility_level   | collation_name               |
|-------------------+-------------------------+-----------------------+------------------------------|
| master            | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| tempdb            | 2019-06-27 20:04:04.273 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| model             | 2003-04-08 09:13:36.390 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| msdb              | 2012-02-10 21:02:17.770 | 110                   | SQL_Latin1_General_CP1_CI_AS |
| AdventureWorksDbi | 2019-06-27 20:04:03.537 | 100                   | Latin1_General_100_CS_AS     |
| ApplixEnterprise  | 2019-06-27 20:04:04.477 | 90                    | SQL_Latin1_General_CP1_CI_AS |
| dbi_tools         | 2019-06-27 20:04:05.153 | 100                   | French_CS_AS                 |
+-------------------+-------------------------+-----------------------+------------------------------+

 

From the Windocks server, I may get a picture of provisioned containers. The interesting one in our case is referenced by the name k8s-windocks2012/xxxx:

PS F:\WINDOCKS\SQL2012> docker ps
CONTAINER ID        IMAGE               COMMAND             CREATED             STATUS              PORTS               NAMES
e9dbe5556b2f        2012_ci             ""                  29 minutes ago      Stopped             10002/              dab/Windocks-id:31432367-c744-4ae3-8248-cb3fb3d2792e
b1201aaaba3b        2012_ci             ""                  13 minutes ago      Started             10001/              k8s-windocks2012/Windocks-id:cfa58c38-d168-4c04-b4c8-12b0552b93ad

 

Well, in a nutshell, this is a feature we will certainly consider integrating into our Azure DevOps pipeline. Stay tuned, other blog posts will come later.

See you!

 

 

 

The article Windocks and K8s support appeared first on Blog dbi services.

work Agile in a GxP-regulated environment

Yann Neuhaus - Thu, 2019-06-27 09:24

On 4 June 2019 I followed an invitation to

wega-it’s Know-how & Networking Breakfast 2 2019 on “Agile Validation in GxP Projects”.

So they were to discuss Agility in the context of GxP regulation.

I had some earlier exposure to various kinds of compliance roles and topics, and my current work environment is in the highly regulated Pharma industry. So I was really wondering (and hoping to learn) how you can possibly bring the two points of view of Agility and GxP regulation together. The Literaturhaus Basel was to see a very special type of literature that day: not a formal presentation but a role play performance between these two viewpoints, represented by Evelyne Daniel, an experienced GxP validation expert, and Mathias Fuchs, an equally experienced Scrum master, both from wega-IT. A very nice idea, very appropriate for the topic!

What is Compliance (GxP) and what is Agile?

Typically in GxP compliance we work along the so-called V-model. In its rigidity and intended plannability it of course corresponds largely to the waterfall model of traditional software development. Opposed to this, the Agile Manifesto (2001) criticizes the very basics of these traditional ways of working. Remember the iconic claims like “Working software over comprehensive Documentation” and “Responding to change over following a Plan”. But how would you ever get acceptance in the compliance world without full documentation and planning!?! When I quickly browsed the internet, I found a multitude of statements and proposals which would certainly merit a separate blog post. For now, I will give a quick summary of the wega breakfast presentation and some touch points with our current working environment in the Pharma industry.

Although in my current work environment we are not actually driving GxP Validation projects, we are still subject to the very tight GxP regulation. In the processes of Change and Release Management, this is reflected in the rigid rules for testing and documentation, to name just the most obvious examples. The background, of course, is the definition of Compliance and its Validation: the goal is to “establish documented evidence” to assure compliance and quality etc. These requirements hold independently of the quality, completeness or even up-to-date status of the pre-defined processes and rules! Call this inflexible and cumbersome! Any adaptation (update!) of the processes and rules is very formal, going through the complicated administrative processes required, and hence very slow. Consider this in our fast-moving (not only IT) world!

What is an MVP?

A nice play on words was interjected in the stage discussion: the acronym MVP has a very clear meaning as a basic concept for both sides, it just isn’t the same one: MVP = Master Validation Plan (in GxP Validation) versus Minimum Viable Product (in Agile or Lean Software Development).

How to bring them together?

Now how to bring the core aspirations of Agile Development like Customer focus, Flexibility, Speed into the Compliance world? A first inevitable step in the V-model world: break up the (dead?) lock between a complete finalization of the User Requirements Specification and the setup of a complete Validation Plan prescribing all Qualification criteria (IQ, OQ, PQ). Definition of Done (DoD) plays a major role when trying to cut the end-to-end Development-Validation elephant into smaller pieces. Inclusion of Validation into the “daily” Development activities is another must, instead of adding Validation at the end of Development phases only. Yet another core principle from the Agile side is ensuring team Maturity and Mindset. Much-hailed Diversity is opposed to pure compliance-oriented expert teams, striving for innovation and creativity in the team.

WEGA breakfast - Agile Validation in GxP projects

Some basic approaches

The final answer on how to – methodically – combine or maybe rather “emulsify” Agility and Compliance Validation comes as no surprise: there is no one-size-fits-all method. Rather, three obvious basic approaches were presented.

  1. introducing Agility right between the left (Specifications) and the right (Qualifications) arms of the V-model, probably using some kind of piloting or prototyping
  2. including Validation into the Agile Development, almost doing Validation in each Agile sprint
  3. appending V-model Validation at the end of an Agile development.

The above-mentioned end-to-end Development-to-Validation elephant has to be broken into smaller, more manageable units. Each specific project situation will have its own possible and best way to do it.

Think innovative and creative!

Thanks to wega-informatik (www.wega-it.com)  for organizing this creative and informative event.

 

The article work Agile in a GxP-regulated environment appeared first on Blog dbi services.

Glitches

Jonathan Lewis - Wed, 2019-06-26 11:11

Here’s a question just in from Oracle-L that demonstrates the pain of assuming things work consistently when sometimes Oracle development hasn’t quite finished a bug fix or enhancement. Here’s the problem – which starts from the “scott.emp” table (which I’m not going to create in the code below):

rem
rem     Script:         fbi_fetch_first_bug.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem 

-- create and populate EMP table from SCOTT demo schema

create index e_sort1 on emp (job, hiredate);
create index e_low_sort1 on emp (lower(job), hiredate);

set serveroutput off
alter session set statistics_level = all;
set linesize 156
set pagesize 60

select * from emp where job='CLERK'         order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

select * from emp where lower(job)='clerk' order by hiredate fetch first 2 rows only; 
select * from table(dbms_xplan.display_cursor(null,null,'cost allstats last outline alias'));

Both queries use the 12c “fetch first” feature to select two rows from the table. We have an index on (job, hiredate) and a similar index on (lower(job), hiredate), and given the similarity of the queries and the respective indexes (get the first two rows by hiredate where job/lower(job) is ‘CLERK’/’clerk’) we might expect to see the same execution plan in both cases with the only change being the choice of index used. But here are the plans:


select * from emp where job='CLERK'         order by hiredate fetch
first 2 rows only

Plan hash value: 92281638

----------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |     2 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |         |      1 |      2 |     2   (0)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |         |      1 |      3 |     2   (0)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP     |      1 |      3 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_SORT1 |      1 |      3 |     1   (0)|      3 |00:00:00.01 |       2 |
----------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("JOB"='CLERK')


select * from emp where lower(job)='clerk' order by hiredate fetch
first 2 rows only

Plan hash value: 4254915479

-------------------------------------------------------------------------------------------------------------------------------------------------------
| Id  | Operation                             | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |  OMem |  1Mem | Used-Mem |
-------------------------------------------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                      |             |      1 |        |     1 (100)|      2 |00:00:00.01 |       2 |       |       |          |
|*  1 |  VIEW                                 |             |      1 |      2 |     1   (0)|      2 |00:00:00.01 |       2 |       |       |          |
|*  2 |   WINDOW SORT PUSHED RANK             |             |      1 |      1 |     1   (0)|      2 |00:00:00.01 |       2 |  2048 |  2048 | 2048  (0)|
|   3 |    TABLE ACCESS BY INDEX ROWID BATCHED| EMP         |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       2 |       |       |          |
|*  4 |     INDEX RANGE SCAN                  | E_LOW_SORT1 |      1 |      1 |     1   (0)|      4 |00:00:00.01 |       1 |       |       |          |
-------------------------------------------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')


As you can see, with the “normal” index Oracle is able to walk the index “knowing” that the data is appearing in order, and stopping as soon as possible (almost) – reporting the WINDOW operation as “WINDOW NOSORT STOPKEY”. On the other hand with the function-based index Oracle retrieves all the data by index, sorts it, then applies the ranking requirement – reporting the WINDOW operation as “WINDOW SORT PUSHED RANK”.

Clearly it’s not going to make a lot of difference to performance in this tiny case, but there is a threat that the whole data set for ‘clerk’ will be accessed – and that’s the first performance threat, with the additional threat that the optimizer might decide that a full tablescan would be more efficient than the index range scan.
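
If the tablescan threat is the immediate concern, a hedged sketch (something I haven't tested as part of this example) would be to pin the function-based index with a hint; based on the plans above this would still leave the WINDOW SORT PUSHED RANK in place, so it addresses only the tablescan risk, not the sorting:

select  /*+ index(emp e_low_sort1) */
        *
from    emp
where   lower(job) = 'clerk'
order by
        hiredate
fetch first 2 rows only;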

Can we fix it ?

Yes, Bob, we can. The problem harks back to a limitation that probably got fixed some time between 10g and 11g – here are two, simpler, queries against the emp table and the two new indexes, each with the resulting execution plan when run under Oracle 10.2.0.5:


select ename from emp where       job  = 'CLERK' order by hiredate;
select ename from emp where lower(job) = 'clerk' order by hiredate;

---------------------------------------------------------------------------------------
| Id  | Operation                   | Name    | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT            |         |     3 |    66 |     2   (0)| 00:00:01 |
|   1 |  TABLE ACCESS BY INDEX ROWID| EMP     |     3 |    66 |     2   (0)| 00:00:01 |
|*  2 |   INDEX RANGE SCAN          | E_SORT1 |     3 |       |     1   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   2 - access("JOB"='CLERK')


--------------------------------------------------------------------------------------------
| Id  | Operation                    | Name        | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT             |             |     3 |    66 |     3  (34)| 00:00:01 |
|   1 |  SORT ORDER BY               |             |     3 |    66 |     3  (34)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID| EMP         |     3 |    66 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN          | E_LOW_SORT1 |     3 |       |     1   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access(LOWER("JOB")='clerk')

The redundant SORT ORDER BY is present in 10g even for a simple index range scan. By 11.2.0.4 the optimizer was able to get rid of the redundant step, but clearly there’s a little gap in the code relating to the over() clause that hasn’t acquired the correction – even in 18.3.0.0 (or 19.2 according to a test on https://livesql.oracle.com).

To fix the 10g problem you just had to include the first column of the index in the order by clause: the result doesn’t change, of course, because you’re simply prefixing the required columns with a column that holds the single value you were probing the index for, but suddenly the optimizer realises that it can do a NOSORT operation. So the “obvious” guess was to do the same for this “fetch first” example:

select * from emp where lower(job)='clerk' order by lower(job), hiredate fetch first 2 rows only;

--------------------------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name        | Starts | E-Rows | Cost (%CPU)| A-Rows |   A-Time   | Buffers |
--------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |             |      1 |        |     3 (100)|      2 |00:00:00.01 |       4 |
|*  1 |  VIEW                         |             |      1 |      2 |     3  (34)|      2 |00:00:00.01 |       4 |
|*  2 |   WINDOW NOSORT STOPKEY       |             |      1 |      1 |     3  (34)|      2 |00:00:00.01 |       4 |
|   3 |    TABLE ACCESS BY INDEX ROWID| EMP         |      1 |      1 |     2   (0)|      3 |00:00:00.01 |       4 |
|*  4 |     INDEX RANGE SCAN          | E_LOW_SORT1 |      1 |      1 |     1   (0)|      3 |00:00:00.01 |       2 |
--------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("from$_subquery$_002"."rowlimit_$$_rownumber"<=2)
   2 - filter(ROW_NUMBER() OVER ( ORDER BY "EMP"."SYS_NC00009$","EMP"."HIREDATE")<=2)
   4 - access("EMP"."SYS_NC00009$"='clerk')

It’s just one of those silly little details where you can waste a HUGE amount of time (in a complex case) because it never crossed your mind that something that clearly ought to work might need testing for a specific use case – and I’ve lost count of the number of times I’ve been caught out by this type of “not quite finished” anomaly.

Footnote

If you follow the URL to the Oracle-L thread you’ll see that Tanel Poder has supplied a couple of MoS Document Ids discussing the issue and warning of other bugs with virtual column / FBI translation, and has shown an alternative workaround that takes advantage of a hidden parameter.

 

Evenement Oracle : Dans la tête d’un Hacker & Comment protéger vos données sensibles

Yann Neuhaus - Wed, 2019-06-26 10:23

For once, a post in French, because it concerns an Oracle event held in French.
Today at Oracle Switzerland in Geneva there was a presentation about Oracle security.
The idea was to put ourselves in a hacker's shoes to better understand their approach, and then to look at how to protect the data in an Oracle database.
Understanding the intentions and the means used by hackers to reach their goals helps us fight them better.

The session was led by Hakim Loumi, EMEA DB Security PM.
First, the speaker showed an estimate of data growth by 2025. Wow, 175 ZB!

And how this data can be sensitive

The number of attacks worldwide is impressive

Hakim also showed why hacking has become one of the biggest scourges. In the slide below, for an investment of less than $10, a complete identity was worth on average $240. What a return on investment!!!!!

Obviously, the database is a prime target for attacks

The speaker then presented the main tools provided by Oracle, most of them requiring the Advanced Security Option (a small illustrative Data Redaction sketch follows this list)

Data Masking
Data Redaction
Oracle Firewall
Database vault
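
As a purely illustrative sketch that was not part of the presentation (the HR.EMPLOYEES table and its SSN column are hypothetical), a Data Redaction policy that hides a sensitive column from every query could look roughly like this:

begin
  dbms_redact.add_policy(
    object_schema => 'HR',
    object_name   => 'EMPLOYEES',
    policy_name   => 'redact_ssn',
    column_name   => 'SSN',
    function_type => dbms_redact.full,   -- replace the value with the default full-redaction value
    expression    => '1=1'               -- apply the policy to every query
  );
end;
/

Sessions without the EXEMPT REDACTION POLICY privilege would then only see the redacted value instead of the real data.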

And to finish, this nice slide

Conclusion

The event was really interesting. Through simple examples and anecdotes, the speaker captured the audience's attention. The main thing to remember from this presentation is that data security is not only the DBA's business. It is in fact a whole chain of procedures involving everyone in the company.

The article Evenement Oracle : Dans la tête d’un Hacker & Comment protéger vos données sensibles appeared first on Blog dbi services.

Oracle Developer Tools - Do They Still Exist?

Andrejus Baranovski - Wed, 2019-06-26 01:55
People are frustrated about @OracleADF @JDeveloper on social media - "ADF boat has no captain", etc. I agree @Oracle is to blame big time for such lame handling of its own Developer Tools stack. @Oracle please wake up and spend some budget on @OracleADF. Read more:

Oracle VBCS - right now this tool gets the most of Oracle focus. Supposed to offer declarative #JavaScript development experience in the Cloud. Not well received by the community. Are there any VBCS customers, please respond if yes?

Oracle APEX - comes with a very strong community (mostly backed by DB folks). But it is not strategic for Oracle. It is more likely to be used by PL/SQL guys than by Java or Web developers.

Oracle JET - highly promoted by Oracle. Set of opensource #JavaScript libs, glued by Oracle layer. Nice, but can't be used as a direct replacement for @OracleADF, JET is UI layer only. Oracle folks often confuse community by saying - Oracle JET is a great option to replace ADF

Oracle Forms - still alive, but obviously can't be strategic Oracle platform. A few years ago, Oracle was promoting Forms modernization to @OracleADF

Summary - the Oracle Developer tools offering is weak. The lack of Oracle investment in development tools makes the Oracle developer community shrink.

opt_estimate 2

Jonathan Lewis - Tue, 2019-06-25 14:22

This is a note that was supposed to be a follow-up to an initial example of using the opt_estimate() hint to manipulate the optimizer’s statistical understanding of how much data it would access and (implicitly) how much difference that would make to the resource usage. Instead, two years later, here’s part two – on using opt_estimate() with nested loop joins. As usual I’ll start with a little data set:


rem
rem     Script:         opt_est_nlj.sql
rem     Author:         Jonathan Lewis
rem     Dated:          Aug 2017
rem

create table t1
as
select 
        trunc((rownum-1)/15)    n1,
        trunc((rownum-1)/15)    n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create table t2
pctfree 75
as
select 
        mod(rownum,200)         n1,
        mod(rownum,200)         n2,
        rpad(rownum,180)        v1
from    dual
connect by
        level <= 3000 --> hint to avoid wordpress format issue
;

create index t1_i1 on t1(n1);
create index t2_i1 on t2(n1);

There are 3,000 rows in each table, with 200 distinct values for each of columns n1 and n2. There is an important difference between the tables, though, as the rows for a given value are well clustered in t1 and widely scattered in t2. I’m going to execute a join query between the two tables, ultimately forcing a very bad access path so that I can show some opt_estimate() hints making a difference to cost and cardinality calculations. Here’s my starting query, with execution plan, unhinted (apart from the query block name hint):

select
        /*+ qb_name(main) */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    44   (3)| 00:00:01 |
|*  1 |  HASH JOIN                           |       |   225 | 83700 |    44   (3)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS FULL                  | T2    |  3000 |   541K|    42   (3)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T2"."N1"="T1"."N2")
   3 - access("T1"."N1"=15)

You’ll notice the tablescan and hash join with t2 as the probe (2nd) table and a total cost of 44, which is largely due to the tablescan cost of t2 (which I had deliberately defined with pctfree 75 to make the tablescan a little expensive). Let’s hint the query to do a nested loop from t1 to t2 to see why the hash join is preferred over the nested loop:


alter session set "_nlj_batching_enabled"=0;

select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |   242   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |    16   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |    15 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve done two slightly odd things here – I’ve set a hidden parameter to disable nlj batching and I’ve used a hint to block nlj prefetching. This doesn’t affect the arithmetic but it does mean the appearance of the nested loop goes back to the original pre-9i form that happens to make it a little easier to see costs and cardinalities adding and multiplying their way through the plan.

As you can see, the total cost is 242 with this plan and most of the cost is due to the indexed access into t2: the optimizer has correctly estimated that each probe of t2 will acquire 15 rows and that those 15 rows will be scattered across 15 blocks, so the join cardinality comes to 15*15 = 225 and the cost comes to 15 (t1 rows) * 16 (t2 unit cost) + 2 (t1 cost) = 242.

So let’s tell the optimizer that its estimated cardinality for the index range scan is wrong.


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |   225 | 83700 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |    15 |  2775 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06).

The form is: (@qb_name nlj_index_scan, table_alias (list of possible driving tables), target_index, numeric_adjustment).

The numeric_adjustment could be rows=nnn or, as I have here, scale_rows=nnn; the target_index has to be specified by name rather than by a list of columns, and the list of possible driving tables should be a comma-separated list of fully-qualified table aliases. There’s a similar nlj_index_filter option which I can’t demonstrate in this post because it probably needs an index of at least two columns before it can be used.
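
Just as a hedged illustration (a variant I haven't run against the data above), the same hint written with an absolute estimate rather than a scaling factor would look like this:

opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, rows=1)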

The things to note in this plan are: the index range scan at operation 5 now has a cardinality (Rows) estimate of 1 (that’s 0.06 * the original 15). This hasn’t changed the cost of the range scan (because that cost was already one before we applied the opt_estimate() hint) but, because the cost of the table access depends on the index selectivity, the cost of the table access is down to 2 (from 16). On the other hand the table cardinality hasn’t dropped, so now it’s not consistent with the number of rowids predicted by the index range scan. The total cost of the query has dropped to 32, though, which is 15 (t1 rows) * 2 (t2 unit cost) + 2 (t1 cost).

Let’s try to adjust the prediction that the optimizer makes about the number of rows we fetch from the table. Rather than going all the way to being consistent with the index range scan I’ll dictate a scaling factor that will make it easy to see the effect – let’s tell the optimizer that we will get one-fifth of the originally expected rows (i.e. 3).


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    47 | 17484 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     3 |   555 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

By adding the hint opt_estimate(@main table, t2@main, scale_rows=0.20) we’ve told the optimizer that it should scale the estimated row count down by a factor of 5 from whatever it calculates. Bear in mind that in a more complex query the optimizer might decide to follow a path other than the one we expected, and that factor of 0.2 will be applied whenever t2 is accessed. Notice in this plan that the join cardinality in operation 1 has also dropped from 225 to 47 – if the optimizer is told that its cardinality (or selectivity) calculation is wrong for the table, the numbers involved in the selectivity will carry on through the plan, producing a different “adjusted NDV” for the join cardinality calculation.

Notice, though, that the total cost of the query has not changed. The cost was dictated by the optimizer’s estimate of the number of table blocks to be visited after the index range scan. The estimated number of table blocks hasn’t changed, it’s just the number of rows we will find there that we’re now hacking.

Just for completion, let’s make one final change (again, something that might be necessary in a more complex query), let’s fix the join cardinality:


select
        /*+
                qb_name(main)
                leading(t1 t2)
                use_nl(t2)
                index(t2)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

I’ve used the hint opt_estimate(@main join(t2 t1), scale_rows=0.5) to tell the optimizer to halve its estimate of the join cardinality between t1 and t2 (whatever order they appear in). With the previous hints in place the estimate had dropped to 47 (which must have been 46 and a large bit), with this final hint it has now dropped to 23. Interestingly the cardinality estimate for the table access to t2 has dropped at the same time (almost as if the optimizer has “rationalised” the join cardinality by adjusting the selectivity of the second table in the join – that’s something I may play around with in the future, but it may require reading a 10053 trace, which I tend to avoid doing).

Side note: If you have access to MoS you’ll find that Doc ID: 2402821.1 “How To Use Optimizer Hints To Specify Cardinality For Join Operation” seems to suggest that the cardinality() hint is something to use for single table cardinalities, and implies that the opt_estimate(join) option is for two-table joins. In fact both hints can be used to set the cardinality of multi-table joins.

Finally, then, let’s eliminate the hints that force the join order and join method and see what happens to our query plan if all we include is the opt_estimate() hints (and the qb_name() and no_nlj_prefetch hints).

select
        /*+
                qb_name(main)
                no_nlj_prefetch(t2)
                opt_estimate(@main nlj_index_scan, t2@main (t1), t2_i1, scale_rows=0.06)
                opt_estimate(@main table         , t2@main     ,        scale_rows=0.20)
                opt_estimate(@main join(t2 t1)   ,                      scale_rows=0.5)
        */
        t1.v1, t2.v1
from    t1, t2
where
        t1.n1 = 15
and     t2.n1 = t1.n2
;

----------------------------------------------------------------------------------------------
| Id  | Operation                            | Name  | Rows  | Bytes | Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   1 |  NESTED LOOPS                        |       |    23 |  8556 |    32   (0)| 00:00:01 |
|   2 |   TABLE ACCESS BY INDEX ROWID BATCHED| T1    |    15 |  2805 |     2   (0)| 00:00:01 |
|*  3 |    INDEX RANGE SCAN                  | T1_I1 |    15 |       |     1   (0)| 00:00:01 |
|   4 |   TABLE ACCESS BY INDEX ROWID BATCHED| T2    |     2 |   370 |     2   (0)| 00:00:01 |
|*  5 |    INDEX RANGE SCAN                  | T2_I1 |     1 |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   3 - access("T1"."N1"=15)
   5 - access("T2"."N1"="T1"."N2")

Note
-----
   - this is an adaptive plan

With a little engineering on the optimizer estimates we’ve managed to con Oracle into using a different path from the default choice. Do notice, though, the closing Note section (which didn’t appear in any of the other examples): I’ve left Oracle with the option of checking the actual stats as the query runs, so if I run the query twice Oracle might spot that the arithmetic is all wrong and throw in some SQL Plan Directives – which are just another load of opt_estimate() hints.

In fact, in this example, the plan we wanted became desirable as soon as we applied the nlj_index_scan fix-up, as this made the estimated cost of the index probe into t2 sufficiently low (even though it left an inconsistent cardinality figure for the table rows) that Oracle would have switched from the default hash join to the nested loop on that basis alone.

Closing Comment

As I pointed out in the previous article, this is just scratching the surface of how the opt_estimate() hint works, and even with very simple queries it can be hard to tell whether any behaviour we’ve seen is actually doing what we think it’s doing. In a third article I’ll be looking at something prompted by the most recent email I’ve had about opt_estimate() – how it might (or might not) behave in the presence of inline views and transformations like merging or pushing predicates. I’ll try not to take 2 years to publish it.

 

SQLcl ALIAS – because you can’t remember everything.

The Anti-Kyte - Tue, 2019-06-25 08:47

I want to find out which file is going to hold any trace information generated by my database session. Unfortunately, I keep forgetting the query that I need to run to find out.
Fortunately I’m using SQLcl, which includes the ALIAS command.
What follows is a quick run-through of this command including :

  • listing the aliases that are already set up in SQLcl
  • displaying the code that an alias will execute
  • creating your own alias interactively
  • deleting an alias
  • using files to manage custom aliases

Whilst I’m at it, I’ll create the alias for the code to find that pesky trace file too.

In the examples that follow, I’m connected to an Oracle XE18c PDB using SQLcl 18.4 from my Ubuntu 16.04 LTS laptop via the Oracle Thin Client. Oh, and the Java details are :

Meet the ALIAS command

As so often in SQLcl, it’s probably a good idea to start with the help :

help alias

…which explains that :

“Alias is a command which allows you to save a sql, plsql or sqlplus script and assign it a shortcut command.”

A number of aliases are already included in SQLcl. To get a list of them simply type :

alias

…which returns :

locks
sessions
tables
tables2

If we want to see the code that will be run when an alias is invoked, we simply need to list the alias :

alias list tables

tables - tables <schema> - show tables from schema
--------------------------------------------------

select table_name "TABLES" from user_tables

Connected as HR, I can run the alias to return a list of tables that I own in the database :

Creating an ALIAS

To create an alias of my own, I simply need to specify the alias name and the statement I want to associate it with. For example, to create an alias called whoami :

alias whoami =
select sys_context('userenv', 'session_user')
from dual;

I can now confirm that the alias has been created :

alias list whoami
whoami
------

select sys_context('userenv', 'session_user')
from dual

…and run it…

I think I want to tidy up that column heading. I could do this by adding a column alias in the query itself. However, alias does support the use of SQL*Plus commands…

alias whoami =
column session_user format a30
select sys_context('userenv', 'session_user') session_user
from dual;

…which can make the output look slightly more elegant :

A point to note here is that, whilst it is possible to include SQL*Plus statements in an alias for a PL/SQL block (well, sort of)…

alias whoami=set serverout on
exec dbms_output.put_line(sys_context('userenv', 'session_user'));

…when the alias starts with a SQL*Plus statement, it will terminate at the first semi-colon…

Where you do have a PL/SQL alias that contains multiple statement terminators (‘;’) you will need to run any SQL*Plus commands required prior to invoking it.
Of course, if you find setting output on to be a bit onerous, you can save valuable typing molecules by simply running :

alias output_on = set serverout on size unlimited
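
Putting the two together, a quick sketch of running the PL/SQL version of whoami (assuming both aliases are still defined in the current session) would simply be :

output_on
whoami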

I can also add a description to my alias so that there is some documentation when it’s listed :

alias desc whoami The current session user

When I now list the alias, the description is included…more-or-less…

I’m not sure if the inclusion of the text desc whoami is simply a quirk of the version and OS that I’m running on. In any case, we’ll come to a workaround for this minor annoyance in due course.

In the meantime, I’ve decided that I don’t need this alias anymore. To remove it, I simply need to run the alias drop command :

alias drop whoami


At this point, I know enough about the alias command to implement my first version of the session tracefile alias that started all this.
The query, that I keep forgetting, is :

select value
from v$diag_info
where name = 'Default Trace File'
/

To create the new alias :

alias tracefile =
select value "Session Trace File"
from v$diag_info
where name = 'Default Trace File';

I’ll also add a comment at this point :

alias desc tracefile The full path and filename on the database server of the tracefile for this session

My new alias looks like this :

The aliases.xml file

Unlike the pre-supplied aliases, the code for any alias you create will be held in a file called aliases.xml.

On Windows, this file will probably be somewhere under your OS user’s AppData directory.
On Ubuntu, it’s in $HOME/.sqlcl

With no custom aliases defined the file looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases/>

Note that, even though I have now defined a custom alias, it won’t be included in this file until I end the SQLcl session in which it was created.

Once I disconnect from this session, the file includes the new alias definition :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases>
<alias name="tracefile">
<description><![CDATA[desc tracefile The full path and filename on the database server of the tracefile for this session
]]></description>
<queries>
<query>
<sql><![CDATA[select value "Session Trace File"
from v$diag_info
where name = 'Default Trace File']]></sql>
</query>
</queries>
</alias>
</aliases>

Incidentally, if you’ve played around with SQLDeveloper extensions, you may find this file structure rather familiar.

The file appears to be read by SQLcl once on startup. Therefore, before I run SQLcl again, I can tweak the description of my alias to remove the extraneous text…

<description><![CDATA[The full path and filename on the database server of the tracefile for this session]]></description>

Sure enough, next time I start an SQLcl session, this change is now reflected in the alias definition :

Loading an alias from a file

The structure of the aliases.xml file gives us a template we can use to define an alias in the comfort of a text editor rather than on the command line. For example, we have the following PL/SQL block, which reads a bind variable :

declare
v_msg varchar2(100);
begin
if upper(:mood) = 'BAD' then
if to_char(sysdate, 'DAY') != 'MONDAY' then
v_msg := q'[At least it's not Monday!]';
elsif to_number(to_char(sysdate, 'HH24MI')) > 1200 then
v_msg := q'[At least it's not Monday morning!]';
else
v_msg := q'[I'm not surprised. It's Monday morning !]';
end if;
elsif upper(:mood) = 'GOOD' then
v_msg := q'[Don't tell me West Ham actually won ?!]';
else
v_msg := q'[I'm just a simple PL/SQL block and I can't handle complex emotions, OK ?!]';
end if;
dbms_output.new_line;
dbms_output.put_line(v_msg);
end;
/

Rather than typing this in on the command line, we can create a file ( called pep_talk.xml) which looks like this :

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<aliases>
<alias name="pep_talk">
<description><![CDATA[How are you feeling ? Usage is pep_talk <emotion>]]></description>
<queries>
<query>
<sql><![CDATA[
declare
v_msg varchar2(100);
begin
if upper(:mood) = 'BAD' then
if to_char(sysdate, 'DAY') != 'MONDAY' then
v_msg := q'[At least it's not Monday!]';
elsif to_number(to_char(sysdate, 'HH24MI')) > 1200 then
v_msg := q'[At least it's not Monday morning!]';
else
v_msg := q'[I'm not surprised. It's Monday morning !]';
end if;
elsif upper(:mood) = 'GOOD' then
v_msg := q'[Don't tell me West Ham actually won ?!]';
else
v_msg := q'[I'm just a simple PL/SQL block and I can't handle complex emotions, OK ?!]';
end if;
dbms_output.new_line;
dbms_output.put_line(v_msg);
end;
]]></sql>
</query>
</queries>
</alias>
</aliases>

Now, we can load this alias from the file as follows :

alias load pep_talk.xml
Aliases loaded

We can now execute our new alias. First though, we need to remember to turn serveroutput on before we invoke it :

Once you’ve terminated your SQLcl session, the new alias will be written to aliases.xml.

Exporting custom aliases

There may come a time when you want to share your custom aliases with your colleagues. After all, it’s always useful to know where the trace file is and who doesn’t need a pep talk from time-to-time ?

To “export” your aliases, you can issue the following command from SQLcl :

alias save mike_aliases.xml

This writes the file to the same location as your aliases.xml :

You can then import these aliases to another SQLcl installation simply by sharing the file and then using the alias load command.
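
For example, a hedged sketch of the import on a colleague's machine (assuming they have copied mike_aliases.xml somewhere their SQLcl session can see it) would just be :

alias load mike_aliases.xml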

References

As you can imagine, there are a wide variety of possible uses for the ALIAS command.

Since Kris Rice is the original author of this feature, this post by Kris Rice is probably worth a read.
Jeff Smith has written on this topic several times including :

Menno Hoogendijk has an example which employs some Javascript wizardry which he has published on GitHub.

Right, back to my trace files.

ANSI bug

Jonathan Lewis - Sat, 2019-06-22 07:01

The following note is about a script that I found on my laptop while I was searching for some details about a bug that appears when you write SQL using the ANSI style format rather than traditional Oracle style. The script is clearly one that I must have cut and pasted from somewhere (possibly the OTN/ODC database forum) many years ago without making any notes about its source or resolution. All I can say about it is that the file has a creation date of July 2012 and I can’t find any reference to the problem through Google searches – though the tables and even a set of specific insert statements appear in a number of pages that look like coursework for computer studies, and MoS has a similar-looking bug “fixed in 11.2”.

Here’s the entire script:

rem
rem     Script:         ansi_bug.sql
rem     Author:         ???
rem     Dated:          July 2012
rem

CREATE TABLE Student (
  sid INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  address VARCHAR(20) NOT NULL,
  major CHAR(2)
);

CREATE TABLE Professor (
  pid INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  department VARCHAR(10) NOT NULL
);

CREATE TABLE Course (
  cid INT PRIMARY KEY,
  title VARCHAR(20) NOT NULL UNIQUE,
  credits INT NOT NULL,
  area VARCHAR(5) NOT NULL
);

CREATE TABLE Transcript (
  sid INT,
  cid INT,
  pid INT,
  semester VARCHAR(9),
  year CHAR(4),
  grade CHAR(1) NOT NULL,
  PRIMARY KEY (sid, cid, pid, semester, year),
  FOREIGN KEY (sid) REFERENCES Student (sid),
  FOREIGN KEY (cid) REFERENCES Course (cid),
  FOREIGN KEY (pid) REFERENCES Professor (pid)
);

INSERT INTO Student (sid, name, address, major) VALUES (101, 'Nathan', 'Edinburg', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (105, 'Hussein', 'Edinburg', 'IT');
INSERT INTO Student (sid, name, address, major) VALUES (103, 'Jose', 'McAllen', 'CE');
INSERT INTO Student (sid, name, address, major) VALUES (102, 'Wendy', 'Mission', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (104, 'Maria', 'Pharr', 'CS');
INSERT INTO Student (sid, name, address, major) VALUES (106, 'Mike', 'Edinburg', 'CE');
INSERT INTO Student (sid, name, address, major) VALUES (107, 'Lily', 'McAllen', NULL);

INSERT INTO Professor (pid, name, department) VALUES (201, 'Artem', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (203, 'John', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (202, 'Virgil', 'MATH');
INSERT INTO Professor (pid, name, department) VALUES (204, 'Pearl', 'CS');
INSERT INTO Professor (pid, name, department) VALUES (205, 'Christine', 'CS');

INSERT INTO Course (cid, title, credits, area) VALUES (4333, 'Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (1201, 'Comp literacy', 2, 'INTRO');
INSERT INTO Course (cid, title, credits, area) VALUES (6333, 'Advanced Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (6315, 'Applied Database', 3, 'DB');
INSERT INTO Course (cid, title, credits, area) VALUES (3326, 'Java', 3, 'PL');
INSERT INTO Course (cid, title, credits, area) VALUES (1370, 'CS I', 4, 'INTRO');

INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 4333, 201, 'Spring', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 6333, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (101, 6315, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (103, 4333, 203, 'Summer I', '2010', 'B');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (102, 4333, 201, 'Fall', '2009', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (103, 3326, 204, 'Spring', '2008', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (104, 1201, 205, 'Fall', '2009', 'B');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (104, 1370, 203, 'Summer II', '2010', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (106, 1201, 205, 'Fall', '2009', 'C');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (106, 1370, 203, 'Summer II', '2010', 'C');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (105, 3326, 204, 'Spring', '2001', 'A');
INSERT INTO Transcript (sid, cid, pid, semester, year, grade) VALUES (105, 6315, 203, 'Fall', '2008', 'A');

SELECT 
        pid, 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
;

SELECT 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
;

SELECT 
        name, title
FROM 
        Professor 
NATURAL LEFT OUTER JOIN 
        (
                Transcript 
        NATURAL JOIN 
                Course
        )
order by pid
;

I’ve run three minor variations of the same query – the one in the middle selects two columns from a three table join using natural joins. The first query does the same but includes an extra column in the select list while the third query selects only the original columns but orders the result set by the extra column.

The middle query returns 60 rows – the first and third, with the “extra” column projected somewhere in the execution plan, return 13 rows.

I didn’t even have a note of the then-current version of Oracle when I copied this script, but I’ve just run it on 12.2.0.1, 18.3.0.0, and 19.2.0.0 (using LiveSQL), and the error reproduces on all three versions.
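
For what it's worth, here's a hedged sketch (not part of the original script) of what I take to be the equivalent query in traditional Oracle syntax, using an inline view for the inner join so that the outer join can be expressed with the (+) operator. Comparing its row count with the three ANSI results above should make it easier to decide which of the two figures is the correct one:

SELECT
        p.pid, p.name, v.title
FROM
        Professor p,
        (
        SELECT  t.pid, c.title
        FROM    Transcript t, Course c
        WHERE   c.cid = t.cid
        ) v
WHERE
        v.pid (+) = p.pid
;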

Ubuntu Server: How to activate kernel dumps

Dietrich Schroff - Fri, 2019-06-21 14:25
If you are running Ubuntu Server, you can add kdump to your system to write kernel dumps in case of sudden reboots, kernel panics, etc.

Installing is very easy:
root@ubuntuserver:/etc# apt install linux-crashdump
Reading package lists... Done
Building dependency tree      
Reading state information... Done
The following additional packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu crash kdump-tools kexec-tools libbinutils libdw1 libsnappy1v5 makedumpfile
Suggested packages:
  binutils-doc
The following NEW packages will be installed:
  binutils binutils-common binutils-x86-64-linux-gnu crash kdump-tools kexec-tools libbinutils libdw1 libsnappy1v5 linux-crashdump makedumpfile
0 upgraded, 11 newly installed, 0 to remove and 43 not upgraded.
Need to get 2,636 B/5,774 kB of archives.
After this operation, 26.0 MB of additional disk space will be used.
Do you want to continue? [Y/n]
Get:1 http://archive.ubuntu.com/ubuntu bionic-updates/main amd64 linux-crashdump amd64 4.15.0.46.48 [2,636 B]
Fetched 2,636 B in 0s (28.1 kB/s)    
Preconfiguring packages ...
Selecting previously unselected package binutils-common:amd64.
(Reading database ... 66831 files and directories currently installed.)
Preparing to unpack .../00-binutils-common_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking binutils-common:amd64 (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package libbinutils:amd64.
Preparing to unpack .../01-libbinutils_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking libbinutils:amd64 (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package binutils-x86-64-linux-gnu.
Preparing to unpack .../02-binutils-x86-64-linux-gnu_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking binutils-x86-64-linux-gnu (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package binutils.
Preparing to unpack .../03-binutils_2.30-21ubuntu1~18.04_amd64.deb ...
Unpacking binutils (2.30-21ubuntu1~18.04) ...
Selecting previously unselected package libsnappy1v5:amd64.
Preparing to unpack .../04-libsnappy1v5_1.1.7-1_amd64.deb ...
Unpacking libsnappy1v5:amd64 (1.1.7-1) ...
Selecting previously unselected package crash.
Preparing to unpack .../05-crash_7.2.1-1ubuntu2_amd64.deb ...
Unpacking crash (7.2.1-1ubuntu2) ...
Selecting previously unselected package kexec-tools.
Preparing to unpack .../06-kexec-tools_1%3a2.0.16-1ubuntu1_amd64.deb ...
Unpacking kexec-tools (1:2.0.16-1ubuntu1) ...
Selecting previously unselected package libdw1:amd64.
Preparing to unpack .../07-libdw1_0.170-0.4_amd64.deb ...
Unpacking libdw1:amd64 (0.170-0.4) ...
Selecting previously unselected package makedumpfile.
Preparing to unpack .../08-makedumpfile_1%3a1.6.3-2_amd64.deb ...
Unpacking makedumpfile (1:1.6.3-2) ...
Selecting previously unselected package kdump-tools.
Preparing to unpack .../09-kdump-tools_1%3a1.6.3-2_amd64.deb ...
Unpacking kdump-tools (1:1.6.3-2) ...
Selecting previously unselected package linux-crashdump.
Preparing to unpack .../10-linux-crashdump_4.15.0.46.48_amd64.deb ...
Unpacking linux-crashdump (4.15.0.46.48) ...
Processing triggers for ureadahead (0.100.0-20) ...
Setting up libdw1:amd64 (0.170-0.4) ...
Setting up kexec-tools (1:2.0.16-1ubuntu1) ...
Generating /etc/default/kexec...
Setting up binutils-common:amd64 (2.30-21ubuntu1~18.04) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Setting up makedumpfile (1:1.6.3-2) ...
Setting up libsnappy1v5:amd64 (1.1.7-1) ...
Processing triggers for systemd (237-3ubuntu10.12) ...
Processing triggers for man-db (2.8.3-2ubuntu0.1) ...
Setting up libbinutils:amd64 (2.30-21ubuntu1~18.04) ...
Setting up kdump-tools (1:1.6.3-2) ...

Creating config file /etc/default/kdump-tools with new version
Sourcing file `/etc/default/grub'
Sourcing file `/etc/default/grub.d/50-curtin-settings.cfg'
Sourcing file `/etc/default/grub.d/kdump-tools.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.15.0-45-generic
Found initrd image: /boot/initrd.img-4.15.0-45-generic
done
Created symlink /etc/systemd/system/multi-user.target.wants/kdump-tools.service → /lib/systemd/system/kdump-tools.service.
Setting up linux-crashdump (4.15.0.46.48) ...
Setting up binutils-x86-64-linux-gnu (2.30-21ubuntu1~18.04) ...
Setting up binutils (2.30-21ubuntu1~18.04) ...
Setting up crash (7.2.1-1ubuntu2) ...
Processing triggers for ureadahead (0.100.0-20) ...
Processing triggers for libc-bin (2.27-3ubuntu1) ...
Processing triggers for systemd (237-3ubuntu10.12) ...
Within the installation you have to answer these questions:


After the installation the following parameter is added to the kernel cmdline:
grep -r crash /boot* |grep cfg
/boot/grub/grub.cfg:        linux    /boot/vmlinuz-4.15.0-46-generic root=UUID=a83c2a94-91c4-461a-b6a4-c7a81422a857 ro  maybe-ubiquity crashkernel=384M-:128M
/boot/grub/grub.cfg:            linux    /boot/vmlinuz-4.15.0-46-generic root=UUID=a83c2a94-91c4-461a-b6a4-c7a81422a857 ro  maybe-ubiquity crashkernel=384M-:128M
with
crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset]
range=start-[end]     ('start' is inclusive and 'end' is exclusive)

The configuration is done via /etc/default/kdump-tools. Here is the parameter that controls the directory the core is dumped into:

cat /etc/default/kdump-tools  |grep DIR
# KDUMP_COREDIR - local path to save the vmcore to.
KDUMP_COREDIR="/var/crash"
The next step is to reboot and verify the kernel cmdline.

#cat /proc/cmdline 
BOOT_IMAGE=/boot/vmlinuz-4.15.0-46-generic root=UUID=a83c2a94-91c4-461a-b6a4-c7a81422a857 ro maybe-ubiquity crashkernel=384M-:128M


To get a coredump just use the following commands:
root@ubuntuserver:/etc# sysctl -w kernel.sysrq=1
kernel.sysrq = 1
root@ubuntuserver:/etc# echo c > /proc/sysrq-trigger

Contextual Targeting vs Behavioral Targeting

VitalSoftTech - Tue, 2019-06-18 12:19

Let’s suppose these are the olden times and you have to advertise for a new circus in town. Do you paste the posters on the walls of places of entertainment like a movie theater, a bar, horse racing tracks, or a casino? Or do you spend a little time about town and look around for […]

The post Contextual Targeting vs Behavioral Targeting appeared first on VitalSoftTech.

Categories: DBA Blogs

PostgreSQL partitioning (8): Sub-partitioning

Yann Neuhaus - Tue, 2019-06-18 04:19

We are slowly coming to the end of this little series about partitioning in PostgreSQL. In the last post we had a look at indexing and constraints and today we will have a look at sub-partitioning. Sub-partitioning means you go one step further and partition the partitions as well. It is not required to read all the posts of this series to follow this one, but if you want to, here they are:

  1. PostgreSQL partitioning (1): Preparing the data set
  2. PostgreSQL partitioning (2): Range partitioning
  3. PostgreSQL partitioning (3): List partitioning
  4. PostgreSQL partitioning (4) : Hash partitioning
  5. PostgreSQL partitioning (5): Partition pruning
  6. PostgreSQL partitioning (6): Attaching and detaching partitions
  7. PostgreSQL partitioning (7): Indexing and constraints

Coming back to our range partitioned table, this is how it currently looks:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_default DEFAULT

Let’s assume that you expect traffic violations to grow exponentially in 2022 because more and more cars will be on the road, and more cars mean more traffic violations. To be prepared for that you want to partition not only by year but also by month. In other words: add a new partition for 2022 but sub-partition it by month. First of all you need a new partition for 2022 that is itself partitioned as well:

create table traffic_violations_p_2022
partition of traffic_violations_p
for values from ('2022-01-01') to ('2023-01-01') partition by range(date_of_stop);

Now we can add partitions to the just created partitioned partition:

create table traffic_violations_p_2022_jan
partition of traffic_violations_p_2022
for values from ('2022-01-01') to ('2022-02-01');

create table traffic_violations_p_2022_feb
partition of traffic_violations_p_2022
for values from ('2022-02-01') to ('2022-03-01');

create table traffic_violations_p_2022_mar
partition of traffic_violations_p_2022
for values from ('2022-03-01') to ('2022-04-01');

create table traffic_violations_p_2022_apr
partition of traffic_violations_p_2022
for values from ('2022-04-01') to ('2022-05-01');

create table traffic_violations_p_2022_may
partition of traffic_violations_p_2022
for values from ('2022-05-01') to ('2022-06-01');

create table traffic_violations_p_2022_jun
partition of traffic_violations_p_2022
for values from ('2022-06-01') to ('2022-07-01');

create table traffic_violations_p_2022_jul
partition of traffic_violations_p_2022
for values from ('2022-07-01') to ('2022-08-01');

create table traffic_violations_p_2022_aug
partition of traffic_violations_p_2022
for values from ('2022-08-01') to ('2022-09-01');

create table traffic_violations_p_2022_sep
partition of traffic_violations_p_2022
for values from ('2022-09-01') to ('2022-10-01');

create table traffic_violations_p_2022_oct
partition of traffic_violations_p_2022
for values from ('2022-10-01') to ('2022-11-01');

create table traffic_violations_p_2022_nov
partition of traffic_violations_p_2022
for values from ('2022-11-01') to ('2022-12-01');

create table traffic_violations_p_2022_dec
partition of traffic_violations_p_2022
for values from ('2022-12-01') to ('2023-01-01');
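
By the way, since the twelve monthly partitions only differ in their bounds, the DDL above could equally be generated in a loop. A minimal sketch using a PL/pgSQL DO block (an alternative to the explicit statements above, so the names would collide if you ran both):

do $$
declare
  m date;
begin
  -- one iteration per month of 2022
  for m in select generate_series(date '2022-01-01', date '2022-12-01', interval '1 month')::date
  loop
    execute format(
      'create table traffic_violations_p_2022_%s partition of traffic_violations_p_2022
       for values from (%L) to (%L)',
      to_char(m, 'mon'), m, (m + interval '1 month')::date);
  end loop;
end $$;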

Looking at psql’s output when we describe the partitioned table, not much has changed: just the keyword “PARTITIONED” now shows up beside our new partition for 2022:

postgres=# \d+ traffic_violations_p
                                      Partitioned table "public.traffic_violations_p"
         Column          |          Type          | Collation | Nullable | Default | Storage  | Stats target | Description 
-------------------------+------------------------+-----------+----------+---------+----------+--------------+-------------
 seqid                   | text                   |           |          |         | extended |              | 
 date_of_stop            | date                   |           |          |         | plain    |              | 
 time_of_stop            | time without time zone |           |          |         | plain    |              | 
 agency                  | text                   |           |          |         | extended |              | 
 subagency               | text                   |           |          |         | extended |              | 
 description             | text                   |           |          |         | extended |              | 
 location                | text                   |           |          |         | extended |              | 
 latitude                | numeric                |           |          |         | main     |              | 
 longitude               | numeric                |           |          |         | main     |              | 
 accident                | text                   |           |          |         | extended |              | 
 belts                   | boolean                |           |          |         | plain    |              | 
 personal_injury         | boolean                |           |          |         | plain    |              | 
 property_damage         | boolean                |           |          |         | plain    |              | 
 fatal                   | boolean                |           |          |         | plain    |              | 
 commercial_license      | boolean                |           |          |         | plain    |              | 
 hazmat                  | boolean                |           |          |         | plain    |              | 
 commercial_vehicle      | boolean                |           |          |         | plain    |              | 
 alcohol                 | boolean                |           |          |         | plain    |              | 
 workzone                | boolean                |           |          |         | plain    |              | 
 state                   | text                   |           |          |         | extended |              | 
 vehicletype             | text                   |           |          |         | extended |              | 
 year                    | smallint               |           |          |         | plain    |              | 
 make                    | text                   |           |          |         | extended |              | 
 model                   | text                   |           |          |         | extended |              | 
 color                   | text                   |           |          |         | extended |              | 
 violation_type          | text                   |           |          |         | extended |              | 
 charge                  | text                   |           |          |         | extended |              | 
 article                 | text                   |           |          |         | extended |              | 
 contributed_to_accident | boolean                |           |          |         | plain    |              | 
 race                    | text                   |           |          |         | extended |              | 
 gender                  | text                   |           |          |         | extended |              | 
 driver_city             | text                   |           |          |         | extended |              | 
 driver_state            | text                   |           |          |         | extended |              | 
 dl_state                | text                   |           |          |         | extended |              | 
 arrest_type             | text                   |           |          |         | extended |              | 
 geolocation             | point                  |           |          |         | plain    |              | 
 council_districts       | smallint               |           |          |         | plain    |              | 
 councils                | smallint               |           |          |         | plain    |              | 
 communities             | smallint               |           |          |         | plain    |              | 
 zip_codes               | smallint               |           |          |         | plain    |              | 
 municipalities          | smallint               |           |          |         | plain    |              | 
Partition key: RANGE (date_of_stop)
Partitions: traffic_violations_p_2013 FOR VALUES FROM ('2013-01-01') TO ('2014-01-01'),
            traffic_violations_p_2014 FOR VALUES FROM ('2014-01-01') TO ('2015-01-01'),
            traffic_violations_p_2015 FOR VALUES FROM ('2015-01-01') TO ('2016-01-01'),
            traffic_violations_p_2016 FOR VALUES FROM ('2016-01-01') TO ('2017-01-01'),
            traffic_violations_p_2017 FOR VALUES FROM ('2017-01-01') TO ('2018-01-01'),
            traffic_violations_p_2018 FOR VALUES FROM ('2018-01-01') TO ('2019-01-01'),
            traffic_violations_p_2019 FOR VALUES FROM ('2019-01-01') TO ('2020-01-01'),
            traffic_violations_p_2020 FOR VALUES FROM ('2020-01-01') TO ('2021-01-01'),
            traffic_violations_p_2021 FOR VALUES FROM ('2021-01-01') TO ('2022-01-01'),
            traffic_violations_p_2022 FOR VALUES FROM ('2022-01-01') TO ('2023-01-01'), PARTITIONED,
            traffic_violations_p_default DEFAULT

This is where the new functions in PostgreSQL 12 become very handy:

postgres=# select * from pg_partition_tree('traffic_violations_p');
             relid             |        parentrelid        | isleaf | level 
-------------------------------+---------------------------+--------+-------
 traffic_violations_p          |                           | f      |     0
 traffic_violations_p_default  | traffic_violations_p      | t      |     1
 traffic_violations_p_2013     | traffic_violations_p      | t      |     1
 traffic_violations_p_2014     | traffic_violations_p      | t      |     1
 traffic_violations_p_2015     | traffic_violations_p      | t      |     1
 traffic_violations_p_2016     | traffic_violations_p      | t      |     1
 traffic_violations_p_2017     | traffic_violations_p      | t      |     1
 traffic_violations_p_2018     | traffic_violations_p      | t      |     1
 traffic_violations_p_2019     | traffic_violations_p      | t      |     1
 traffic_violations_p_2020     | traffic_violations_p      | t      |     1
 traffic_violations_p_2021     | traffic_violations_p      | t      |     1
 traffic_violations_p_2022     | traffic_violations_p      | f      |     1
 traffic_violations_p_2022_jan | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_feb | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_mar | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_apr | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_may | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_jun | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_jul | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_aug | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_sep | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_oct | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_nov | traffic_violations_p_2022 | t      |     2
 traffic_violations_p_2022_dec | traffic_violations_p_2022 | t      |     2
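
The same PostgreSQL 12 release also added pg_partition_root() and pg_partition_ancestors(), which walk the tree the other way around, from a leaf upwards. A quick sketch (output omitted):

-- the top-most parent of a leaf partition
select pg_partition_root('traffic_violations_p_2022_jan');

-- all ancestors of a leaf partition, the partition itself included
select relid from pg_partition_ancestors('traffic_violations_p_2022_jan');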

To verify that data is routed correctly to the sub-partitions, let’s add some data for 2022:

insert into traffic_violations_p (date_of_stop)
       select * from generate_series ( date('01-01-2022')
                                     , date('12-31-2022')
                                     , interval '1 day' );

If we did the partitioning correctly we should see data in the new partitions:

postgres=# select count(*) from traffic_violations_p_2022_nov;
 count 
-------
    30
(1 row)

postgres=# select count(*) from traffic_violations_p_2022_dec;
 count 
-------
    31
(1 row)

postgres=# select count(*) from traffic_violations_p_2022_feb;
 count 
-------
    28
(1 row)

Here we go. Of course you could go even further and sub-partition the monthly partitions by day or week. You can also partition by list and then sub-partition the list partitions by range, as in the example below, or partition by range and then sub-partition by list (a sketch of that case follows the example):

postgres=# create table traffic_violations_p_list_dummy partition of traffic_violations_p_list for values in ('dummy') partition by range (date_of_stop);
CREATE TABLE
postgres=# create table traffic_violations_p_list_dummy_2019 partition of traffic_violations_p_list_dummy for values from ('2022-01-01') to ('2023-01-01');
CREATE TABLE
postgres=# insert into traffic_violations_p_list (seqid, violation_type , date_of_stop) values (-1,'dummy',date('2022-12-01'));
INSERT 0 1
postgres=# select date_of_stop,violation_type from traffic_violations_p_list_dummy_2019;
 date_of_stop | violation_type 
--------------+----------------
 2022-12-01   | dummy
(1 row)
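
And here is the promised sketch of the opposite combination, a range partition sub-partitioned by list. The table, its columns and the list values are made up for illustration only:

create table traffic_violations_p_bytype (
  seqid text,
  date_of_stop date,
  violation_type text
) partition by range (date_of_stop);

create table traffic_violations_p_bytype_2022
partition of traffic_violations_p_bytype
for values from ('2022-01-01') to ('2023-01-01')
partition by list (violation_type);

create table traffic_violations_p_bytype_2022_warning
partition of traffic_violations_p_bytype_2022
for values in ('Warning');

create table traffic_violations_p_bytype_2022_citation
partition of traffic_violations_p_bytype_2022
for values in ('Citation');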

That’s it for sub-partitioning. In the final post we will look at some corner cases with partitioning in PostgreSQL.

The article PostgreSQL partitioning (8): Sub-partitioning appeared first on Blog dbi services.

Looking for errors in the Clusterware and RAC logs? Dash through using the TFA Collector

VitalSoftTech - Mon, 2019-06-17 09:49

The Oracle Trace File Analyzer utility was originally developed by Oracle to help collect and bundle up all the pertinent diagnostic data in the log files, trace files, OS statistics, etc. This is a very common task when Oracle Support engineers request this information to help troubleshoot issues and bugs.

The post Looking for errors in the Clusterware and RAC logs? Dash through using the TFA Collector appeared first on VitalSoftTech.

Categories: DBA Blogs

Can’t Unnest

Jonathan Lewis - Mon, 2019-06-17 09:35

In an echo of a very old “conditional SQL” posting, a recent posting on the ODC general database discussion forum ran into a few classic errors of trouble-shooting. By a lucky coincidence this allowed me to rediscover and publish an old example of parallel execution gone wild before moving on to talk about the fundamental problem exhibited in the latest query.

The ODC thread started with a question along the lines of “why isn’t Oracle using the index I hinted”, with the minor variation that it said “When I hint my SQL with an index hint it runs quickly so I’ve created a profile that applies the hint, but the hint doesn’t get used in production.”

The query was a bit messy and, as is often the case with ODC, the formatting wasn’t particularly readable, so I’ve extracted the where clause from the SQL that was used to generate the profile and reformatted it below. See if you can spot the clue that tells you why there might be a big problem using this SQL to generate a profile to use in the production environment:


WHERE   
        MSG.MSG_TYP_CD = '210_CUSTOMER_INVOICE' 
AND     MSG.MSG_CAPTR_STG_CD = 'PRE_BCS' 
AND     MSG.SRCH_4_FLD_VAL = '123456'   
AND     (
            (    'INVOICENUMBER' = 'INVOICENUMBER' 
             AND MSG.MSG_ID IN (
                        SELECT  *   
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST('123456') AS TABLE_OF_VARCHAR)))
            ) 
         OR (    'INVOICENUMBER' = 'SIEBELORDERID' 
             AND MSG.SRCH_3_FLD_VAL IN (
                        SELECT  *   
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST('') AS TABLE_OF_VARCHAR)))
            )
        ) 
AND     MSG.MSG_ID = TRK.INV_NUM(+) 
AND     (   TRK.RESEND_DT IS NULL 
         OR TRK.RESEND_DT = (
                        SELECT  MAX(TRK1.RESEND_DT)   
                        FROM    FNM.BCS_INV_RESEND_TRK TRK1   
                        WHERE   TRK1.INV_NUM = TRK.INV_NUM
                )
        )

If the SQL by itself doesn’t give you an important clue, compare it with the Predicate Information from the “good” execution plan that it produced:


Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter(("TRK"."RESEND_DT" IS NULL OR "TRK"."RESEND_DT"=))  
   8 - filter(("MSG"."SRCH_4_FLD_VAL"='123456' AND "MSG"."MSG_CAPTR_STG_CD"='PRE_BCS'))  
   9 - access("MSG"."MSG_ID"="COLUMN_VALUE" AND "MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
       filter("MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
  10 - access("MSG"."MSG_ID"="TRK"."INV_NUM")  
  13 - access("TRK1"."INV_NUM"=:B1)  

Have you spotted the thing that isn’t there in the predicate information ?

What happened to the ‘INVOICENUMBER’ = ‘INVOICENUMBER’ predicate and the ‘INVOICENUMBER’ = ‘SIEBELORDERID’ predicate? They’ve disappeared because the optimizer knows that the first predicate is always true and doesn’t need to be tested at run-time and the second one is always false and doesn’t need to be tested at run-time. Moreover both predicates are part of a conjunct (AND), so in the second case the entire two-part predicate can be eliminated, and the original where clause can immediately be reduced to:


WHERE   
        MSG.MSG_TYP_CD = '210_CUSTOMER_INVOICE' 
AND     MSG.MSG_CAPTR_STG_CD = 'PRE_BCS' 
AND     MSG.SRCH_4_FLD_VAL = '123456'   
AND     (
                 MSG.MSG_ID IN (
                        SELECT  *   
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST('123456') AS TABLE_OF_VARCHAR)))
        ) 
AND     MSG.MSG_ID = TRK.INV_NUM(+) 
AND     (   TRK.RESEND_DT IS NULL 
         OR TRK.RESEND_DT = (
                        SELECT  MAX(TRK1.RESEND_DT)   
                        FROM    FNM.BCS_INV_RESEND_TRK TRK1   
                        WHERE   TRK1.INV_NUM = TRK.INV_NUM
                )
        )

Looking at this reduced predicate you may note that the IN subquery referencing the fnm_gn_in_string_list() collection could now be unnested and used to drive the final execution plan, and the optimizer will even recognize that it’s a rowsource with at most one row. So here’s the “good” execution plan:


---------------------------------------------------------------------------------------------------------------------------------------------------------------  
| Id  | Operation                               | Name                  | Starts | E-Rows | A-Rows |   A-Time   | Buffers | Reads  |  OMem |  1Mem | Used-Mem |  
---------------------------------------------------------------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                        |                       |      1 |        |      2 |00:00:00.08 |      12 |      7 |       |       |          |  
|   1 |  SORT ORDER BY                          |                       |      1 |      1 |      2 |00:00:00.08 |      12 |      7 |  2048 |  2048 | 2048  (0)|  
|*  2 |   FILTER                                |                       |      1 |        |      2 |00:00:00.08 |      12 |      7 |       |       |          |  
|   3 |    NESTED LOOPS OUTER                   |                       |      1 |      1 |      2 |00:00:00.08 |      10 |      7 |       |       |          |  
|   4 |     NESTED LOOPS                        |                       |      1 |      1 |      2 |00:00:00.06 |       6 |      5 |       |       |          |  
|   5 |      VIEW                               | VW_NSO_1              |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |       |       |          |  
|   6 |       HASH UNIQUE                       |                       |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |  1697K|  1697K|  487K (0)|  
|   7 |        COLLECTION ITERATOR PICKLER FETCH| FNM_GN_IN_STRING_LIST |      1 |      1 |      1 |00:00:00.01 |       0 |      0 |       |       |          |  
|*  8 |      TABLE ACCESS BY INDEX ROWID        | FNM_VSBL_MSG          |      1 |      1 |      2 |00:00:00.06 |       6 |      5 |       |       |          |  
|*  9 |       INDEX RANGE SCAN                  | XIE2FNM_VSBL_MSG      |      1 |      4 |      4 |00:00:00.04 |       4 |      3 |       |       |          |  
|* 10 |     INDEX RANGE SCAN                    | XPKBCS_INV_RESEND_TRK |      2 |      1 |      2 |00:00:00.01 |       4 |      2 |       |       |          |  
|  11 |    SORT AGGREGATE                       |                       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |  
|  12 |     FIRST ROW                           |                       |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |  
|* 13 |      INDEX RANGE SCAN (MIN/MAX)         | XPKBCS_INV_RESEND_TRK |      1 |      1 |      1 |00:00:00.01 |       2 |      0 |       |       |          |  
---------------------------------------------------------------------------------------------------------------------------------------------------------------  

The plan looks great – Oracle predicts a single row driver (operation 5) which can use a very good index (XIE2FNM_VSBL_MSG) in a nested loop, followed by a second nested loop, followed by a filter subquery and a sort of a tiny amount of data. Predictions match actuals all the way down the plan, and the workload is tiny. So what goes wrong in production?

You’ve probably guessed the flaw in this test. Why would anyone include a predicate like ‘INVOICENUMBER’ = ‘INVOICENUMBER’ in production code, or even worse ‘INVOICENUMBER’ = ‘SIEBELORDERID’? The OP has taken a query using bind variables, picked up the actual values that were peeked when the query was executed, and substituted them into the test as literals. This has allowed the optimizer to discard two simple predicates and one subquery when the production query would need a plan that catered for the possibility that the second subquery would be the one that had to be executed and the first one bypassed. Here’s the corrected where clause using SQL*Plus variables (not the substitution type, the proper type) for the original bind variables:


WHERE
        MSG.MSG_TYP_CD = '210_CUSTOMER_INVOICE'
AND     MSG.MSG_CAPTR_STG_CD = 'PRE_BCS'
AND     MSG.SRCH_4_FLD_VAL = :BindInvoiceTo
AND     (
            (    :BindSearchBy = 'INVOICENUMBER' 
             AND MSG.MSG_ID IN (
                        SELECT  *
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST(:BindInvoiceList) AS TABLE_OF_VARCHAR)))
            )
         OR (    :BindSearchBy = 'SIEBELORDERID' 
             AND MSG.SRCH_3_FLD_VAL IN (
                        SELECT  *
                        FROM    TABLE(CAST(FNM_GN_IN_STRING_LIST(:BindSeibelIDList) AS TABLE_OF_VARCHAR)))
            )
        )
AND     MSG.MSG_ID = TRK.INV_NUM(+)
AND     (   TRK.RESEND_DT IS NULL
         OR TRK.RESEND_DT = (
                        SELECT  MAX(TRK1.RESEND_DT)
                        FROM    FNM.BCS_INV_RESEND_TRK TRK1
                        WHERE   TRK1.INV_NUM = TRK.INV_NUM
                )
        )

And this, with the “once good” hint in place to force the use of the XIE2FNM_VSBL_MSG index, is the resulting execution plan


---------------------------------------------------------------------------------------------------------  
| Id  | Operation                           | Name                  | E-Rows |  OMem |  1Mem | Used-Mem |  
---------------------------------------------------------------------------------------------------------  
|   0 | SELECT STATEMENT                    |                       |        |       |       |          |  
|   1 |  SORT ORDER BY                      |                       |      1 | 73728 | 73728 |          |  
|*  2 |   FILTER                            |                       |        |       |       |          |  
|   3 |    NESTED LOOPS OUTER               |                       |      1 |       |       |          |  
|*  4 |     TABLE ACCESS BY INDEX ROWID     | FNM_VSBL_MSG          |      1 |       |       |          |  
|*  5 |      INDEX FULL SCAN                | XIE2FNM_VSBL_MSG      |   4975K|       |       |          |  
|*  6 |     INDEX RANGE SCAN                | XPKBCS_INV_RESEND_TRK |      1 |       |       |          |  
|*  7 |    COLLECTION ITERATOR PICKLER FETCH| FNM_GN_IN_STRING_LIST |      1 |       |       |          |  
|*  8 |    COLLECTION ITERATOR PICKLER FETCH| FNM_GN_IN_STRING_LIST |      1 |       |       |          |  
|   9 |    SORT AGGREGATE                   |                       |      1 |       |       |          |  
|  10 |     FIRST ROW                       |                       |      1 |       |       |          |  
|* 11 |      INDEX RANGE SCAN (MIN/MAX)     | XPKBCS_INV_RESEND_TRK |      1 |       |       |          |  
---------------------------------------------------------------------------------------------------------  
 
Predicate Information (identified by operation id):  
---------------------------------------------------  
   2 - filter((((:BINDSEARCHBY='INVOICENUMBER' AND  IS NOT NULL) OR  
              (:BINDSEARCHBY='SIEBELORDERID' AND  IS NOT NULL)) AND ("TRK"."RESEND_DT" IS NULL OR  
              "TRK"."RESEND_DT"=)))  
   4 - filter(("MSG"."SRCH_4_FLD_VAL"=:BINDINVOICETO AND "MSG"."MSG_CAPTR_STG_CD"='PRE_BCS'))  
   5 - access("MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
       filter("MSG"."MSG_TYP_CD"='210_CUSTOMER_INVOICE')  
   6 - access("MSG"."MSG_ID"="TRK"."INV_NUM")  
   7 - filter(VALUE(KOKBF$)=:B1)  
   8 - filter(VALUE(KOKBF$)=:B1)  
  11 - access("TRK1"."INV_NUM"=:B1)  

The “unnested driving subquery” approach can no longer be used – we now start with the fnm_vsbl_msg table (accessing it using a most inefficient execution path because that’s what the hint does for us, and we can obey the hint), and for each row check which of the two subqueries we need to execute. There is, in fact, no way we can hint this query to operate efficiently [at least, that’s my opinion, I may be wrong].

The story so far

If you’re going to try to use SQL*Plus (or similar) to test a production query with bind variables you can’t just use a sample of literal values in place of the bind variables (though you may get lucky sometimes, of course); instead, you should set up some SQL*Plus variables and assign values to them.
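
For the query above that could look something like the following (the datatypes and sizes are guesses, only the principle matters):

variable BindInvoiceTo    varchar2(30)
variable BindSearchBy     varchar2(30)
variable BindInvoiceList  varchar2(1000)
variable BindSeibelIDList varchar2(1000)

exec :BindSearchBy     := 'INVOICENUMBER'
exec :BindInvoiceTo    := '123456'
exec :BindInvoiceList  := '123456'
exec :BindSeibelIDList := ''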

Though I haven’t said it previously in this article, this is an example where a decision that really should have been made by the front-end code has been embedded in the SQL and passed to the database as SQL which cannot be run efficiently. The front-end code should have been coded to recognise the choice between invoice numbers and Siebel order ids and sent the appropriate query to the database.

Next Steps

Without making a significant change to the front-end wrapper mechanism, is it possible to change the SQL into something the optimizer can handle efficiently? Sometimes the answer is yes, so I’ve created a simpler model to demonstrate the basic problem and supply a solution for cases like this one. The key issue is finding a way of working around the OR clauses that are trying to allow the optimizer to choose between two subqueries but make it impossible for either to be unnested into a small driving data set.

First, some tables:


rem
rem     Script:         or_in_twice.sql
rem     Author:         Jonathan Lewis
rem     Dated:          June 2019
rem
rem     Last tested 
rem             18.3.0.0
rem             12.2.0.1
rem

create table t1
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,371)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

alter table t1 add constraint t1_pk primary key(id);

create table t2
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,372)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;

create table t3
as
with generator as (
        select 
                rownum id
        from dual 
        connect by 
                level <= 1e4 -- > comment to avoid WordPress format issue
)
select
        rownum                          id,
        mod(rownum,373)                 n1,
        lpad(rownum,10,'0')             v1,
        lpad('x',100,'x')               padding
from
        generator       v1,
        generator       v2
where
        rownum <= 1e4 -- > comment to avoid WordPress format issue
;


Now a query – first setting up a variable in SQL*Plus to allow us to emulate a production query with bind variables. Since I’m only going to use Explain Plan the variable won’t be peekable, so there would still be some scope for this plan not matching a production plan, but it’s adequate to demonstrate the structural problem:


variable v1 varchar2(10)
exec :v1 := 'INVOICE'

explain plan for
select
        t1.v1 
from
        t1
where
        (
            :v1 = 'INVOICE' 
        and t1.id in (select id from t2 where n1 = 0)
        )
or      (
            :v1 = 'ORDERID' 
        and t1.id in (select id from t3 where n1 = 0)
        )
;

select * from table(dbms_xplan.display);

---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      |    10 |   150 |    26   (4)| 00:00:01 |
|*  1 |  FILTER            |      |       |       |            |          |
|   2 |   TABLE ACCESS FULL| T1   | 10000 |   146K|    26   (4)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T2   |     1 |     8 |    26   (4)| 00:00:01 |
|*  4 |   TABLE ACCESS FULL| T3   |     1 |     8 |    26   (4)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter(:V1='INVOICE' AND  EXISTS (SELECT 0 FROM "T2" "T2" WHERE
              "ID"=:B1 AND "N1"=0) OR :V1='ORDERID' AND  EXISTS (SELECT 0 FROM "T3"
              "T3" WHERE "ID"=:B2 AND "N1"=0))
   3 - filter("ID"=:B1 AND "N1"=0)
   4 - filter("ID"=:B1 AND "N1"=0)

As you can see, thanks to the OR that effectively gives Oracle the choice between running the subquery against t3 or the one against t2, Oracle is unable to do any unnesting. (In fact different versions of Oracle allow different levels of sophistication with disjuncts (OR) of subqueries, so this is the kind of example that’s always useful to keep for tests against future versions.)

Since we know that we are going to use one of the data sets supplied in one of the subqueries and have no risk of double-counting or eliminating required duplicates, one strategy we could adopt for this query is to rewrite the two subqueries as a single subquery with a union all – because we know the optimizer can usually handle a single IN subquery very nicely. So let’s try the following:


explain plan for
select
        t1.v1
from
        t1
where
        t1.id in (
                select  id 
                from    t2 
                where   n1 = 0
                and     :v1 = 'INVOICE'
                union all
                select  id 
                from    t3 
                where   n1 = 0
                and     :v1 = 'ORDERID'
        )
;

select * from table(dbms_xplan.display);

-----------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |    54 |  1512 |    77   (3)| 00:00:01 |
|*  1 |  HASH JOIN             |          |    54 |  1512 |    77   (3)| 00:00:01 |
|   2 |   VIEW                 | VW_NSO_1 |    54 |   702 |    51   (2)| 00:00:01 |
|   3 |    HASH UNIQUE         |          |    54 |   432 |    51   (2)| 00:00:01 |
|   4 |     UNION-ALL          |          |       |       |            |          |
|*  5 |      FILTER            |          |       |       |            |          |
|*  6 |       TABLE ACCESS FULL| T2       |    27 |   216 |    26   (4)| 00:00:01 |
|*  7 |      FILTER            |          |       |       |            |          |
|*  8 |       TABLE ACCESS FULL| T3       |    27 |   216 |    26   (4)| 00:00:01 |
|   9 |   TABLE ACCESS FULL    | T1       | 10000 |   146K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="ID")
   5 - filter(:V1='INVOICE')
   6 - filter("N1"=0)
   7 - filter(:V1='ORDERID')
   8 - filter("N1"=0)


Thanks to the FILTERs at operations 5 and 7 this plan will pick the data from just one of the two subqueries, reduce it to a unique list and then use that as the build table to a hash join. Of course, with different data (or suitable hints) that hash join could become a nested loop using a high precision index.

But there’s an alternative. We manually rewrote the two subqueries as a single union all subquery and as we did so we moved the bind variable comparisons inside their respective subqueries; maybe we don’t need to introduce the union all. What would happen if we simply take the original query and move the “constant” predicates inside their subqueries?


explain plan for
select
        t1.v1
from
        t1
where
        t1.id in (select id from t2 where n1 = 0 and :v1 = 'INVOICE')
or      t1.id in (select id from t3 where n1 = 0 and :v1 = 'ORDERID')
;

select * from table(dbms_xplan.display);


-----------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes | Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |    54 |  1512 |    77   (3)| 00:00:01 |
|*  1 |  HASH JOIN             |          |    54 |  1512 |    77   (3)| 00:00:01 |
|   2 |   VIEW                 | VW_NSO_1 |    54 |   702 |    51   (2)| 00:00:01 |
|   3 |    HASH UNIQUE         |          |    54 |   432 |    51   (2)| 00:00:01 |
|   4 |     UNION-ALL          |          |       |       |            |          |
|*  5 |      FILTER            |          |       |       |            |          |
|*  6 |       TABLE ACCESS FULL| T3       |    27 |   216 |    26   (4)| 00:00:01 |
|*  7 |      FILTER            |          |       |       |            |          |
|*  8 |       TABLE ACCESS FULL| T2       |    27 |   216 |    26   (4)| 00:00:01 |
|   9 |   TABLE ACCESS FULL    | T1       | 10000 |   146K|    26   (4)| 00:00:01 |
-----------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
   1 - access("T1"."ID"="ID")
   5 - filter(:V1='ORDERID')
   6 - filter("N1"=0)
   7 - filter(:V1='INVOICE')
   8 - filter("N1"=0)

In 12.2.0.1 and 18.3.0.0 it gets the same plan as we did with our “single subquery” rewrite – the optimizer is able to construct the union all single subquery (although the ordering of the subqueries has been reversed) and unnest without any other manual intervention. (You may find that earlier versions of Oracle don’t manage to do this, but you might have to go all the way back to 10g.)
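
And, for completeness, the front-end alternative mentioned earlier corresponds, in this simplified model, to the application choosing between two statements, each containing a single IN subquery that the optimizer can unnest without any help (just a sketch of the idea, of course):

-- "search by invoice" branch: only the t2 subquery is sent to the database
select t1.v1 from t1 where t1.id in (select id from t2 where n1 = 0);

-- "search by order id" branch: only the t3 subquery is sent
select t1.v1 from t1 where t1.id in (select id from t3 where n1 = 0);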

Conclusion

Oracle doesn’t like disjuncts (OR) and finds conjuncts (AND) much easier to cope with. Mixing OR and subqueries is a good way to create inefficient execution plans, especially when you try to force the optimizer to handle a decision that should have been taken in the front-end code. The optimizer, however, gets increasingly skilled at handling the mixture as you move through the newer versions; but you may have to find ways to give it a little help if you see it running subqueries as filter subqueries when you’re expecting it to unnest a subquery to produce a small driving data set.

 

Connecting to a Repository via a Dynamically Edited dfc.properties File (part II)

Yann Neuhaus - Sun, 2019-06-16 13:37

This is part II of the 2-part article; see part I of this article for the background and the wrapper itself.

Testing

We will test on the host machine named docker, which hosts 2 containers, container01 and container011. All 3 machines run a repository; their names are, respectively, dmtest on docker (in short, dmtest@docker:1489), dmtest01@container01:1489 (dmtest01@container01:2489 externally) and dmtest01@container011:1489 (dmtest01@container011:5489 externally). Incidentally, the enhanced syntax is also a good way to uniquely identify the repositories.
The current dfc.properties file on the host docker:

$ grep docbroker /app/dctm/config/dfc.properties
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=1489

This is used for the local docbase dmtest.
Let’s tag all the docbases for an easy identification later:

$ iapi dmtest -Udmadmin -Pdmadmin <<eoq
retrieve,c,dm_docbase_config
set,c,l,title
dmtest on docker host VM
save,c,l
eoq

Idem from within container01 with its default dfc.properties file:

$ iapi dmtest01 -Udmadmin -Pdmadmin <<eoq
retrieve,c,dm_docbase_config
set,c,l,title
dmtest01 created silently on container01
save,c,l
eoq

Idem from within container011:

$ iapi dmtest01 -Udmadmin -Pdmadmin <<eoq
retrieve,c,dm_docbase_config
set,c,l,title
dmtest01 created silently on container011
save,c,l
eoq

First, let's access container01.dmtest01 from the containers' host VM with the current dfc.properties file:

$ idql dmtest01 -Udmadmin -Pdmadmin
 
 
OpenText Documentum idql - Interactive document query interface
Copyright (c) 2018. OpenText Corporation
All rights reserved.
Client Library Release 16.4.0070.0035
 
 
Connecting to Server using docbase dmtest01
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:1489) does not know of a server for the specified docbase (dmtest01)"

As expected, it does not work because container01.dmtest01 does not project to the host’s docbroker. Now, let’s turn to widql:

$ ./widql dmtest01@docker:2489 -Udmadmin -Pdmadmin --keep <<eoq
select title from dm_docbase_config
go
eoq
OpenText Documentum idql - Interactive document query interface
Copyright (c) 2018. OpenText Corporation
All rights reserved.
Client Library Release 16.4.0070.0035
 
 
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c350800011bb started for user dmadmin."
 
 
Connected to OpenText Documentum Server running Release 16.4.0000.0248 Linux64.Oracle
title
------------------------------------------
dmtest01 created silently on container01

It works.
We used --keep, therefore the dfc.properties file has changed:

$ grep docbroker /app/dctm/config/dfc.properties
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=2489

Indeed.
That docbase can also be reached by the container’s IP address and internal port 1489:

$ docker exec -it container01 ifconfig eth0 | head -3
eth0: flags=4163 mtu 1500
inet 192.168.33.101 netmask 255.255.255.0 broadcast 192.168.33.255
ether 02:42:c0:a8:21:65 txqueuelen 0 (Ethernet)
 
$ ./widql dmtest01@192.168.33.101:1489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c350800011b5 started for user dmadmin."
...
title
------------------------------------------
dmtest01 created silently on container01

Is the local dmtest docbase still reachable ?:

$ idql dmtest -Udmadmin -Pdmadmin
...
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:2489) does not know of a server for the specified docbase (dmtest)"

Not with that changed dfc.properties file and the standard tools. But by using our nifty little tool:

$ ./widql dmtest@docker:1489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connected to OpenText Documentum Server running Release 16.4.0080.0129 Linux64.Oracle
title
----------------------
dmtest on host VM

Fine !
Is container011.dmtest01 reachable now ?

$ ./widql dmtest01 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
...
Connected to OpenText Documentum Server running Release 16.4.0000.0248 Linux64.Oracle
title
-------------------------------------------
dmtest01 created silently on container01

This is container01.dmtest01, not the one we want, i.e. the one on container011.
Note that ./widql was called without the extended syntax so it invoked the standard idql directly.
Let’s try again:

$ ./widql dmtest01@docker:5489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c3508000059e started for user dmadmin."
...
title
------------------------------------------
dmtest01 created silently on container011

Here we go, it works !
The same using the container’s IP address and its docbroker’s internal port:

$ docker exec -it container011 ifconfig eth0 | head -3
eth0: flags=4163 mtu 1500
inet 192.168.33.104 netmask 255.255.255.0 broadcast 192.168.33.255
ether 02:42:c0:a8:21:68 txqueuelen 0 (Ethernet)
 
$ ./widql dmtest01@192.168.33.104:5489 -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080000598 started for user dmadmin."
...
title
------------------------------------------
dmtest01 created silently on container011

Try now the same connection but with --append and --keep:

$ ./widql dmtest01@docker:5489 -Udmadmin -Pdmadmin --append --keep <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest01
...
Connected to OpenText Documentum Server running Release 16.4.0000.0248 Linux64.Oracle
title
-------------------------------------------
dmtest01 created silently on container011

What is the content of dfc.properties now ?

$ grep docbroker /app/dctm/config/dfc.properties
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=2489
dfc.docbroker.host[1]=docker
dfc.docbroker.port[1]=5489

Both options have been taken into account as expected.
Let’s try to reach the VM host’s repository:

$ ./widql dmtest -Udmadmin -Pdmadmin <<eoq
select title from dm_docbase_config
go
eoq
...
Connecting to Server using docbase dmtest
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:2489) does not know of a server for the specified docbase (dmtest)"

Specify the docbroker’s host and the --verbose option:

$ ./widql dmtest@docker -Udmadmin -Pdmadmin --verbose <<eoq
select title from dm_docbase_config
go
eoq
 
changing to docker:...
requested changes:
# removed: dfc.docbroker.host[0]=docker
# removed: dfc.docbroker.port[0]=2489
# removed: dfc.docbroker.host[1]=docker
# removed: dfc.docbroker.port[1]=5489
# added: dfc.docbroker.host[0]=docker
# added: dfc.docbroker.port[0]=2489
diffs:
12,13d11
< dfc.docbroker.host[1]=docker
< dfc.docbroker.port[1]=5489
calling original: /app/dctm/product/16.4/bin/idql dmtest -Udmadmin -Pdmadmin
...
Connecting to Server using docbase dmtest
Could not connect
[DM_DOCBROKER_E_NO_SERVERS_FOR_DOCBASE]error: "The DocBroker running on host (docker:2489) does not know of a server for the specified docbase (dmtest)"

Since the port was not specified, the wrapper took the first port found in the dfc.properties file to supply the missing value, i.e. 2489, which is incorrect as dmtest only projects to docker:1489.
Use an unambiguous command now:

$ ./widql dmtest@docker:1489 -Udmadmin -Pdmadmin --verbose <<eoq
select title from dm_docbase_config
go
eoq
 
changing to docker:1489...
requested changes:
# removed: dfc.docbroker.host[0]=docker
# removed: dfc.docbroker.port[0]=2489
# removed: dfc.docbroker.host[1]=docker
# removed: dfc.docbroker.port[1]=5489
# added: dfc.docbroker.host[0]=docker
# added: dfc.docbroker.port[0]=1489
diffs:
11,13c11
< dfc.docbroker.port[0]=2489
< dfc.docbroker.host[1]=docker
< dfc.docbroker.port[1]=5489
---
> dfc.docbroker.port[0]=1489
calling original: /app/dctm/product/16.4/bin/idql dmtest -Udmadmin -Pdmadmin
...
Connecting to Server using docbase dmtest
...
Connected to OpenText Documentum Server running Release 16.4.0080.0129 Linux64.Oracle
title
--------------------
dmtest on host VM

Looks OK.
Let’s try wdmawk now. But first, here is the test code twdmawk.awk:

$ cat twdmawk.awk 
BEGIN {
   print "repo_target=" repo_target, "docbase=" docbase
   session = dmAPIGet("connect," docbase ",dmadmin,dmadmin")
   print dmAPIGet("getmessage," session)
   dmAPIGet("retrieve," session ",dm_docbase_config")
   print dmAPIGet("get," session ",l,title")
   dmAPIExec("disconnect," session)
   exit(0)
}

Line 3 displays the two variables automatically passed to dmawk by the wrapper, repo_target and docbase.
The test script connects to the docbase which was silently passed as a command-line parameter by wdmawk through the -v option, after it extracted it from the given target parameter docbase[@host[:port]], as illustrated below with the --verbose option.
Let’s see the invocation for the repository on the host VM:

$ ./wdmawk dmtest@docker:1489 -f ./twdmawk.awk --verbose
changing to docker:1489...
requested changes:
# removed: dfc.docbroker.host[0]=docker
# removed: dfc.docbroker.port[0]=2489
# removed: dfc.docbroker.host[1]=docker
# removed: dfc.docbroker.port[1]=5489
# added: dfc.docbroker.host[0]=docker
# added: dfc.docbroker.port[0]=1489
diffs:
11,13c11
< dfc.docbroker.port[0]=2489
< dfc.docbroker.host[1]=docker
< dfc.docbroker.port[1]=5489
---
> dfc.docbroker.port[0]=1489
calling original: /app/dctm/product/16.4/bin/dmawk -v docbase=dmtest -f ./twdmawk.awk
repo_target= docbase=dmtest
[DM_SESSION_I_SESSION_START]info: "Session 0100c3508000367b started for user dmadmin."
 
 
dmtest on host VM

Let’s access container01’s repository:

$ ./wdmawk dmtest01@docker:2489 -f ./twdmawk.awk
 
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080001202 started for user dmadmin."
 
 
dmtest01 created silently on container01

A small typo in the port number and …

dmadmin@docker:~$ ./wdmawk dmtest01@docker:3489 -f ./twdmawk.awk
 
[DFC_DOCBROKER_REQUEST_FAILED] Request to Docbroker "docker:3489" failed
 
[DM_SESSION_E_RPC_ERROR]error: "Server communication failure"
 
java.net.ConnectException: Connection refused (Connection refused)

Note the stupid error message “… Connection refused …”, very misleading when investigating a problem: it’s just that there is nobody listening on that port.
Let’s access the container011’s repository:

dmadmin@docker:~$ ./wdmawk dmtest01@docker:5489 -f ./twdmawk.awk
 
[DM_SESSION_I_SESSION_START]info: "Session 0100c350800005ef started for user dmadmin."
 
 
dmtest01 created silently on container011

Effect of the -v option:

dmadmin@docker:~$ ./wdmawk -v dmtest01@docker:5489 -f ./twdmawk.awk --verbose
...
calling original: /app/dctm/product/16.4/bin/dmawk -v repo_target=dmtest@docker:1489 -v docbase=dmtest -f ./twdmawk.awk
repo_target=dmtest@docker:1489 docbase=dmtest
[DM_SESSION_I_SESSION_START]info: "Session 0100c35080003684 started for user dmadmin."
 
 
dmtest on host VM

A repo_target parameter with the extended syntax has been passed to dmawk.
Let’s now quickly check the wrapper from within the containers.
Container01
The host’s docbase:

[dmadmin@container01 scripts]$ ./wiapi dmtest@docker:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
Connecting to Server using docbase dmtest
...
dmtest on host VM

The container011’s docbase:

[dmadmin@container01 scripts]$ ./wiapi dmtest01@container011:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
Connecting to Server using docbase dmtest01
...
dmtest01 created silently on container011
...

Container011
The host’s docbase:

dmadmin@container011 scripts]$ ./wiapi dmtest@docker:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
Connecting to Server using docbase dmtest
...
Connected to OpenText Documentum Server running Release 16.4.0080.0129 Linux64.Oracle
...
dmtest on host VM
...

The docbase on container01:

dmadmin@container011 scripts]$ ./wiapi dmtest01@container01:1489 -Udmadmin -Pdmadmin<<eoq
retrieve,c,dm_docbase_config
get,c,l,title
eoq
...
...
Connecting to Server using docbase dmtest01
...
dmtest01 created silently on container01
...

Let’s briefly test the usage of the sourced configuration file. Here is a snippet of the file shown earlier in this article:

# repository connection configuration file;
# must be sourced prior so the environment variables can be resolved;
# this is an enhancement over the dfc.properties file syntax used by the dctm_wrapper utility:
# docbroker.host[i]=...
# docbroker.port[i]=...
# it supports several syntaxes:
# docbroker only definition [[docbroker_host]:[port]];
#    usage: ./widql dmtest@$dmtest
# full definition docbase[@[docbroker_host]:[port]]
#    usage: ./widql $test
# alternate ':' separator docbase:[[docbroker_host]:[docbroker_port]];
#    usage: ./widql $dmtestVM
# alias literal;
#    usage: ./widql test
# in order to resolve alias literals, the wrapper will source the configuration file by itself;
...
# container011.dmtest01;
# docbroker only definition docbroker_host:port;
d_dmtest011=container011:5489
di_dmtest011=192.168.33.104:1489
# full definition;
f_dmtest011=dmtest01@container011:2489
fip_dmtest011=dmtest01@192.168.33.104:1489

With a good naming convention, the variables are easy to remember, which saves a lot of typing too.
Note, in the f_dmtest011 and fip_dmtest011 definitions, how the whole extended target name can be specified, including the repository name.
A few tests:

dmadmin@docker:~$ ./widql dmtest01@$d_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@container011:5489 -Udmadmin -Pdmadmin --verbose] ...
 
dmadmin@docker:~$ ./widql dmtest01@$dip_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@192.168.33.104:1489 -Udmadmin -Pdmadmin --verbose] ...
 
dmadmin@docker:~$ ./widql $f_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@container011:2489 -Udmadmin -Pdmadmin --verbose] ...
 
dmadmin@docker:~$ ./widql $fip_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql dmtest01@192.168.33.104:1489 -Udmadmin -Pdmadmin --verbose] ...

The variables have been expanded by the shell prior to entering the wrapper; no programming effort was needed here, which is always appreciated.

Possible Enhancements

As shown previously, the alternate configuration file lists aliases for the couples docbroker:port and even repository@docbroker:port. In passing, the wrapper also supports the version repository:docbroker:port.
Now, in order to better match Documentum syntax, is it possible to be even more transparent by removing dollar signs, colons and at-signs while still accessing the extended syntax ? E.g.:

$ ./widql dmtest -Udmadmin ....

Yes it is. The trick here is to first look up the alias in the configuration file, which incidentally becomes mandatory now, and re-execute the program with the alias resolved. As we are all lazy coders, we will not explicitly code the lookup but instead rely on the shell: the wrapper will source the file, resolve the target and re-execute itself.
If the alias has not been defined in the file, then the wrapper considers it as the name of a repository and falls back to the usual command-line tools.
A good thing is that no new format has to be introduced in the file as the target is still the name of an environment variable.
Since the changes are really minimal, let’s do it. Hereafter, the diff output showing the changes from the listing in part I:

> # this variable points to the target repositories alias file and defaults to repository_connections.aliases;
> REPO_ALIAS=${REPO_ALIAS:-~/repository_connections.aliases}
> 
107a111
> [[ bVerbose -eq 1 ]] && echo "current configuration file=[$REPO_ALIAS]"
225,227c229,241
<    if [[ $bVerbose -eq 1 ]]; then
<       echo "no change to current $dfc_config file"
>    [[ -f $REPO_ALIAS ]] && . $REPO_ALIAS
>    definition=${!1}
>    [[ $bVerbose -eq 1 ]] && echo "alias lookup in $REPO_ALIAS: $1 = $definition"
>    if [[ ! -z $definition ]]; then
>       new_cmd=${current_cmd/$1/$definition}
>       [[ $bVerbose -eq 1 ]] && echo "invoking $new_cmd"
>       exec $new_cmd
>    else
>       if [[ $bVerbose -eq 1 ]]; then
>          echo "no change to current $dfc_config file"
>          echo "calling original: $DM_HOME/bin/${dctm_program} $*"
>       fi
>       $DM_HOME/bin/${dctm_program} $*
229d242
<    $DM_HOME/bin/${dctm_program} $*

On line 9, the target configuration file pointed to by the REPO_ALIAS environment variable gets sourced if it exists. $REPO_ALIAS defaults to repository_connections.aliases but can be changed before calling the wrapper.
Note on line 10 how bash can dereference a variable containing the name of another variable to get the latter’s value (indirect expansion), a nice touch.
To apply the patch in-place, save the diffs above in diff-file and run the following command:

patch old-file < diff-file

Testing
For conciseness, the tests below only show how the target is resolved. The actual connection has already been tested abundantly earlier.

dmadmin@docker:~$ ./widql f_dmtest -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql f_dmtest -Udmadmin -Pdmadmin --verbose] alias lookup in /home/dmadmin/repository_connections.aliases: f_dmtest = dmtest@docker:1489
invoking ./widql dmtest@docker:1489 -Udmadmin -Pdmadmin --verbose
current_cmd=[/home/dmadmin/widql dmtest@docker:1489 -Udmadmin -Pdmadmin --verbose] ...
dmadmin@docker:~$ ./widql fip_dmtest01 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql fip_dmtest01 -Udmadmin -Pdmadmin --verbose] alias lookup in /home/dmadmin/repository_connections.aliases: fip_dmtest01 = dmtest01@192.168.33.2:1489
invoking ./widql dmtest01@192.168.33.2:1489 -Udmadmin -Pdmadmin --verbose
current_cmd=[/home/dmadmin/widql dmtest01@192.168.33.2:1489 -Udmadmin -Pdmadmin --verbose] ...
dmadmin@docker:~$ ./widql fip_dmtest011 -Udmadmin -Pdmadmin --verbose
current_cmd=[./widql fip_dmtest011 -Udmadmin -Pdmadmin --verbose] alias lookup in /home/dmadmin/repository_connections.aliases: fip_dmtest011 = dmtest01@192.168.33.3:1489
invoking ./widql dmtest01@192.168.33.3:1489 -Udmadmin -Pdmadmin --verbose
current_cmd=[/home/dmadmin/widql dmtest01@192.168.33.3:1489 -Udmadmin -Pdmadmin --verbose]

Note how the targets are cleaner now, no curly little fancy shell characters in front.

Conclusion

As I was testing this little utility, I was surprised to realize how comfortable and natural its usage is. It actually feels better to add the docbroker’s host and port than to stop at the docbase name, probably because it makes the intended repository absolutely unambiguous. The good thing is that it is almost invisible, except for its invocation, but even this can be smoothed out by using command aliases or renaming the symlinks.
When one has to work with identically named docbases or with clones existing in different environments, dctm-wrapper brings real relief. And it was quick and easy to put together too.
As it modifies an essential configuration file, it is mainly aimed at developers or administrators on their machine, but then those constitute the targeted audience anyway.
As always, if you have any ideas for some utility that could benefit us all, please do not hesitate to suggest them in the comment section. Feedback is welcome too, of course.

The article Connecting to a Repository via a Dynamically Edited dfc.properties File (part II) first appeared on Blog dbi services.

Connecting to a Repository via a Dynamically Edited dfc.properties File (part I)

Yann Neuhaus - Sun, 2019-06-16 13:36
Connecting to a Repository via a Dynamically Edited dfc.properties File

Now that we have containerized content servers, it is very easy, maybe too easy, to create new repositories. Their creation is still not any faster (whether they are containerized or not is irrelevant here), but given a configuration file, it takes just one command to instantiate an image into a running container with working repositories in it. Thus, during experimentation and testing, out of laziness or in a hurry, one can quickly end up with several containers hosting identically named repositories, e.g. dmtest01, with an identically named docbroker, e.g. docbroker01. Now, suppose one wants to connect to the docbase dmtest01 running on the 3rd such container using the familiar command-line tools idql/iapi/dmawk. How to select that particular instance of dmtest01 among all the others?
To make the test case precise, let's say that we are using a custom bridge network to link the containers together on the docker host (appropriately named docker), which is a VirtualBox VM running an Ubuntu flavor. The metal also runs the same Ubuntu distro natively. It looks complicated but actually matches the common on-premises infrastructure where the metal is an ESX host or equivalent, its O/S is the hypervisor and the VMs run a RedHat or Suse distro. As this is a local testing environment, no DNS or network customizations have been introduced, save for the custom bridge.
We want to reach a remote repository either from container to container or from container to host or from host to container.
The problem here stems from the lack of flexibility in the docbroker/dfc.properties file mechanism and no network fiddling can work around this.

It’s All in The dfc.properties File

Containers have distinct host names, so it suffices to edit their local dfc.properties file and change this field only. Their files may all look like the one below:

dfc.docbroker.host[0]=container01
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=docker
dfc.docbroker.port[1]=1489
dfc.docbroker.host[2]=container011
dfc.docbroker.port[2]=1489
dfc.docbroker.host[3]=container02
dfc.docbroker.port[3]=1489

In effect, the custom bridge network embeds a DNS for all the attached containers, so their host names are known to each other (but not to the host, so IP addresses must be used from there, or the host's /etc/hosts file must be edited). The docbroker ports are the ones inside the containers and all have the same value, 1489, because the containers were created out of the same configuration files. The docker entry has been added to the containers' /etc/hosts file via the --add-host= clause of the docker run command.
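As a side note, such a container could have been started along these lines (just a sketch; the image name, network name and addresses are assumptions for this example, not taken from the actual setup):

$ docker network create --subnet=192.168.33.0/24 dctmnet
$ docker run -d --name container01 --hostname container01 --network dctmnet --ip 192.168.33.101 --add-host docker:192.168.33.1 -p 2489:1489 my-dctm-cs-image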
For the containers’ host machine, where a Documentum repository has been installed too, the dfc.properties file could look like this one:

dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=docker
dfc.docbroker.port[1]=2489
dfc.docbroker.host[2]=docker
dfc.docbroker.port[2]=3489
dfc.docbroker.host[3]=docker
dfc.docbroker.port[3]=5489

Here, the host name is the one of the VM where the containers sit and is the same for all the containers. The port numbers differ because they are the external container’s port which are published in the host VM and mapped to the respective docbroker’s internal port, 1489. Since the containers share the same custom network, their host names, IP addresses and external ports must all be different when running the image, or docker won’t allow it.
Alternatively, the containers' IP addresses and internal docbroker ports could be used directly too, if one is too lazy to declare the containers' host names in the host's /etc/hosts file, which is generally the case when testing:

dfc.docbroker.host[0]=docker 
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=192.168.33.101
dfc.docbroker.port[1]=1489
dfc.docbroker.host[2]=192.168.33.102
dfc.docbroker.port[2]=1489
dfc.docbroker.host[3]=192.168.33.104
dfc.docbroker.port[3]=1489

The host’s custom network will take care of routing the traffic into the respective containers.
Can you spot the problem now? As all the containers contain identically named repositories (for clarity, let's say that we are looking for the docbase dmtest01), the first docbroker contacted in that file will always reply successfully because there is indeed a dmtest01 docbase in that container, and consequently one will always be directed to the docbase container01.dmtest01. If one wants to contact container03.dmtest01, this configuration won't allow it. One would need to edit it and move the target container03 host to the first position, which is OK until one wants to access container02.dmtest01 or go back to container01.dmtest01.
This situation has existed forever, but containers make it more obvious because they make it so much easier to end up with repository homonyms.
So, is there a simpler way to work around this limitation than editing a configuration file back and forth or giving different names to the containerized repositories?
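Incidentally, each docbroker can be queried individually about the docbases that projected to it, e.g. with dmqdocbroker (a sketch; adjust the host and port to the container of interest):

$ dmqdocbroker -t 192.168.33.104 -p 1489 -c getdocbasemap

Each docbroker would report its own dmtest01, which confirms the ambiguity but does not help the command-line tools pick the right one.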

A Few Reminders

Documentum has made quite a lot of design decisions inspired by the Oracle DBMS, but their implementation is far from offering the same level of flexibility and power, and this is often irritating. Let's consider connectivity, for example. Simply speaking, Oracle's SQL*Net configuration relies mainly on a tnsnames.ora file (it can also use a centralized LDAP server, but let's keep it simple). This file contains entries used to contact listeners and get the information needed to connect to the related database. The minimal data to provide in an entry are the listener's hostname and port, and the database SID or service name, e.g.:

...
ORCL =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = db_service)
    )
  )
...

A connection to the database db_service can simply be requested as follows:

sqlplus scott@orcl

orcl is the SQL*Net alias for the database served by db_service. It works like an index in a lookup table, the tnsnames.ora file.
Compare this with a typical dfc.properties file, e.g. /home/dmadmin/documentum/shared/config/dfc.properties:

...
dfc.docbroker.host[0]=docker
dfc.docbroker.port[0]=1489
dfc.docbroker.host[1]=dmtest
dfc.docbroker.port[1]=1489
...

Similarly, instead of contacting listeners, we have docbrokers here. A connection to the docbase dmtest can be requested as follows:

idql dmtest

dmtest is the target repository. It is not a lookup key in the dfc.properties file. Unlike the tnsnames.ora file and its aliases, there is an indirection here: the dfc.properties file does not directly tell where to find a certain repository, it just lists the docbrokers to be queried sequentially about it until the first one that knows the repository (or a homonym thereof) answers. If the returned target docbase is the wrong homonym, tough luck, it will not be reachable unless the order of the entries is changed. Repositories announce themselves to the docbrokers by "projecting" themselves. If two repositories with the same name project to the same docbroker, no error is raised but the docbroker can return unexpected results, e.g. one may end up in the unintended docbase.
Another major difference is that with Oracle but not with Documentum, it is possible to bypass the tnsnames.ora file by specifying the connection data in-line, e.g. on the command-line:

sqlplus scott@'(DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = db)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = db_service)
    )
  )'

This can be very useful when editing the local, official tnsnames.ora file is not allowed, and sometimes faster than setting $TNS_ADMIN to an accessible local directory and editing a private tnsnames.ora file there.
This annoyance is even more frustrating because Documentum’s command-line tools do support a similar syntax but for a different purpose:

idql repository[.service][@machine] [other parameters]

While this syntax is logically useful to access a particular service (akin to an Oracle instance, but for an HA Documentum installation), it is used in a distributed repository environment to contact a particular node's docbroker; however, it still does not work if that docbroker is not first declared in the local dfc.properties file.
Last but not least, one more reason to be frustrated is that the DfCs do allow choosing a specific docbroker when opening a session, as illustrated by the jython snippet below:

import traceback
import com.documentum.fc.client as DFCClient
import com.documentum.fc.common as DFCCommon

docbroker_host = "docker"
docbroker_port = "1489"
docbase = "dmtest"
username = "dmadmin"
password = "dmadmin"
print("attempting to connect to " + docbase + " as " + username + "/" + password + " via docbroker on host " + docbroker_host + ":" + docbroker_port)
try:
  client = DFCClient.DfClient.getLocalClient()

  config = client.getClientConfig()
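  # point the client at the desired docbroker; with these preferences set, no dfc.docbroker.* entries are needed in dfc.properties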
  config.setString ("primary_host", docbroker_host)
  config.setString ("primary_port", docbroker_port)

  logInfo = DFCCommon.DfLoginInfo()
  logInfo.setUser(username)
  logInfo.setPassword(password)
  docbase_session = client.newSession(docbase, logInfo)

  if docbase_session is not None:
    print("Connected !")
  else:
    print("Couldn't connect !")
except Exception:
  traceback.print_exc()

Content of dfc.properties:

$ cat documentum/shared/config/dfc.properties
dfc.date_format=dd.MM.yyyy HH:mm:ss

Execution:

$ jython ./test.jy
...
attempting to connect to dmtest as dmadmin/dmadmin via docbroker docker
Connected !

Despite a dfc.properties file devoid of any docbroker definition, the connection was successful. Unfortunately, this convenience has not been carried over to the vegetative command-line tools.
While we can dream and hope for those tools to be resurrected and for a backport miracle to happen (are you listening, OTX?), the next best thing is to tackle this shortcoming ourselves and implement as unobtrusive a solution as possible. Let's see how.

A few Proposals

Currently, one has to edit the local dfc.properties file manually, which is tedious to say the least because changes must sometimes be done twice: applied first, then rolled back if the change is only temporary. To avoid this, we could add to our local dfc.properties file, at once, all the machines that host repositories of interest, but this file could quickly grow large and it still would not solve the case of repository homonyms. The situation would become quite unmanageable, although an environment variable such as the late DMCL_CONFIG (appropriately revamped, e.g. to DFC_PROPERTIES_CONFIG, to hold the full path name of the dfc.properties file to use) could help organize those entries. But no such variable exists any longer for the command-line tools (those tools have stopped evolving since CS v6.x), although there is a property the DfCs accept from the JVM at startup, -Ddfc.properties.file, as well as the #include clause in the dfc.properties file, or playing with the $CLASSPATH; still, there is a better way.
What about on-the-fly, transparent, behind-the-scenes editing of the dfc.properties file to support a connection syntax similar to Oracle's in-line one?
Proposal 1
Let’s specify the address of the docbroker of interest directly on the command-line, as follows:

$ idql dmtest01@container03:3489
or
$ idql dmtest01@192.168.33.104:3489

This is more akin to the Oracle in-line connection syntax shown above.
Proposal 2
An alternative could be to use a tnsnames.ora-like configuration file such as the one below (and (in (keeping (with (the (lisp spirit)))))):

dmtest01 = ((docbroker.host = container01) (docbroker.port = 1489))
dmtest02 = ((docbroker.host = container02) (docbroker.port = 1489))
dmtest03 = ((docbroker.host = container03) (docbroker.port = 1489))

and to use it thusly:

$ idql dmtest01@dmtest03

dmtest03 is looked up in the configuration file and replaced on the command-line by its definition.
Proposal 3
With a more concise configuration file that can also be sourced:

dmtest01=container01:1489
dmtest02=container02:1489
dmtest03=container03:1489

and used as follows:

$ export REPO_ALIAS=~/repository_connections.aliases
$ . $REPO_ALIAS
$ ./widql dmtest01@$dmtest03

$dmtest03 is directly fetched from the environment after the configuration file has been sourced, which is equivalent to a lookup. Since the variable substitution occurs at the shell level, it comes free of charge.
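To make the expansion explicit, here is what the shell actually sees (a quick sketch reusing the definitions above):

$ . $REPO_ALIAS
$ echo ./widql dmtest01@$dmtest03
./widql dmtest01@container03:1489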
With a bit more generalization, it is possible to merge the three proposals together:

$ idql repository(@host_literal:port_number) | @$target

In other words, one can either provide the full connection information literally or provide a variable which will be resolved by the shell from a configuration file sourced beforehand.
Let’s push the configuration file a bit farther and define complete aliases up to the repository name like this:

dmtest=dmtest@docker:1489
or even so:
dmtest=dmtest:docker:1489

Usage:

$ ./widql $dmtest

The shell will expand the alias with its definition. The good thing is that the definition styles can be mixed and matched to suit one's fancy. Here is an example of a configuration file:

# must be sourced prior so the environment variables can be resolved;
# this is an enhancement over the dfc.properties file syntax used by the dctm-wrapper utility:
# docbroker.host[i]=...
# docbroker.port[i]=...
# it supports several syntaxes:
# docbroker only definition docbroker_host:port;
#    usage: ./widql dmtest@$dmtest
# full definition docbase[@[docbroker_host]:[port]]
#    usage: ./widql $test
# alternate ':' separator docbase:[docbroker_host]:[port];
#    usage: ./widql $dmtestVM
# alias literal;
#    usage: ./widql test
# in order to resolve alias literals, the wrapper will source the configuration file by itself;

# docker.dmtest;
# docbroker only definition;
d_dmtest=docker:1489
# full definition;
f_dmtest=dmtest@docker:1489
# alternate ':' separator;
a_dmtest=dmtest:docker:1489

# container01.dmtest01;
# docbroker only definition;
d_dmtest01=container01:2489
dip_dmtest01=192.168.33.101:1489
# full definition;
f_dmtest01=dmtest01@container01:2489
fip_dmtest01c=dmtest01@192.168.33.101:1489
# alternate ':' separator;
a_dmtest01=dmtest01:container01:2489
aip_dmtest01=dmtest01:192.168.33.101:2489

# container011.dmtest01;
# docbroker only definition;
d_dmtest011=container011:5489
dip_dmtest011=192.168.33.104:1489
# full definition;
f_dmtest011=dmtest01@container011:2489
fip_dmtest011=dmtest01@192.168.33.104:1489
# alternate ':' separator;
a_dmtest011=dmtest01:container011:2489
aip_dmtest011=dmtest01:192.168.33.104:2489

Lines 5 to 14 explain all the supported target syntaxes, with a new one presented on lines 12 to 14, which will be explained later in the paragraph entitled Possible Enhancements.
Using lookup variables in a configuration file makes things easier when the host names are hard to remember, because better mnemonic aliases can be defined for them. Also, as they are looked up, the entries can be in any order. They must obviously be unique or they will mask each other. A consistent naming convention may be required to easily find one's own way around this file.
Whenever the enhanced syntax is used, it triggers an automatic editing of the dfc.properties file and the specified connection information is inserted as dfc.docbroker.host and dfc.docbroker.port entries. Then, the corresponding Documentum tool gets invoked and finally the original dfc.properties file is restored when the tool exits. The trigger here is the presence of the @ or : characters in the first command-line parameter.
This would also cover the case when an entry is simply missing from the dfc.properties file. Actually, from the point of view of the command-line tools, all the connection definitions could be handed over to the new configuration file and even removed from dfc.properties, as they are dynamically added to and deleted from the latter file as needed.

The Implementation

The above proposal looks pretty easy and fun to implement, so let's give it a shot. In this article, I'll present a little script, dctm-wrapper, that builds upon the above @ syntax to first edit the configuration file on demand (that's the dynamic part of the article's title) and then invoke the standard idql, iapi or dmawk utilities, with an optional rollback of the change on exiting.
Since it is not possible to bypass the dfc.properties file, we will dynamically modify it whenever the @host syntax is used from a command-line tool. As we do not want to replace the official idql, iapi and dmawk tools yet, we will create new ones, say widql, wiapi and wdmawk (where w stands for wrapper). Those will be symlinks to the real script, dctm-wrapper.sh, which will be able to invoke either idql, iapi or dmawk according to how it was called (bash's $0 contains the name of the symlink that was invoked, even though its target is always dctm-wrapper.sh; see the script's source in the next paragraph).
The script dctm-wrapper.sh will support the following syntax:

$ ./widql docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
$ ./wiapi docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
$ ./wdmawk [-v] docbase[@[host][:port]] [dmawk parameters] [--verbose] [--append] [--keep]

The custom parameters --verbose, --append and --keep are processed by the script and stripped off before invoking the official tools.
wdmawk is a bit special in that the native tool, dmawk, is invoked differently from iapi/idql but I felt that it too could benefit from this little hack. Therefore, in addition to the non-interactive editing of the dfc.properties file, wdmawk also passes on the target docbase name as a -v docbase=… command-line parameter (the standard way to pass parameters in awk) and removes the extended target parameter docbase[@[host][:port]] unless it is prefixed by the -v option in which case it gets forwarded through the -v repo_target= parameter. The dmawk program is then free to use them the way it likes. The repo_target parameter could have been specified on the command-line independently but the -v option can still be useful in cases such as the one below:

$ ./wdmawk docbase@docker:1489 -v repo_target=docbase@docker:1489 '{....}'

which can be shortened to

$ ./wdmawk -v docbase@docker:1489 '{....}'

If the extended target docbase parameter is present, it must be the first one.
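As a quick sanity check, the forwarded variables can be printed from within the awk program (a sketch; show_target.awk is just a hypothetical test program):

$ cat show_target.awk
BEGIN { print "docbase =", docbase; print "target =", repo_target }
$ ./wdmawk -v dmtest01@docker:1489 -f ./show_target.awk

which should print dmtest01 and dmtest01@docker:1489 respectively.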
If the ‘@’ or ‘:’ characters are missing, it means the enhanced syntax is not used and the script will not attempt to modify dfc.properties; it will pass on all the remaining parameters to the matching official tools.
When @[host][:port] is present, the dfc.properties file will be edited to accommodate the new docbroker's parameters; all the existing couples dfc.docbroker.host/dfc.docbroker.port will either be removed (if --append is missing) or preserved (if --append is present) and a new couple entry will be appended with the given values. Obviously, if one wants to avoid the homonym trap, --append should not be used, in order to let the given docbroker be picked up as the sole entry in the property file.
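For instance, a call such as ./widql dmtest01@container03:3489 without --append would leave the property file with this single couple for the duration of the call (a sketch of the generated entries):

dfc.docbroker.host[0]=container03
dfc.docbroker.port[0]=3489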
When --append and --keep are present, we end up with a convenient way to add docbroker entries to the property file without manually editing it.
As the host is optional, it can be omitted and the one from the first dfc.docbroker.host[] entry will be used instead. Ditto for the port.
Normally, upon returning from the invocation of the original tools, the dfc.properties file is restored to its original content. However, if --keep is mentioned, the rollback will not be performed and the modified file will replace the original one. The latter will still be there though, renamed to $DOCUMENTUM_SHARED/config/dfc.properties_saved_YYYY-MM-DD_HH:MI:SS, so it will still be possible to manually roll back. --keep is mostly useful in conjunction with --append so that new docbrokers get permanently added to the configuration file.
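Rolling back manually is then just a matter of copying the saved file back, e.g. (the timestamp is of course just an example):

cp $DOCUMENTUM_SHARED/config/dfc.properties_saved_2019-06-16_14:30:00 $DOCUMENTUM_SHARED/config/dfc.properties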
Finally, when --verbose is specified, the changes to the dfc.properties file will be sent to stdout; a diff of the original and the new configuration files will also be shown, along with the final command-line used to invoke the selected original tool. This helps troubleshooting possible command-line parsing issues because, as can be seen from the code, no extra effort has been put into this section.

The Code

The script below shows a possible implementation:

#!/bin/bash
# Installation:
# it should not be called directly but through one of the aliases below for the standard tools instead:
# ln -s dctm-wrapper wiapi
# ln -s dctm-wrapper widql
# ln -s dctm-wrapper wdmawk
# where the initial w stands for wrapper;
# and then:
#    ./widql ...
# $DOCUMENTUM_SHARED must obviously exist;
# Since there is no \$DOCUMENTUM_SHARED in eCS ≥ 16.4, set it to $DOCUMENTUM as follows:
#    export DOCUMENTUM_SHARED=$DOCUMENTUM
# See Usage() for details;

Usage() {
   cat - <<EoU
./widql docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
./wiapi docbase[@[host][:port]] [other standard parameters] [--verbose] [--append] [--keep]
./wdmawk [-v] docbase[@[host][:port]] [dmawk -v parameters] [--verbose] [--append] [--keep]
E.g.:
   wiapi dmtest
or:
   widql dmtest@remote_host
or:
   widql dmtest@remote_host:1491 -Udmadmin -Pxxxx
or:
   wiapi dmtest@:1491 --append
or:
   wdmawk -v dmtest01@docker:5489 -f ./twdmawk.awk -v ...
or:
   wdmawk dmtest01@docker:2489 -f ./twdmawk.awk -v ...
or:
   wiapi dmtest@remote_host:1491 --append --keep
etc...
If --verbose is present, the changes applied to \$DOCUMENTUM[_SHARED]/config/dfc.properties are displayed.
If --append is present, a new entry is appended to the dfc.properties file, the value couple dfc.docbroker.host and dfc.docbroker.port, and the existing ones are not commented out so they are still usable;
If --append is not present, all the entries are removed prior to inserting the new one;
If --keep is present, the changed dfc.properties file is not reverted to the original one, i.e. the changes are made permanent;
If a change of configuration has been requested, the original config file is first saved with a timestamp appended and restored on return from the standard tools, unless --keep is present in which case
the backup file is also kept so it is still possible to manually revert to the original configuration;
wdmawk invokes dmawk passing it the -v docbase=\$docbase command-line parameter;
In addition, if -v docbase[@[host][:port]] is used, -v repo_target=docbase[@[host][:port]] is also passed to dmawk;
Instead of an in-line target definition, environment variables can also be used, e.g.:
   widql dmtest@\$dmtestVM ...
where \$dmtestVM resolves to e.g. docker:1489
or even:
   widql \$test01c ...
where \$test01c resolves to e.g. dmtest01@container01:1489
As the environment variable is resolved by the shell before it invokes the program, make sure it has a definition, e.g. source a configuration file;
EoU
   exit 0
}

if [[ $# -eq 0 ]]; then
   Usage
fi

# save command;
current_cmd="$0 $*"

# which original program shall possibly be called ?
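# e.g. invoked as ./widql: basename gives widql, and dropping the leading w yields idql, the tool to eventually call;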
dctm_program=$(basename $0); dctm_program=${dctm_program:1}
if [[ $dctm_program == "dmawk" ]]; then
   bFordmawk=1 
else
   bFordmawk=0 
fi

# look for the --verbose, --append or --keep options;
# remove them from the command-line if found so they are not passed to the standard Documentum's tools;
# the goal is to clean up the command-line from the enhancements options so it can be passed to the official tools;
bVerbose=0
bAppend=0
bKeep=0
posTarget=1
passTarget2awk=0
while true; do
   index=-1
   bChanged=0
   for i in "$@"; do
      (( index += 1 ))
      if [[ "$i" == "--verbose" ]]; then
         bVerbose=1
         bChanged=1
         break
      elif [[ "$i" == "--append" ]]; then
         bAppend=1
         bChanged=1
         break
      elif [[ "$i" == "--keep" ]]; then
         bKeep=1
         bChanged=1
         break
      elif [[ "$i" == "-v" && $bFordmawk -eq 1 && $index -eq 0 ]]; then
	 passTarget2awk=1
         bChanged=1
         break
      fi
   done
   if [[ $bChanged -eq 1 ]]; then
      set -- ${@:1:index} ${@:index+2:$#-index-1}
   else
      break
   fi
done

[[ $bVerbose -eq 1 ]] && echo "current_cmd=[$current_cmd]"

target=$1
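# split the first parameter docbase[@[host][:port]] into 3 ':'-separated fields for extraction with cut below;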
remote_info=$(echo $1 | gawk '{
   docbase = ""; hostname = ""; port = ""
   if (match($0, /@[^ \t:]*/)) {
      docbase = substr($0, 1, RSTART - 1)
      hostname = substr($0, RSTART + 1, RLENGTH - 1)
      rest = substr($0, RSTART + RLENGTH)
      if (1 == match(rest, /:[0-9]+/))
         port = substr(rest, 2, RLENGTH - 1)
   }
   else docbase = $0
}
END {
   printf("%s:%s:%s", docbase, hostname, port)
}')
docbase=$(echo $remote_info | cut -d: -f1)
hostname=$(echo $remote_info | cut -d: -f2)
port=$(echo $remote_info | cut -d: -f3)

# any modifications to the config file requested ?
if [[ ! -z $hostname || ! -z $port ]]; then
   # the dfc.properties file must be changed for the new target repository;
   dfc_config=$DOCUMENTUM_SHARED/config/dfc.properties
   if [[ ! -f $dfc_config ]]; then
      echo "$dfc_config not found"
      echo "check the \$DOCUMENTUM_SHARED environment variable"
      echo " in ≥ 16.4, set it to \$DOCUMENTUM"
      exit 1
   fi
   
   # save the current config file;
   backup_file=${dfc_config}_saved_$(date +"%Y-%m-%d_%H:%M:%S")
   cp $dfc_config ${backup_file}

   [[ $bVerbose -eq 1 ]] && echo "changing to $hostname:$port..."
   pid=$$; gawk -v hostname="$hostname" -v port="$port" -v bAppend=$bAppend -v bVerbose=$bVerbose -v bKeep=$bKeep -v pid=$$ 'BEGIN {
      bFirst_hostname = 0; first_hostname = ""
      bFirst_port     = 0 ;    first_port = ""
      max_index = -1
   }
   {
      if (match($0, /^dfc.docbroker.host\[[0-9]+\]=/)) {
         if (!hostname && !bFirst_hostname) {
            # save the first host name to be used if command-line hostname was omitted;
            bFirst_hostname = 1
            first_hostname = substr($0, RLENGTH +1)
         }
         match($0, /\[[0-9]+\]/); index_number = substr($0, RSTART + 1, RLENGTH - 2)
         if (bAppend) {
            # leave the entry;
            print $0
            if (index_number > max_index)
               max_index = index_number
         }
         else {
            # do not, which will remove the entry;
            if (bVerbose)
               print "# removed:", $0 > ("/tmp/tmp_" pid)
         }
      }
      else if (match($0, /^dfc.docbroker.port\[[0-9]+\]=/)) {
         if (!port && !bFirst_port) {
            # save the first port to be used if command-line port was omitted;
            bFirst_port = 1
            first_port = substr($0, RLENGTH +1)
         }
         if (bAppend)
            # leave the entry;
            print $0
         else {
            # do nothing, which will remove the entry;
            if (bVerbose)
               print "# removed:", $0 > ("/tmp/tmp_" pid)
         }
      }
      else print
   }
   END {
      if (!hostname)
         hostname = first_hostname
      if (!port)
         port = first_port
      if (bAppend)
         index_number = max_index + 1
      else
         index_number = 0
      print "dfc.docbroker.host[" index_number "]=" hostname
      print "dfc.docbroker.port[" index_number "]=" port
      if (bVerbose) {
         print "# added: dfc.docbroker.host[" index_number "]=" hostname > ("/tmp/tmp_" pid)
         print "# added: dfc.docbroker.port[" index_number "]=" port     > ("/tmp/tmp_" pid)
      }
      close("/tmp/tmp_" pid)
   }' $dfc_config > ${dfc_config}_new

   if [[ $bVerbose -eq 1 ]]; then
      echo "requested changes:"
      cat /tmp/tmp_$$
      rm /tmp/tmp_$$
      echo "diffs:"
      diff $dfc_config ${dfc_config}_new
   fi 

   mv ${dfc_config}_new $dfc_config
   shift

   if [[ $bFordmawk -eq 1 ]]; then
      docbase="-v docbase=$docbase"
      [[ $passTarget2awk -eq 1 ]] && docbase="-v repo_target=$target $docbase"
   fi
   [[ $bVerbose -eq 1 ]] && echo "calling original: $DM_HOME/bin/${dctm_program} $docbase $*"
   $DM_HOME/bin/${dctm_program} $docbase $*

   # restore original config file;
   [[ $bKeep -eq 0 ]] && mv ${backup_file} $dfc_config
else
   if [[ $bVerbose -eq 1 ]]; then
      echo "no change to current $dfc_config file"
      echo "calling original: $DM_HOME/bin/${dctm_program} $*"
   fi
   $DM_HOME/bin/${dctm_program} $*
fi

The original configuration file is always saved on entry by appending a timestamp precise to the second, which, unless you're the Flash running the command twice in the background with the option --keep but without --append, should be enough to preserve the original content.
To make the command-line parsing simpler, the script relies on the finally invoked command to check for any syntax errors. Feel free to modify it and make it more robust if you need that. As said earlier, the --verbose option can help troubleshoot unexpected results here.
See part II of this article for the tests.

The article Connecting to a Repository via a Dynamically Edited dfc.properties File (part I) first appeared on Blog dbi services.
