Feed aggregator

Monitor Elasticsearch Cluster with Zabbix

Yann Neuhaus - Fri, 2024-04-12 12:37

Setting up Zabbix monitoring for an Elasticsearch cluster is quite easy as it does not require an agent installation. As a matter of fact, the official template uses the Elastic REST API, and the Zabbix server itself triggers these requests.

In this blog post, I will quickly explain how to set up an Elasticsearch cluster, then show how easy the Zabbix setup is and list possible issues you might encounter.

Elastic Cluster Setup

I will not go too much into detail as David has already covered many topics around ELK. Anyway, should you need any help to install, tune or monitor your ELK cluster, feel free to contact us.

My 3 virtual machines are provisioned with YaK on OCI. Then, I install the rpm on all 3 nodes.

After starting the service on the first node, I generate an enrollment token with this command:

/usr/share/elasticsearch/bin/elasticsearch-create-enrollment-token -s node

This returns a long string which I will need to pass to nodes 2 and 3 of the cluster (before starting anything):

/usr/share/elasticsearch/bin/elasticsearch-reconfigure-node --enrollment-token <...>

The output will look like this:

This node will be reconfigured to join an existing cluster, using the enrollment token that you provided.
This operation will overwrite the existing configuration. Specifically:
  - Security auto configuration will be removed from elasticsearch.yml
  - The [certs] config directory will be removed
  - Security auto configuration related secure settings will be removed from the elasticsearch.keystore
Do you want to continue with the reconfiguration process [y/N]

After confirming with a y, we are almost ready to start. First, we must update the ES configuration file (i.e. /etc/elasticsearch/elasticsearch.yml):

  • Add the IP of the first node (only on the first, bootstrapped node) in cluster.initial_master_nodes: ["10.0.0.x"]
  • Set the listening IP for the inter-node traffic (to do on node 1 as well): transport.host
  • Set the list of master-eligible nodes: discovery.seed_hosts: ["10.0.0.x:9300"] (see the sketch below)
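As an illustration only, here is a minimal sketch of what these additions could look like on node 2, with placeholder IPs (10.0.0.11 for node 1, 10.0.0.12 for node 2); cluster.initial_master_nodes stays on the bootstrapped node 1:

cat >> /etc/elasticsearch/elasticsearch.yml <<'EOF'
# Inter-node (transport) listening address of this node
transport.host: 10.0.0.12
# Master-eligible node(s) to contact for discovery
discovery.seed_hosts: ["10.0.0.11:9300"]
EOF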

Now, we are ready to start nodes 2 and 3.

Let’s check the health of our cluster:

curl -s https://localhost:9200/_cluster/health -k -u elastic:password | jq

If you forgot the elastic user's password, you can reset it with this command:

/usr/share/elasticsearch/bin/elasticsearch-reset-password -u elastic
Zabbix Configuration

With the latest Elasticsearch releases, security has drastically increased as SSL communication became the standard. Nevertheless, the default macro values of the template have not kept up. Thus, we have to customize the following:

  • {$ELASTICSEARCH.PASSWORD} set to the elastic user's password (and, depending on the template version, the scheme/port macros adjusted to their HTTPS values)

If SELinux is enabled on your Zabbix server, you will need to allow the zabbix_server process to send network requests. The following command achieves this:

setsebool zabbix_can_network 1
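To make the change survive a reboot and double-check it, the persistent variant can be used (this is standard SELinux tooling, not specific to Zabbix):

# -P writes the boolean to the policy so it survives reboots
setsebool -P zabbix_can_network 1
# Verify the current value
getsebool zabbix_can_network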

Next, we can create a host in the Zabbix UI like this:

The Agent interface is required, but it will not be used to reach any agent as there are no agent-based (passive or active) checks in the linked template. However, the HTTP checks use the {HOST.CONN} macro in their URLs. Ideally, the IP should be a virtual IP or a load-balanced IP.

Don’t forget to set the macros:

After a few minutes, and once the node discovery has run, you should see something like this:


What will happen if one node stops? On the Problems tab of the Zabbix UI:

After a few seconds, I noticed that the “ES: Health is YELLOW” problem gets resolved on its own. Why? Because shards are re-balanced across the running servers.

I confirm this by graphing Number of unassigned shards:

We can also see the re-balancing with the network traffic monitoring:

Received bytes on the left. Sent on the right.

Around 15:24, I stopped node 3 and the shards were redistributed between nodes 1 and 2.

When node 3 started again, at 15:56, we can see nodes 1 and 2 (20 Mbps each) sending shards back to node 3 (40 Mbps received).


Whatever monitoring tool you are using, it always helps to understand what is happening behind the scenes.


Power BI Report Server: unable to publish a PBIX report

Yann Neuhaus - Fri, 2024-04-12 12:01

I installed a completely new Power BI Report Server. The server has several network interfaces so that it is part of several subdomains. In order to access the Power BI Report Server web portal from the different subdomains, I defined 3 different HTTPS URLs in the configuration and a certificate binding. I also used a specific Active Directory service account to start the service. I restarted my Power BI Report Server service, checking that the URL reservations were done correctly; I knew that in the past this part could be a source of problems.

Everything seemed to be OK. I tested access to the Power BI Report Server web portal from clients in the different subnets and everything was fine.

The next test was the upload of a Power BI report to the web portal. Of course, I made sure to use a report developed with Power BI Desktop RS.

Error raised

An error was raised when uploading a Power BI report in the web portal.

Trying to publish the report from Power BI Desktop RS was failing as well.


Report Server log analysis:

I started to analyze the Power BI Report Server logs. For a standard installation they are located in

C:\Program Files\Microsoft Power BI Report Server\PBIRS\LogFiles

In the last RSPowerBI_yyyy_mm_dd_hh_mi_ss.log file written I could find the following error:

Could not start PBIXSystem.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. ---> System.Net.HttpListenerException: Access is denied

As the error showed an access denied, the first reaction was to put the service account used to start the Power BI Report Server into the local Administrators group.

I restarted the service and tried publishing the Power BI report again. It worked without issue.

Well, I had a solution, but it wasn't an acceptable one. An application service account should not be a local admin of a server: it would be a security breach and is not permitted by the security governance.

Based on the information contained in the error message, I could find that it was related to URL reservations, but from the configuration steps, I could not notice any issue.

I then analyzed the list of reserved URLs on the server. Run the following command with elevated permissions to get the list of URL reservations on the server:

Netsh http show urlacl

List of URL reservation found for the user NT SERVICE\PowerBIReportServer:

    Reserved URL            : http://+:8083/ 
        User: NT SERVICE\PowerBIReportServer
            Listen: Yes
            Delegate: No
            SDDL: D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663) 
    Reserved URL            : https://sub1.domain.com:443/ReportServerPBIRS/ 
        User: NT SERVICE\PowerBIReportServer
            Listen: Yes
            Delegate: No
            SDDL: D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663) 
    Reserved URL            : https://sub2.domain.com:443/ReportServerPBIRS/ 
        User: NT SERVICE\PowerBIReportServer
            Listen: Yes
            Delegate: No
            SDDL: D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663) 
    Reserved URL            : https://servername.domain.com.ch:443/PowerBI/ 
        User: NT SERVICE\PowerBIReportServer
            Listen: Yes
            Delegate: No
            SDDL: D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663) 
    Reserved URL            : https://servername.domain.com.ch:443/wopi/ 
        User: NT SERVICE\PowerBIReportServer
            Listen: Yes
            Delegate: No
            SDDL: D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663) 
    Reserved URL            : https://sub1.domain.com:443/ReportsPBIRS/ 
        User: NT SERVICE\PowerBIReportServer
            Listen: Yes
            Delegate: No
            SDDL: D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663) 
    Reserved URL            : https://sub2.domain.com:443/ReportsPBIRS/ 
        User: NT SERVICE\PowerBIReportServer
            Listen: Yes
            Delegate: No
            SDDL: D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663) 
    Reserved URL            : https://servername.domain.com.ch:443/ReportsPBIRS/ 
        User: NT SERVICE\PowerBIReportServer
            Listen: Yes
            Delegate: No
            SDDL: D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663) 

Checking the list I could find:

  • the 3 URLs reserved for the web service containing the virtual directory I defined, ReportServerPBIRS
  • the 3 URLs reserved for the web portal containing the virtual directory I defined, ReportsPBIRS

But I noticed that only 1 URL was reserved for the virtual directories PowerBI and wopi, the one containing the server name.

The 2 others with the subdomains were missing.


I decided to reserve the URLs for the PowerBI and wopi virtual directories on the 2 subdomains, running the following commands with elevated permissions.
Be sure that the SDDL used is the one you find in the rsreportserver.config file.

netsh http add urlacl URL=https://sub1.domain.com:443/PowerBI/ user="NT SERVICE\SQLServerReportingServices" SDDL="D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663)"
netsh http add urlacl URL=https://sub2.domain.com:443/PowerBI/ user="NT SERVICE\SQLServerReportingServices" SDDL="D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663)"
netsh http add urlacl URL=https://sub1.domain.com:443/wopi/ user="NT SERVICE\SQLServerReportingServices" SDDL="D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663)"
netsh http add urlacl URL=https://sub2.domain.com:443/wopi/ user="NT SERVICE\SQLServerReportingServices" SDDL="D:(A;;GX;;;S-1-5-80-1730998386-2757299892-37364343-1607169425-3512908663)"

Restart the Power BI Report Server service

You will notice that the error in the latest RSPowerBI_yyyy_mm_dd_hh_mi_ss.log file has disappeared.

I tested the publishing of a Power BI report again, and it worked.
I hope this post helps you solve your problem.


Configuring Shared Global Area (SGA) in a Multitenant Database with a PeopleSoft Pluggable Database (PDB)

David Kurtz - Thu, 2024-04-11 10:06

I have been working on a PeopleSoft Financials application that we have converted from a stand-alone database to be the only pluggable database (PDB) in an Oracle 19c container database (CDB).  We have been getting shared pool errors in the PDB that lead to ORA-4031 errors in the PeopleSoft application.  

I have written a longer version of this article on my Oracle blog, but here are the main points.

SGA Management with a Parse Intensive System (PeopleSoft).

PeopleSoft systems dynamically generate lots of non-shareable SQL code.  This leads to a lot of parsing and consumes more shared pool memory.  ASMM can respond by shrinking the buffer cache and growing the shared pool.  However, this can lead to more physical I/O and degrade performance, and it is not beneficial for the database to cache dynamic SQL statements that are not going to be executed again.  Other parse-intensive systems can also exhibit this behaviour.

In PeopleSoft, I normally set DB_CACHE_SIZE and SHARED_POOL_SIZE to minimum values to stop ASMM shuffling too far in either direction.  With a large SGA, moving memory between these pools can become a performance problem in its own right.  

We removed SHARED_POOL_SIZE, DB_CACHE_SIZE and SGA_MIN_SIZE settings from the PDB.  The only SGA parameters set at PDB level are SGA_TARGET and INMEMORY_SIZE.  

SHARED_POOL_SIZE and DB_CACHE_SIZE are set as I usually would for PeopleSoft, but at CDB level to guarantee a minimum buffer cache size.  

This is straightforward when there is only one PDB in the CDB.   I have yet to see what happens when I have another active PDB with a non-PeopleSoft system and a different kind of workload that puts less stress on the shared pool and more on the buffer cache.
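As an illustration of that layout only (the sizes and the PDB name PSFT are placeholders, not the actual values we used), the parameter split could look like this:

# Sketch only: SGA minimums at CDB level, SGA_TARGET (and INMEMORY_SIZE) in the PDB
sqlplus / as sysdba <<'EOF'
-- CDB level: overall target plus guaranteed minimums for buffer cache and shared pool
alter system set sga_target=50G scope=spfile sid='*';
alter system set db_cache_size=20G scope=spfile sid='*';
alter system set shared_pool_size=10G scope=spfile sid='*';
alter system set inmemory_size=2G scope=spfile sid='*';

-- PDB level: only SGA_TARGET and INMEMORY_SIZE
alter session set container=PSFT;
alter system set sga_target=50G scope=spfile;
alter system set inmemory_size=2G scope=spfile;
EOF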

Initialisation Parameters
  • SGA_TARGET "specifies the total size of all SGA components".  Use this parameter to control the memory usage of each PDB.  The setting at CDB must be at least the sum of the settings for each PDB.
    • Recommendations:
      • Use only this parameter at PDB level to manage the memory consumption of the PDB.
      • In a CDB with only a single PDB, set SGA_TARGET to the same value at CDB and PDB levels.  
      • Therefore, where there are multiple PDBs, SGA_TARGET at CDB level should be set to the sum of the setting for each PDB.  However, I haven't tested this yet.
      • There is no recommendation to reserve SGA for use by the CDB only, nor in my experience is there any need to do so.
  • SHARED_POOL_SIZE sets the minimum amount of shared memory reserved to the shared pool.  It can optionally be set in a PDB.  
    • Recommendation: However, do not set SHARED_POOL_SIZE at PDB level.  It can be set at CDB level.
  • DB_CACHE_SIZE sets the minimum amount of shared memory reserved to the buffer cache. It can optionally be set in a PDB.  
    • Recommendation: However, do not set DB_CACHE_SIZE at PDB level.  It can be set at CDB level.
  • SGA_MIN_SIZE has no effect at CDB level.  It can be set at PDB level at up to half of the manageable SGA.
    • Recommendation: However, do not set SGA_MIN_SIZE.
  • INMEMORY_SIZE: If you are using in-memory query, this must be set at CDB level in order to reserve memory for the in-memory store.  The parameter defaults to 0, in which case in-memory query is not available.  The in-memory pool is not managed by Automatic Shared Memory Management (ASMM), but it does count toward the total SGA used in SGA_TARGET.
    • Recommendation: Therefore it must also be set in the PDB where in-memory is being used, otherwise we found (contrary to the documentation) that the parameter defaults to 0 and in-memory query is disabled in that PDB.
Oracle Notes
  • About memory configuration parameter on each PDBs (Doc ID 2655314.1) – Nov 2023
    • As a best practice, do not set SHARED_POOL_SIZE and DB_CACHE_SIZE in each PDB; manage them automatically by setting SGA_TARGET.
    • "This best practice is confirmed by development in Bug 30692720"
    • Bug 30692720 discusses how the parameters are validated.  Eg. "Sum(PDB sga size) > CDB sga size"
    • Bug 34079542: "Unset sga_min_size parameter in PDB."

Zero-Sum Game

Michael Dinh - Thu, 2024-04-11 07:11

Zero-sum is a situation, often cited in game theory, in which one person’s gain is equivalent to another’s loss, so the net change in wealth or benefit is zero.

A zero-sum game may have as few as two players or as many as millions of participants.

Reference: Zero-Sum Game Definition in Finance, With Example

This is what the database team is facing because the system team is upgrading the operating system, as Red Hat Enterprise Linux 7 will reach its end of life on June 30, 2024 (two months from now).

Here are some Oracle documents which may be helpful.

IMPORTANT: Need to relink GI and DB

There is one step missing and do you know what it is? Keep in mind, not all environments are the same.

How To Relink The Oracle Grid Infrastructure Standalone (Restart) Installation Or
Oracle Grid Infrastructure RAC/Cluster Installation (11.2 to 21c).
Doc ID 1536057.1

Relinking Oracle Home FAQ (Frequently Asked Questions)
Doc ID 1467060.1

Executing “relink all” resets permission of extjob, jssu, oradism, externaljob.ora
Oracle Database – Enterprise Edition – Version and later
Doc ID 1555453.1
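As a simplified sketch of the database-home part only (the paths are placeholders, databases and listeners running from the home must be stopped first, and the Grid Infrastructure home has its own unlock/relink/lock procedure described in the first note above):

# Relink a database home after the OS upgrade (sketch only)
export ORACLE_HOME=/u01/app/oracle/product/19.0.0/dbhome_1   # placeholder path
export PATH=$ORACLE_HOME/bin:$PATH

$ORACLE_HOME/bin/relink all

# relink writes its log here -- check it before restarting anything
grep -iE "error|fail" $ORACLE_HOME/install/relink.log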

PostgreSQL 17: pg_buffercache_evict()

Yann Neuhaus - Thu, 2024-04-11 02:49

In PostgreSQL up to version 16, there is no way to evict the buffer cache except by restarting the instance. In Oracle you have been able to do that for ages with “alter system flush buffer_cache“, but not in PostgreSQL. This will change when PostgreSQL 17 is released later this year. Of course, flushing the buffer cache is nothing you’d usually like to do in production, but it can be very handy for educational or debugging purposes. This is also the reason why it is intended to be a developer feature.

To get access to the pg_buffercache_evict function you need to install the pg_buffercache extension, as the function is designed to work over the pg_buffercache view:

postgres=# select version();
 PostgreSQL 17devel on x86_64-linux, compiled by gcc-7.5.0, 64-bit
(1 row)

postgres=# create extension pg_buffercache;
postgres=# \dx
                      List of installed extensions
      Name      | Version |   Schema   |           Description           
 pg_buffercache | 1.5     | public     | examine the shared buffer cache
 plpgsql        | 1.0     | pg_catalog | PL/pgSQL procedural language
(2 rows)
postgres=# \d pg_buffercache 
                 View "public.pg_buffercache"
      Column      |   Type   | Collation | Nullable | Default 
 bufferid         | integer  |           |          | 
 relfilenode      | oid      |           |          | 
 reltablespace    | oid      |           |          | 
 reldatabase      | oid      |           |          | 
 relforknumber    | smallint |           |          | 
 relblocknumber   | bigint   |           |          | 
 isdirty          | boolean  |           |          | 
 usagecount       | smallint |           |          | 
 pinning_backends | integer  |           |          | 

Once the extension is in place, the function is there as well:

postgres=# \dfS *evict*
                               List of functions
 Schema |         Name         | Result data type | Argument data types | Type 
 public | pg_buffercache_evict | boolean          | integer             | func
(1 row)

To load something into the buffer cache we’ll make use of the pg_prewarm extension and completely load the table we’ll create afterwards:

postgres=# create extension pg_prewarm;
postgres=# create table t ( a int, b text );
postgres=# insert into t select i, i::text from generate_series(1,10000) i;
INSERT 0 10000
postgres=# select pg_prewarm ( 't', 'buffer', 'main', null, null );
 pg_prewarm 
------------
         54
(1 row)

postgres=# select pg_relation_filepath('t');
(1 row)

postgres=# select count(*) from pg_buffercache where relfilenode = 16401;
 count 
-------
    58
(1 row)

If you wonder why there are 58 blocks cached in the buffer cache although we only loaded 54, this is because of the visibility map and the free space map:

postgres=# select relforknumber from pg_buffercache where relfilenode = 16401 and relforknumber != 0;
(4 rows)

Using the new pg_buffercache_evict() function we are now able to completely evict the buffers of that table from the cache, which results in exactly 58 blocks being evicted:

postgres=# select pg_buffercache_evict(bufferid) from pg_buffercache where relfilenode = 16401;
(58 rows)

Cross-checking this confirms that all the blocks are gone:

postgres=# select count(*) from pg_buffercache where relfilenode = 16401;
 count 
-------
     0
(1 row)

Nice, thanks to all involved.


Video on Client Side Failover in RAC

Hemant K Chitale - Thu, 2024-04-11 01:54

I've posted a new video demonstrating Client Side Failover as defined in the tnsnames.ora file.

Categories: DBA Blogs

Catastrophic Database Failure -- Deletion of Control and Redo Files

Tom Kyte - Wed, 2024-04-10 14:26
We recently had a database failure that resulted in data loss after an Oracle database had both its control and redo log files deleted. Please note that I am not a DBA, but simply an analyst who supports the system that sits on this Oracle database. Any amount of data loss is fairly serious, and I am wondering how we avoid this in the future. Before the control and redo files were deleted, we had an event wherein the drive this database is on was full. This caused the database to stop writing transactions and disallowed users from accessing the application. Once space was made on this drive, the database operated normally for several hours until... the redo and control files were deleted. What would have caused the control and redo files to be deleted? In trying to figure out what happened, it was noted that if we had expanded the drive's capacity in response to its becoming full, the later data loss would not have happened. Does Tom agree with that sentiment? Are these two events linked (disk drive nearly full and later data loss), or are they symptomatic of two different things?
Categories: DBA Blogs


Jonathan Lewis - Wed, 2024-04-10 03:17

What to do when you hit a problem (possibly after an incomplete recovery) that reports an “ORA-00001 unique key violation” on sys.wrm$_snapshot_pk – as reported recently in this thread on the MOSC SQL Performance forum (needs a MOS account).

Snapshot ids are carefully sequenced, without gaps, so somehow the thing that controls the “current” sequence number has gone backwards and is trying to generate a value that is lower than the current highest value in wrm$_snapshot. The thread I referenced above does point to an article dated 2017 on Alibaba discussing methods of checking for corruption and clearing up messes; but as an extra option you could simply try hacking the control table to set the “last used” snapshot id to something higher than the highest value currently in wrm$_snapshot. The table you need to hack is wrm$_wr_control and here’s an example of its contents from an instance of 19.11 (preceded by a check of the current maximum snap_id in wrm$_snapshot):

SQL> select max(snap_id) max_snap_id, max(end_interval_time) max_snap_time from wrm$_snapshot;

MAX_SNAP_ID MAX_SNAP_TIME
----------- ---------------------------------------------------------------------------
       7304 09-APR-24 PM

SQL> execute print_table('select * from wrm$_wr_control')
DBID                           : 3158514872
SNAP_INTERVAL                  : +00000 01:00:00.0
SNAPINT_NUM                    : 3600
RETENTION                      : +00008 00:00:00.0
RETENTION_NUM                  : 691200
MOST_RECENT_SNAP_ID            : 7304
MRCT_SNAP_TIME_NUM             : 1712685600
STATUS_FLAG                    : 2
MRCT_PURGE_TIME_NUM            : 1712648156
MOST_RECENT_SPLIT_ID           : 7295
MOST_RECENT_SPLIT_TIME         : 1712648156
SWRF_VERSION                   : 30
MRCT_BASELINE_ID               : 0
TOPNSQL                        : 2000000000
MRCT_BLTMPL_ID                 : 0
SRC_DBID                       : 3158514872
SRC_DBNAME                     : CDB$ROOT
T2S_DBLINK                     :
FLUSH_TYPE                     : 0
SNAP_ALIGN                     : 0
MRCT_SNAP_STEP_TM              : 1712685613
MRCT_SNAP_STEP_ID              : 0
1 row(s) selected

PL/SQL procedure successfully completed.

I have to state that I would not try this on a production system without getting permission – possibly written in blood – from Oracle support: but if I were really desperate to keep existing snapshots and to make it possible for Oracle to generate new snapshots as soon as possible I might try updating the most_recent_snap_id to the value shown in wrm$_snapshot.

Of course you ought to update the most_recent_snap_time as well, and the mrct_snap_time_num (which looks like the number of seconds since 1st Jan 1900 GMT (on my instance)).

Then there’s the interpretation and sanity checking of the other “most recent / mrct” columns to worry about, and the possibility of PDBs vs. CDBs – but those are topics that I’m going to leave to someone else to worry about.
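Purely as an illustration of the hack described above — emphatically not something to run without Oracle Support's blessing and a good backup — the core update might look something like this (the dbid is the one shown in the print_table output; the snap_time columns are deliberately left out, as discussed):

# Hypothetical sketch only: bump the last-used snap_id in wrm$_wr_control
# so that it is no lower than the highest snap_id already in wrm$_snapshot.
sqlplus / as sysdba <<'EOF'
update sys.wrm$_wr_control c
   set c.most_recent_snap_id = (select max(s.snap_id) from sys.wrm$_snapshot s)
 where c.dbid = 3158514872;          -- dbid taken from the listing above
commit;
EOF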

Is 'SELECT * FROM :TABLE_NAME;' available?

Tom Kyte - Tue, 2024-04-09 20:06
Is 'SELECT * FROM :TABLE_NAME;' available?
Categories: DBA Blogs

Apache httpd Tuning and Monitoring with Zabbix

Yann Neuhaus - Tue, 2024-04-09 07:37

There is no tuning possible without proper monitoring in place to measure the impact of any changes. Thus, before trying to tune an Apache httpd server, I will explain how to monitor it with Zabbix.

Setup Zabbix Monitoring

The Apache httpd template provided by Zabbix uses mod_status, which provides metrics about load, processes and connections.

Before enabling this module, we must ensure it is present. The command httpd -M 2>/dev/null | grep status_module will tell you. Next, we can extend the configuration by creating a file in /etc/httpd/conf.d:

<Location "/server-status">
  SetHandler server-status

After a configuration reload, we should be able to access the URL http://<IP>/server-status?auto.
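As a quick check, something like this should return the machine-readable counters the template relies on (the reload command may differ on your distribution):

# Reload the configuration, then fetch the machine-readable status page
systemctl reload httpd
curl -s "http://localhost/server-status?auto" | head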

Finally, we can link the template to the host and see that data are collected:


I deployed a simple static web site to the Apache httpd server. To load test that web site, nothing better than JMeter. The load test scenario simply requests the Home, About, Services and Contact Us pages and retrieves all embedded resources, during 2 minutes with 100 threads (i.e. users).

Here is the performance in requests per second (right scale) and bytes per second (left scale):

At most, the server serves 560 req/s at 35 MBps.

And regarding CPU usage, it almost reaches 10%:


Without any additional headers, Apache httpd will consider that the client (here, JMeter) does not support gzip. Fortunately, it is possible to set an HTTP header in JMeter. I add it at the top of the test plan so that it applies to all HTTP requests below:

Note that I enabled mod_deflate on the Apache side.
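To verify outside of JMeter that compression is actually negotiated, a quick curl comparison helps (hostname and page are placeholders):

# Transferred size without announcing gzip support
curl -s -o /dev/null -w "plain: %{size_download} bytes\n" "http://web01.example.com/index.html"

# Same request announcing gzip support, as the JMeter HTTP Header Manager now does
curl -s -o /dev/null -w "gzip : %{size_download} bytes\n" -H "Accept-Encoding: gzip" "http://web01.example.com/index.html"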

Let’s run another load test and compare the results!

After two minutes, here is what I see:

The amount of Mbps was reduced to 32, which is expected as we are compressing. The amount of req/s increased by almost 100%, to 1000 req/s!

On the CPU side, we also see a huge increase:

45% CPU usage

This is also more or less expected, as compression requires computation.

And Now

The deployed static web site does not have any forms, which would require client-side compression. That will be a subject for another blog. I could also compare with Nginx.


ODA X10-L storage configuration is different from what you may expect

Yann Neuhaus - Tue, 2024-04-09 04:39

The Oracle Database Appliance X10 lineup has been available since September 2023. Compared to the X9-2 lineup, the biggest changes are the AMD Epyc processors replacing the Intel Xeons, and the new license model regarding Standard Edition 2, clarified by Oracle several weeks ago. Apart from these new things, the models are rather similar to the previous ones: the Small model for basic needs, an HA model with RAC and high-capacity storage for big critical databases, and the much more popular Large model for most needs.

2 kinds of disks inside the ODA X10-L

The ODA I’ve worked on is an X10-L with 2x disk expansions, meaning that I have the 2x 6.8TB disks from the base configuration, plus 4x 6.8TB additional disks. The first 4 disks are classic disks visible on the front panel of the ODA. As there are only 4 bays in the front, the other disks are internal, called AIC for Add-In Card (PCIe). You can have up to 3 disk expansions, meaning 4x disks in the front and 4x AIC disks inside the server. You should know that only the front disks are hot-swappable; the other disks being PCIe cards, you will need to shut down the server and open its cover to remove, add or replace a disk. 6.8TB is the raw capacity; consider that the real capacity is something like 6.2TB, and the usable capacity will be lower still as you will need ASM redundancy to protect your blocks. In the ODA documentation, you will find the usable capacity for each disk configuration.

2 AIC disks inside an ODA X10-L. The first 4 disks are in the front.

First contact with X10-L – using odacli

odacli describe-system is very useful for an overview of the ODA you’re connected to:

odacli describe-system
Appliance Information
                     ID: 3fcd1093-ea74-4f41-baa1-f325b469a3e1
               Platform: X10-2L
        Data Disk Count: 10
         CPU Core Count: 4
                Created: January 10, 2024 2:26:43 PM CET

System Information
                   Name: dc1oda002
            Domain Name: ad.dbiblogs.ch
              Time Zone: Europe/Zurich
             DB Edition: EE
            DNS Servers:
            NTP Servers:

Disk Group Information
DG Name                   Redundancy                Percentage
------------------------- ------------------------- ------------
DATA                      NORMAL                    85
RECO                      NORMAL                    15

Data Disk Count is not what I expected. This is normally the number of DATA disks; it should be 6 on this ODA, not 10.

Let’s do a show disk with odaadmcli:

odaadmcli show disk
        NAME            PATH            TYPE            STATE           STATE_DETAILS
        pd_00           /dev/nvme0n1    NVD             ONLINE          Good
        pd_01           /dev/nvme1n1    NVD             ONLINE          Good
        pd_02           /dev/nvme3n1    NVD             ONLINE          Good
        pd_03           /dev/nvme2n1    NVD             ONLINE          Good
        pd_04_c1        /dev/nvme8n1    NVD             ONLINE          Good
        pd_04_c2        /dev/nvme9n1    NVD             ONLINE          Good
        pd_05_c1        /dev/nvme6n1    NVD             ONLINE          Good
        pd_05_c2        /dev/nvme7n1    NVD             ONLINE          Good

OK, this command only displays the DATA disks, so the system disks are not in this list, but there are still 8 disks and not 6.

Let’s have a look on the system side.

First contact with X10-L – using system commands

What is detected by the OS?

lsblk | grep disk
nvme9n1                      259:0       0   3.1T  0 disk
nvme6n1                      259:6       0   3.1T  0 disk
nvme8n1                      259:12      0   3.1T  0 disk
nvme7n1                      259:18      0   3.1T  0 disk
nvme4n1                      259:24      0 447.1G  0 disk
nvme5n1                      259:25      0 447.1G  0 disk
nvme3n1                      259:26      0   6.2T  0 disk
nvme0n1                      259:27      0   6.2T  0 disk
nvme1n1                      259:28      0   6.2T  0 disk
nvme2n1                      259:29      0   6.2T  0 disk
asm/acfsclone-242            250:123905  0   150G  0 disk  /opt/oracle/oak/pkgrepos/orapkgs/clones
asm/commonstore-242          250:123906  0     5G  0 disk  /opt/oracle/dcs/commonstore
asm/odabase_n0-242           250:123907  0    40G  0 disk  /u01/app/odaorabase0
asm/orahome_sh-242           250:123908  0    80G  0 disk  /u01/app/odaorahome

This is rather strange. I can see 10 disks: the 2x 450GB disks are for the system (and normally not considered as DATA disks by odacli), and I can also find 4x 6.2TB disks. But instead of having 2x additional 6.2TB disks, I have 4x 3.1TB disks. The overall capacity is OK (4x 6.2TB + 4x 3.1TB = 37.2TB), but it's different compared to previous ODA generations.

Let’s confirm this with fdisk:

fdisk -l /dev/nvme0n1
Disk /dev/nvme0n1: 6.2 TiB, 6801330364416 bytes, 13283848368 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: E20D9013-1982-4F66-B7A2-5FE0B1BC8F74
Device                Start         End    Sectors   Size Type
/dev/nvme0n1p1         4096  1328386047 1328381952 633.4G Linux filesystem
/dev/nvme0n1p2   1328386048  2656767999 1328381952 633.4G Linux filesystem
/dev/nvme0n1p3   2656768000  3985149951 1328381952 633.4G Linux filesystem
/dev/nvme0n1p4   3985149952  5313531903 1328381952 633.4G Linux filesystem
/dev/nvme0n1p5   5313531904  6641913855 1328381952 633.4G Linux filesystem
/dev/nvme0n1p6   6641913856  7970295807 1328381952 633.4G Linux filesystem
/dev/nvme0n1p7   7970295808  9298677759 1328381952 633.4G Linux filesystem
/dev/nvme0n1p8   9298677760 10627059711 1328381952 633.4G Linux filesystem
/dev/nvme0n1p9  10627059712 11955441663 1328381952 633.4G Linux filesystem
/dev/nvme0n1p10 11955441664 13283823615 1328381952 633.4G Linux filesystem

fdisk -l /dev/nvme8n1
Disk /dev/nvme8n1: 3.1 TiB, 3400670601216 bytes, 6641934768 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: gpt
Disk identifier: A3086CB0-31EE-4F78-A6A6-47D53149FDAE

Device              Start        End    Sectors   Size Type
/dev/nvme8n1p1       4096 1328386047 1328381952 633.4G Linux filesystem
/dev/nvme8n1p2 1328386048 2656767999 1328381952 633.4G Linux filesystem
/dev/nvme8n1p3 2656768000 3985149951 1328381952 633.4G Linux filesystem
/dev/nvme8n1p4 3985149952 5313531903 1328381952 633.4G Linux filesystem
/dev/nvme8n1p5 5313531904 6641913855 1328381952 633.4G Linux filesystem

OK, the 6.2TB disks are split into 10 partitions, and the 3.1TB disks are split into 5 partitions. It makes sense because ASM needs partitions of the same size inside a diskgroup.

First contact with X10-L – using ASM

Now let’s have a look within ASM, the most important thing being that ASM is able to manage the storage correctly:

su - grid
sqlplus / as sysasm
select a.name "Diskgroup" , round(a.total_mb/1024) "Size GB",  round(a.free_mb/1024) "Free GB",   round(a.usable_file_mb/1024) "Usable GB", 100*round((a.total_mb-a.free_mb)/a.total_mb,1) "Use%" from  v$asm_diskgroup a ;
Diskgroup                         Size GB    Free GB  Usable GB       Use%
------------------------------ ---------- ---------- ---------- ----------
DATA                                30404      12341       3637         60
RECO                                 7601       5672       2203         30

select name,total_mb/1024 "GB", GROUP_NUMBER from v$asm_disk order by 3 desc;
NAME                                   GB GROUP_NUMBER
------------------------------ ---------- ------------
NVD_S02_S6UENA0W1072P9         633.421875            2
NVD_S02_S6UENA0W1072P10        633.421875            2
NVD_S05_C2_PHAZ25110P9         633.421875            2
NVD_S05_C2_PHAZ25110P10        633.421875            2
NVD_S03_S6UENA0W1073P10        633.421875            2
NVD_S00_S6UENA0W1075P10        633.421875            2
NVD_S01_S6UENA0W1072P10        633.421875            2
NVD_S00_S6UENA0W1075P9         633.421875            2
NVD_S01_S6UENA0W1072P9         633.421875            2
NVD_S04_C2_PHAZ24710P10        633.421875            2
NVD_S04_C2_PHAZ24710P9         633.421875            2
NVD_S03_S6UENA0W1073P9         633.421875            2
NVD_S05_C1_PHAZ25110P2         633.421875            1
NVD_S00_S6UENA0W1075P8         633.421875            1
NVD_S04_C2_PHAZ24710P6         633.421875            1
NVD_S00_S6UENA0W1075P6         633.421875            1
NVD_S02_S6UENA0W1072P1         633.421875            1
NVD_S05_C1_PHAZ25110P3         633.421875            1
NVD_S04_C1_PHAZ24710P2         633.421875            1
NVD_S03_S6UENA0W1073P4         633.421875            1
NVD_S00_S6UENA0W1075P2         633.421875            1
NVD_S02_S6UENA0W1072P6         633.421875            1
NVD_S05_C2_PHAZ25110P6         633.421875            1
NVD_S00_S6UENA0W1075P7         633.421875            1
NVD_S00_S6UENA0W1075P5         633.421875            1
NVD_S04_C2_PHAZ24710P7         633.421875            1
NVD_S04_C2_PHAZ24710P8         633.421875            1
NVD_S02_S6UENA0W1072P4         633.421875            1
NVD_S03_S6UENA0W1073P7         633.421875            1
NVD_S00_S6UENA0W1075P1         633.421875            1
NVD_S04_C1_PHAZ24710P1         633.421875            1
NVD_S01_S6UENA0W1072P2         633.421875            1
NVD_S01_S6UENA0W1072P1         633.421875            1
NVD_S01_S6UENA0W1072P3         633.421875            1
NVD_S03_S6UENA0W1073P5         633.421875            1
NVD_S01_S6UENA0W1072P4         633.421875            1
NVD_S02_S6UENA0W1072P3         633.421875            1
NVD_S01_S6UENA0W1072P7         633.421875            1
NVD_S02_S6UENA0W1072P7         633.421875            1
NVD_S05_C1_PHAZ25110P1         633.421875            1
NVD_S03_S6UENA0W1073P1         633.421875            1
NVD_S01_S6UENA0W1072P5         633.421875            1
NVD_S05_C1_PHAZ25110P5         633.421875            1
NVD_S04_C1_PHAZ24710P3         633.421875            1
NVD_S00_S6UENA0W1075P3         633.421875            1
NVD_S03_S6UENA0W1073P6         633.421875            1
NVD_S02_S6UENA0W1072P8         633.421875            1
NVD_S00_S6UENA0W1075P4         633.421875            1
NVD_S04_C1_PHAZ24710P5         633.421875            1
NVD_S01_S6UENA0W1072P8         633.421875            1
NVD_S04_C1_PHAZ24710P4         633.421875            1
NVD_S03_S6UENA0W1073P3         633.421875            1
NVD_S05_C1_PHAZ25110P4         633.421875            1
NVD_S03_S6UENA0W1073P2         633.421875            1
NVD_S01_S6UENA0W1072P6         633.421875            1
NVD_S05_C2_PHAZ25110P8         633.421875            1
NVD_S02_S6UENA0W1072P5         633.421875            1
NVD_S02_S6UENA0W1072P2         633.421875            1
NVD_S03_S6UENA0W1073P8         633.421875            1
NVD_S05_C2_PHAZ25110P7         633.421875            1

This is also different from older generations. On the X9-2 lineup, the RECO diskgroup (group 2) has as many partitions as the DATA diskgroup (group 1). This is no longer the way it works. But regarding the free and usable GB, everything is fine.

Why did Oracle do things differently from older ODAs?

The oldest ODAs used spinning disks, and to maximize performance, Oracle created 2 partitions on each disk: one for DATA on the outer part of the disk, and one for RECO on the inner part of the disk. All the DATA partitions were then added to the DATA diskgroup, and all the RECO partitions to the RECO diskgroup. The more disks you had, the faster the read and write speed was. Redo logs had their own dedicated disks, usually 4 small SSDs using high redundancy.

Nowadays, ODAs mostly use SSDs, and read and write speed is identical wherever the block is. The number of disks doesn't matter much either: speed is mostly limited by the PCIe bandwidth and the chips on the SSDs, but as far as I know, the speed of one NVMe SSD is enough for 95% of the databases.

Internal AIC disks on the ODA X10-L are split into two disks for some reason, so it's no longer possible to have the big partitions we had before.


This X10-L was initially deployed using version 19.21. As it wasn't in use yet, and to make sure everything is fine on the hardware and software side, it was decided to do a fresh reimaging using the latest 19.22. It didn't change anything: odacli still sees 10 disks, but apart from that, everything is fine.

Disk size and partitioning are now different, but it won't change anything for most of us.


PostgreSQL 17: Split and Merge partitions

Yann Neuhaus - Tue, 2024-04-09 01:37

Since declarative partitioning was introduced in PostgreSQL 10 there have been several additions and enhancements throughout the PostgreSQL releases. PostgreSQL 17, expected to be released around September/October this year, is no exception to that and will come with two new features when it comes to partitioning: Splitting and Merging partitions.

Before we can have a look at that, we need a partitioned table, some partitions and some data, so let's generate this. Splitting and merging work for range and list partitioning, and because most of the partitioning examples you can find online use range partitioning, we'll go for list partitioning in this post:

postgres=# create table t ( a int, b text ) partition by list (b);
postgres=# \d+ t
Partitioned table "public.t"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
a | integer | | | | plain | | |
b | text | | | | extended | | |
Partition key: LIST (b)
Number of partitions: 0

postgres=# create table t_p1 partition of t for values in ('a');
postgres=# create table t_p2 partition of t for values in ('b');
postgres=# create table t_p3 partition of t for values in ('c');
postgres=# create table t_p4 partition of t for values in ('d');
postgres=# \d+ t
                                      Partitioned table "public.t"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
 a      | integer |           |          |         | plain    |             |              | 
 b      | text    |           |          |         | extended |             |              | 
Partition key: LIST (b)
Partitions: t_p1 FOR VALUES IN ('a'),
            t_p2 FOR VALUES IN ('b'),
            t_p3 FOR VALUES IN ('c'),
            t_p4 FOR VALUES IN ('d')

This gives us a simple list partitioned table and four partitions. Let's add some data to the partitions:

postgres=# insert into t select i, 'a' from generate_series(1,100) i;
INSERT 0 100
postgres=# insert into t select i, 'b' from generate_series(101,200) i;
INSERT 0 100
postgres=# insert into t select i, 'c' from generate_series(201,300) i;
INSERT 0 100
postgres=# insert into t select i, 'd' from generate_series(301,400) i;
INSERT 0 100
postgres=# select count(*) from t_p1;
 count 
-------
   100
(1 row)

postgres=# select count(*) from t_p2;
 count 
-------
   100
(1 row)

postgres=# select count(*) from t_p3;
 count 
-------
   100
(1 row)

postgres=# select count(*) from t_p4;
 count 
-------
   100
(1 row)

Suppose we want to merge the first two partitions, containing values of ‘a’ and ‘b’. This can now be easily done with the new merge partition DDL command:

postgres=# alter table t merge partitions (t_p1, t_p2) into t_p12;
postgres=# \d+ t
                                      Partitioned table "public.t"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
 a      | integer |           |          |         | plain    |             |              | 
 b      | text    |           |          |         | extended |             |              | 
Partition key: LIST (b)
Partitions: t_p12 FOR VALUES IN ('a', 'b'),
            t_p3 FOR VALUES IN ('c'),
            t_p4 FOR VALUES IN ('d')

The same works the other way around: splitting the combined partition back into single partitions:

postgres=# alter table t split partition t_p12 into ( partition t_p1 for values in ('a'), partition t_p2 for values in ('b'));
postgres=# \d+ t
                                      Partitioned table "public.t"
 Column |  Type   | Collation | Nullable | Default | Storage  | Compression | Stats target | Description 
 a      | integer |           |          |         | plain    |             |              | 
 b      | text    |           |          |         | extended |             |              | 
Partition key: LIST (b)
Partitions: t_p1 FOR VALUES IN ('a'),
            t_p2 FOR VALUES IN ('b'),
            t_p3 FOR VALUES IN ('c'),
            t_p4 FOR VALUES IN ('d')

Nice, but there currently is a downside: both operations take an ACCESS EXCLUSIVE lock on the parent table, so everything against that table will be blocked for the time it takes to either split or merge the partitions. I am not too worried about that, as it was the same with other partitioning-related features in the past. Over time, locking was reduced, and I guess this will be the same with this feature.


DevOps Best Practice – Backup and Share your work with GitHub

Yann Neuhaus - Tue, 2024-04-09 01:17

With my mate Chay Te (our DevOps champion in all categories and the mastermind of this best practice), we worked on scripts for our new Kubernetes security talk. These scripts were stored on our EC2 instance, but this should not be their permanent location. First, the EC2 instance could be deleted and we would lose everything. Then, we need to version these files and keep track of the changes between the two of us. It was time to apply DevOps best practice to our scripts, and we decided to use GitHub for this purpose. Read on to learn how to back up and share your work with GitHub in this step-by-step guide!


The first step is to sign up for a GitHub account if you don’t already have one.

Sign up for GitHub

Then you can create your first repository (also called repo for short) by giving it a name. You can select a Private repo if the files you share are private (it was in our case). So far so good, nothing complicated here!

Now you want to connect from your EC2 instance (in our case, but it could be any type of machine) to this repo and push your scripts. Before you can do that, there is some configuration to do in GitHub. You have to create a Personal Access Token (PAT) to allow this connection. Click on your profile in the top right corner and select Settings. Then choose Developer Settings and you will reach the PAT menu. Here you have 2 choices: a fine-grained or a classic token. The first one is in Beta and allows you to choose which access you want to give to each element of your repo. You give it a name and the token will be generated for you. It has an expiration date and you have to keep it somewhere safe, like a password, as you will not be able to retrieve it later.

GitHub personal access token as part of DevOps best practice.

You can now use your GitHub account name and this token to synchronize your scripts or files between EC2 and this repo.

The last thing to configure in GitHub is to invite your collaborators to access your repo. Click on Add people and enter the email address of your collaborator. She/he will receive an invitation to join you in this repo.

GitHub add a collaborator

Creating a repo and collaborating in it is part of DevOps best practice!

Git commands in EC2

Your GitHub repo is now ready, so let's use it and back up your scripts in it. Another DevOps best practice is to use Git as the CLI tool on our machine.

On the EC2 instance, the easiest way to proceed is to clone your GitHub repo (we give it the name MyNewRepo) with Git as follows:

$ git clone https://github.com/mygithubaccount/MyNewRepo.git

You will be asked to authenticate with your GitHub account name (here mygithubaccount) and to use the PAT you have created above as the password. On your EC2 instance you now have a new folder called MyNewRepo. At this stage it is empty. Go into it and set the Git configuration:

$ cd MyNewRepo

$ git config --global user.email "benoit.entzmann@dbi-services.com"
$ git config --global user.name "Benoit Entzmann"
$ git branch -M main
$ git remote add origin https://github.com/mygithubaccount/MyNewRepo.git

You set the global email and username that you will use with Git. By default, the single Git branch is called master; rename it to main. Finally, set up the connection between your local Git repository and your remote repository.

Next copy or move all of your script files into this folder as shown in the example below:

$ cp -Rp ~/MyScripts/* ./

Now all of your script files are in the right folder and you just need to add them to the local Git repo and push them to your repo in GitHub:

$ git add .
$ git commit -m "My scripts"
$ git push -u origin main

And this is it! You can just check in GitHub that all of your script files are now in the repo called MyNewRepo.

Wrap up

In a few steps we have seen how to back up your script files by using a repository in GitHub. You have not only backed up your files, you have also set up the GitHub environment to collaborate in this repo. This is a DevOps best practice!

Now, in case of a failure or accidental deletion of your EC2 instance (yes, Instance state -> Terminate instance can happen!), you will be able to clone your repo again from GitHub and quickly get back on track with your scripts!


Faster Ansible

Yann Neuhaus - Mon, 2024-04-08 10:03

Even if Ansible is powerful and flexible, it can be considered “slow”. It will anyway be faster, and more consistent, than doing the same steps manually. Nevertheless, we will experiment to make it even faster. I found a few tips on the Internet, but rarely with figures of what to expect.

In this blog post, I will cover one of them and run different scenarios. We will also dig into some internal mechanisms used by Ansible.

SSH Connections

Ansible is connection intensive as it opens, and closes, many ssh connections to the targeted hosts.

I found two possible ways to count the number of connections from the control node to the managed nodes:

  • Add the -vvv option to the ansible-playbook command.
  • grep the audit.log file:
tail -f /var/log/audit/audit.log | grep USER_LOGIN

The first option is far too verbose, but I used it with the first playbook below to confirm that the second option gives the same count.
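In practice, a rough count can be obtained with grep rather than tail: take the value before and after a playbook run and compute the difference (sudo may be needed to read the audit log):

# Number of USER_LOGIN audit records so far; run before and after the playbook
sudo grep -c USER_LOGIN /var/log/audit/audit.log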

Simple Playbook

To demonstrate that, let’s start with a very minimal playbook without fact gathering:

- name: Test playbook
  hosts: all
  gather_facts: false
  tasks:
    - name: "ping"
      ansible.builtin.ping:

This playbook triggered 8 ssh connections to the target host. If I enable fact gathering, the count goes up to 14 connections. Again, this is quite a lot knowing that the playbook does not do much besides checking that the target is alive.

To summarize:

gather_facts    connections    timing (s)
false           8              1,277
true            14             1,991

ping playbook results

What are All These Connections For?

To determine what these connections are doing, we can analyze the verbose (really verbose!!) output of the Ansible playbook without fact gathering.

First Connection

The first command of the first connection is echo ~opc && sleep 0, which returns the home directory of the ansible user.


The second command is already scary:

( umask 77 && mkdir -p "` echo /home/opc/.ansible/tmp `"&& mkdir "` echo /home/opc/.ansible/tmp/ansible-tmp-1712570792.3350322-23674-205520350912482 `" && echo ansible-tmp-1712570792.3350322-23674-205520350912482="` echo /home/opc/.ansible/tmp/ansible-tmp-1712570792.3350322-23674-205520350912482 `" ) && sleep 0
  1. It sets the umask for the commands that follow.
  2. Creates the tmp directory to store any Python script on the target.
  3. In that directory, creates a directory to store the script for this specific task.
  4. Makes this ssh command return the temporary variable with the full path to the task script directory.
  5. sleep 0

This one mainly ensures the directory structure exists on the target.


I will not paste this one here as it is very long, and we can easily guess what it does from the log line just before:

Attempting python interpreter discovery

Roughly, what it does is try many Python versions.


Next, it runs a Python script with the discovered Python version to determine the operating system type and release.


The fifth connection is actually an sftp command to copy the module content (AnsiballZ_ping.py). AnsiballZ is a framework to embed a module into the script itself, which allows modules to be run with a single Python copy.


This one simply ensures the execution permission is set on the temporary directory (i.e. ansible-tmp-1712570792.3350322-23674-205520350912482) as well as on the Python script (i.e. AnsiballZ_ping.py).

Eighth and Last Connection

Lastly, the execution of the ping module itself:

/usr/bin/python3.9 /home/opc/.ansible/tmp/ansible-tmp-1712570792.3350322-23674-205520350912482/AnsiballZ_ping.py && sleep 0

To reduce the number of connections, there is one possible option: pipelining.

To enable that, I simply need to add the following line to ansible.cfg, under the [ssh_connection] (or [connection]) section:

pipelining = true

Or set the ANSIBLE_PIPELINING environment variable to true.
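For a quick side-by-side comparison, the environment variable makes it easy to toggle per run (playbook and inventory names are placeholders):

# Baseline run
time ansible-playbook -i hosts ping.yml

# Same run with pipelining enabled for this invocation only
time ANSIBLE_PIPELINING=true ansible-playbook -i hosts ping.yml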

How does it improve our playbook execution time?

gather_facts    connections    timing (s)
false           3 (-62%)       0,473 (-63%)
true            4 (-71%)       1,275 (-36%)

ping playbook results with pipelining

As we can see, there is a significant reduction in the number of ssh connections as well as in the playbook duration.

In this configuration, only 3 connections are made:

  • python interpreter discovery (connection #3)
  • OS type discovery (connection #4)
  • python module execution (connection #8). AnsiballZ data is piped to that process.

With the pipelining option, I also noticed that the Ansible temporary directory is not created.

Of course, we can't expect such a big speed-up on a real-life playbook. So, let's try one now.

Deploy WebLogic Server Playbook

Let's use the WebLogic YaK component to deploy a single WebLogic instance. It includes dbi services best practices, the latest CPU patches and SSL configuration. The “normal” run takes 13 minutes 30 seconds, while the pipelined run takes 12 minutes 13 seconds. This is about 10% faster.

This is nice, but not as good as with the previous playbook. Why is that? Because most of the time is not spent in ssh connections, but on actual work (running the WebLogic installer, starting services, patching with OPatch, etc.).

What Next?

With such results, you might wonder why it isn't enabled by default. As per the documentation, there is a limitation:

This can conflict with privilege escalation (become). For example, when using sudo operations you must first disable ‘requiretty’ in the sudoers file for the target hosts, which is why this feature is disabled by default.

Ansible documentation

Until now, with all the tests I have made, I have never encountered that limitation. Have you?


Is fragmentation an issue ?

Tom Kyte - Mon, 2024-04-08 07:06
Hi all, I have 1000 tables. Some of the tables have had rows deleted and updated, so fragmentation has been created. How do I determine which tables are fragmented?
Categories: DBA Blogs

Does Migrating 4k Tablespace block size to 8k database cause performance impact ?

Tom Kyte - Mon, 2024-04-08 07:06
I am migrating an 11g database cross-endianness from on-prem to EXACS. The on-prem database db_block_size is 4k and all the tablespaces are also of 4k block size. Since I cannot provision a non-standard block size database in OCI, I am worried about the performance impact caused by the different block size. Please help me understand what database block size would be recommended for the scenario below.

-----------------------------------------------------------
Source : ON_PREM
-----------------------------------------------------------
Platform / ID : AIX-Based Systems (64-bit) / 6
Version :
Size (GB) : 17 TB
db_block_size : 4k
All Tablespaces BLK Size : 4k
-----------------------------------------------------------
Target : OCI - EXACS
-----------------------------------------------------------
Platform / ID : LINUX / 13
Version :
Size (GB) : 17 TB
db_block_size : 8K
APP Tablespaces BLK Size : 4k
SYSTEM/SYSAUX/TEMP/UNDO : 8K

Phase 1: Migrating from AIX 11g to EXACS 11g
Phase 2: 19c upgrade and multitenant conversion (due to business requirements we have to split the migration and the upgrade)

Questions:
1. Can we guarantee that there will be no performance impact due to the difference in tablespace and database block size if the db_4k_cache_size parameter is set to an adequately large value?
2. Or is it better to go for the same 4k block size as the source on-premises database?

Of course application regression testing and RAT will be included, but testing both cases is not feasible, hence reaching out for expert advice.
Categories: DBA Blogs

Restore from Archivelog only

Tom Kyte - Mon, 2024-04-08 07:06
Hello Sir,

I am able to get one scenario to work. In that scenario I had a VM (server) running Oracle 19c with just 1 table and 5 records, and I did a backup of the whole VM (disk backup). Then I added a new table in my db with 3 records (ensured the db is in archivelog mode) and ran:

rman target /
backup database plus archivelog;

Now I went ahead and added 2 more records and noted the system time, let's say 2024-04-06 15:33:55 (so I can restore up to this time). So basically a new table with 5 records. Once all this was done I ran the below command:

backup incremental level 1 database plus archivelog;

Now I deleted my VM and restored the first old copy of my VM backup (the one that had 1 table and 5 records). Post this VM restore, I followed the steps below and was able to get point-in-time recovery to work up to 2024-04-06 15:33:55 (here I should now have 2 tables with 5 records each). The main step which I had missed earlier was RESTORING the control file, since I was doing the restore on a different (new VM) server:

SQL> shutdown abort;
RMAN> startup nomount;
RMAN> RESTORE CONTROLFILE FROM "/mnt/orabkup1/snapcf_ev.f";
RMAN> startup mount;
RMAN> run {
  SET UNTIL TIME "TO_DATE('2024-04-06 15:33:55', 'YYYY-MM-DD HH24:MI:SS')";
  RESTORE DATABASE;
  RECOVER DATABASE;
  sql 'ALTER DATABASE OPEN RESETLOGS';
}

Everything was good here and with this approach I was able to get the point-in-time recovery to work; I had been missing that restore of the control file.

Now the scenario which I am still not able to work out (and I am sure I am making a very basic mistake, maybe I don't understand the archivelog and redolog properly). The scenario I want to make work is: I have a VM backup (disk backup) up to the level of 1 table and 5 records. Then I create a 2nd table and add, let's say, 2 records to it, and this time I only take an ARCHIVELOG backup, then add 3 more records, then backup incremental archivelog all, and I note the time (let's assume 2024-04-06 15:33:55), with the following steps:

backup archivelog all;
insert into xxx VALUES(3,'Line 1');
insert into xxx VALUES(4,'Line 1');
commit;
backup incremental level 1 archivelog all;

Here I have not done backup database plus archivelog (assuming all those new inserts would be in the redo log and maybe in the archivelog?). Now I delete this VM, restore from the disk backups a new VM from backup1 (where only 1 table with 5 records exists), and simply run the following:

shutdown abort;
startup nomount;
RESTORE CONTROLFILE FROM "/mnt/orabkup1/snapcf_ev.f";
startup mount;
run {
  SET UNTIL TIME "TO_DATE('2024-04-06 15:33:55', 'YYYY-MM-DD HH24:MI:SS')";
  RESTORE ARCHIVELOG all;
  RECOVER DATABASE;
  sql 'ALTER DATABASE OPEN RESETLOGS';
}

But unfortunately it complains about:

ORA-01194: file 1 needs more recovery to be consistent
ORA-01110: data file 1: 'D:\BASE\MYDB\DATA\SYSTEM01.DBF'

Not sure why this, as I was thinking that Arc...
Categories: DBA Blogs

Returning data in EXECUTE IMMEDIATE with dynamic values in USING clause

Tom Kyte - Mon, 2024-04-08 07:06
Hi Team,

I have the below scenario.

Step#1) User clicks a particular App UI screen.
Step#2) User selects multiple filters on the UI - say filter1, filter2 - which correspond to table columns.
Step#3) For each filter selected, the user enters data - say Mark (for filter1), Will (for filter2) - based on which the search will be performed on the respective filters (aka table columns).
Step#4) The user inputs from Steps#2 and 3 are passed to a PL/SQL API which returns the desired SQL result in a paginated manner (pageSize: 50).

The user inputs from Steps#2 and 3 will be dynamic. I have tried to implement this using native dynamic SQL, but it looks like I have hit a dead end here. I am able to use dynamic values in the USING clause, but not able to return the data from the SELECT statement with EXECUTE IMMEDIATE. I have shared a LiveSQL link above which has a reproducible test case. If I comment out the line "BULK COLLECT INTO l_arId, l_arName, l_arType" in the procedure, the block executes successfully. But I need the result set from the SELECT statement in the procedure as output. Looking for some advice here. Thanks a bunch!
Categories: DBA Blogs

19c doesn't allow truncation of data that is longer in length of column's char(40) definition

Tom Kyte - Mon, 2024-04-08 07:06
We have an application that has been written to insert a variable that is char(50) into a column that is defined as char(40). In Oracle 11g (I know this is very old) it would merely truncate the last 10 characters without issue. However, Oracle 19c doesn't allow this and raises an exception (which I believe should always have been the case). Where can I find documentation of this restriction and when it was changed, and is there a way around this other than changing the program code? Oracle 11 truncated the extra 10 characters in the statement below; Oracle 19 throws an exception with a NULL error status. ADBBGNX_ADDRESS_LINE_1 := agentrecord.producerrec.businessAddressLine1;
Categories: DBA Blogs


Jonathan Lewis - Mon, 2024-04-08 04:02

Here’s a detail about dbms_output that is probably overlooked because (in most cases) it’s ignorable, except that it can lead to unexpected response times when you try using it to debug “busy” operations.

A question on the Oracle SQL and PL/SQL forum asked: “Why is a PL/SQL ‘for loop’ so slow on Oracle Autonomous DB?” and provided the following code to demonstrate the issue.

begin
        for i in 1..36000 loop
                dbms_output.put_line ('i value: '|| i);
        end loop;
end;
/

The OP reported the elapsed time for this block as 1 minute 40 seconds (compared to Java taking only 2 seconds), and showed a screen shot to prove the point.

A couple of people added calls to systimestamp to the block to show that the loop actually took less than a second and explained that the time reported was mostly the time spent in displaying the results, not in creating them. Correct, of course, but that still raises the question of why it takes so long to display such a small amount of data.

If you call dbms_output.put_line() in your code you only get to see the results if something subsequently calls dbms_output.get_lines() to extract the results from the buffer created by dbms_output. (In the case of SQL*Plus this call is built in and enabled by default, so that there’s a hidden call to BEGIN DBMS_OUTPUT.GET_LINES(:LINES, :NUMLINES); END; after every call you make to the database unless you’ve set serveroutput off.)

You’ll note the plural in get_lines(), and the parameter :numlines – how many lines do you think the call will get by default? In SQL*Plus it’s whatever is set by the last call to set arraysize, which defaults to 15.

I set up a little test, connected to a local instance of 19.11, to demonstrate the impact – executing the supplied code from SQL*Plus using an arraysize of 1 (which is actually honoured by dbms_output), 15 (default), and 1000, with the following times:

  • set arraysize 1: Time: 8.90 seconds (ca. 36,000 roundtrips)
  • set arraysize 15: Time: 1.43 seconds (ca. 2,400 roundtrips)
  • set arraysize 1000: Time: 0.41 seconds (ca. 36 roundtrips)
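For anyone who wants to reproduce this, here is a rough sketch of the test, assuming a local instance and a throwaway account (the connect string and credentials are placeholders); the output is discarded so that terminal scrolling does not distort the timings:

#!/bin/bash
# Run the same PL/SQL block from SQL*Plus with different arraysize settings
for size in 1 15 1000; do
  echo "arraysize = ${size}"
  time sqlplus -s test/test@localhost/pdb1 > /dev/null <<EOF
set serveroutput on
set arraysize ${size}
begin
        for i in 1..36000 loop
                dbms_output.put_line ('i value: '|| i);
        end loop;
end;
/
exit
EOF
done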

Historically there was a limit of 1MB on the dbms_output buffer but that limit was removed in a fairly recent version of Oracle so it is possible to generate huge amounts of data in a single PL/SQL block. This means that whatever else you do to optimise for network traffic time you may still see time spent in “PGA memory operation” as your shadow process tries to allocate memory chunks for the buffer.

