Oracle - Row duplicates in a table with a PK loaded by sql*loader

I had an interesting problem today.  One of the developers told me a nightly job had stopped with an error.  I check the usual things like has anything changed and was told no.  The error was sent to me and was the following:
SQL*Loader-926: OCI error while executing delete/truncate (due to REPLACE/TRUNCATE keyword) for table REPORT_TEMP
ORA-01502: index 'PK_REPORT_TEMP' or partition of such index is in unusable state

I looked at the index and found it to be unusable:
SQL> select index_name, uniqueness, status from dba_indexes where table_name = 'REPORT_TEMP';

INDEX_NAME                     UNIQUENESS                           STATUS
------------------------------ ------------------------------------ --------------------------------
PK_REPORT_TEMP  UNIQUE                               UNUSABLE
1 row selected.

My first thoughts were to rebuild it:
SQL> alter index mercury.PK_REPORT_TEMP rebuild;
alter index mercury.PK_REPORT_TEMP rebuild
*
ERROR at line 1:
ORA-01452: cannot CREATE UNIQUE INDEX; duplicate keys found



I check the table and found it had two duplicate rows.  Next I checked the constraints on the table:
SQL>  select CONSTRAINT_NAME, CONSTRAINT_TYPE, R_OWNER, R_CONSTRAINT_NAME VALIDATED, INDEX_NAME, INVALID from dba_constraints where table_name = 'REPORT_TEMP'
  2  /

CONSTRAINT_NAME                CONS
------------------------------ ----
R_OWNER
------------------------------------------------------------------------------------------------------------------------
VALIDATED                      INDEX_NAME                     INVALID
------------------------------ ------------------------------ ----------------------------
PK_REPORT_TEMP  P

                               PK_REPORT_TEMP


After checking the control file for sql*loader, I found that load was done via a DIRECT method.  The job had worked the same way for quite a while by replacing the current rows with the new load (REPLACE) and then performing some extra processing.

As a conventional load converts the data into a bunch of INSERT statements any issues are written to the BAD file.  The problem seemed to be using the DIRECT method to load and it's handling of constraints.  A  direct load mode by-passes the formatting of the data into INSERT statements and writes the data directly into the table.

"During a direct path load, new index keys for the new rows are written to a temporary segment.  At the end of the load, each temporary segment is sorted, and then merged with the corresponding old index in order to create a new index. After the new indexes are created, the old indexes and the temporary segments are removed. Merging all the new index entries at once like this usually requires less processing overhead than would be required to insert each new index entry one at a time." - http://www.ypl.com/oracle/sql_loader/gennick_jonathan/html_deep/ch10s03s06s01.html

"The PK and UNIQUE constraints remain enabled during a direct path load, but new rows are only verified against these constraints at the end of the load process when the underlying indexes are rebuilt. If a row violates a UNIQUE KEY or PRIMARY KEY constraint, then it is not rejected; it still gets loaded. However, the corresponding index will remain in the UNUSABLE state after the load completes. You need to locate the duplicate row(s), resolve the problem, and then rebuild the index manually." - http://www.ypl.com/oracle/sql_loader/gennick_jonathan/html_deep/ch10s03s07.html

I had to remove the duplicate rows and I couldn't just run a DELETE FROM command as the index was unusable.  I did try disabling the PK index and running the DELETE FROM again but it failed.  I was going to drop the PK index but as this was a production issue and I ran out of time I did a TRUNCATE TABLE (the data was temporary and we could rerun the load).  The TRUNCATE worked.  I changed the sql*loader method from DIRECT to CONVENTIONAL and the duplicates in the source data went in to the bad file and the load worked.

I guess the interesting thing is the effected DIRECT loading has.  To stop this happening again we had two options
1.  Change the sql*loader method permanently the draw back to doing this is speed.  DIRECT is very fast as a conventional path load executes SQL INSERT statements to populate tables in an Oracle database.
2.  Change source data to not have duplicates (sort -u in Linux)

No comments:

Post a Comment