How do I troubleshoot task failures with an "Error" or "Failed" status in AWS DMS?
I want to troubleshoot my AWS Database Migration Service (AWS DMS) task or AWS DMS Serverless replication that runs indefinitely or shows an "Error" or "Failed" status.
Resolution
Identify the error
To review the task status, complete the following steps:
- Open the AWS DMS console.
- In the navigation pane, choose Database migration tasks, and then select your task.
- Review the task status.
If your task is in the Running state but doesn't show progress, then the task is stuck. To determine whether AWS DMS is migrating your data, check the Table statistics tab.
If your task is in the Running with errors state, then your task couldn't migrate one or more tables. To resolve this issue, review and resolve the errors on the affected tables.
If your task is in the Failed state, then use Amazon CloudWatch logs to review the task errors. Complete the following steps:
- Open the AWS DMS console.
- In the navigation pane, choose Database migration tasks.
- Select your migration task.
- Choose the Logs & events tab.
- Review the task logs displayed in the Logs section. To view the full logs in CloudWatch, choose View CloudWatch Logs.
Note: If your logs aren't available because the retention period expired, then reload the tables that are in the error state. Or, create a new task with the affected tables, and then generate new logs.
- To find errors, enter ]E: in the search bar. To find warnings, enter ]W: in the search bar. To follow your task's progression, search for the TASK_MANAGER, TABLE_THREAD, SOURCE_CAPTURE, SOURCE_UNLOAD, and TARGET_LOAD components.
- Note the specific error message and the component that reported the error.
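If you prefer to query the logs directly, a CloudWatch Logs Insights query similar to the following surfaces error and warning lines from your task's log group. This is a sketch: adjust the filter strings and the limit for your task.

```
fields @timestamp, @message
| filter @message like "]E:" or @message like "]W:"
| sort @timestamp desc
| limit 100
```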
(AWS DMS Serverless Replications only) Troubleshoot task stuck fetching metadata
If your AWS DMS Serverless task can't retrieve table information from the source database, then you receive one of the following error messages:
"{'replication_state':'fetching_metadata', 'message': 'Fetching metadata from your source endpoint to calculate workload capacity.'}"
"{'replication_state':'failed', 'message': 'Failed to fetch metadata.', 'failure_message': 'Internal failure.'}"
To resolve these errors, perform a connectivity check. Create a temporary AWS DMS replication instance with a small instance class, such as dms.t3.small, or use an existing replication instance. Make sure that the replication instance uses the same virtual private cloud (VPC) configuration as your AWS DMS Serverless replication, including the same VPC, subnet group, security groups, and other network settings.
Then, use the temporary replication instance to test the endpoint connection. Delete the temporary instance after you complete the testing. For more information, see How do I troubleshoot AWS DMS endpoint connectivity failures?
Confirm that the database user that you configured in your AWS DMS endpoint has the required permissions to query and read the metadata and system catalog tables.
When you migrate a large number of tables, reduce the number of tables in your table mappings to prevent metadata fetch timeouts during the initial discovery phase.
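For example, a selection rule similar to the following limits the replication to a subset of tables. The schema name and table name pattern are placeholders for your own values.

```json
{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "include-subset",
      "object-locator": {
        "schema-name": "my_schema",
        "table-name": "orders_%"
      },
      "rule-action": "include"
    }
  ]
}
```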
Troubleshoot task stuck at 0% progress or tables that show "Loading" indefinitely
To resolve tasks that show no progress, review CloudWatch Logs for large binary object (LOB) messages. If your tables contain BLOB, CLOB, TEXT, or large JSON columns, then configure LOB handling mode in your task settings.
Before you configure LOB settings, query your source database to check the maximum size of LOB columns. Based on the maximum LOB size, configure one of the following LOB modes.
In limited LOB mode, set LobMaxSize to the maximum LOB size that you retrieved from your source database. The LobMaxSize task setting is specified in kilobytes. If LOB data exceeds the configured size, then AWS DMS truncates the LOB data.
If you can't use limited LOB mode, then use inline LOB mode.
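To find the maximum LOB size, run a query against the source database. For example, on a MySQL source you might run a query similar to the following. The schema, table, and column names are placeholders; on PostgreSQL sources, use length() or octet_length() instead.

```sql
-- Returns the size of the largest value in the LOB column, in bytes
SELECT MAX(LENGTH(blob_column)) AS max_lob_bytes
FROM my_schema.my_table;
```

Because the LobMaxSize task setting is in kilobytes, divide the result by 1,024 and round up.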
Review CloudWatch Logs for resource-related error messages similar to the following:
"Reading from source is paused. Total storage used by swap files exceeded the limit 1048576000 bytes"
"Last Error Replication task out of memory."
To resolve these errors, check your replication instance's CPU, memory, swap files, and IOPS use.
For more information about how to determine the size of your replication instance, see Selecting the best size for a replication instance.
In CloudWatch logs, check for the following unsupported data type error messages:
"[SOURCE_UNLOAD ]E: Column 'column_name' of table 'schema.table_name' uses an unsupported data type. [1020412]"
"[SOURCE_UNLOAD ]W: Data type not supported for column 'column_name'. The column will be skipped."
"[TARGET_LOAD ]E: Cannot create table 'schema.table_name': unsupported column type 'data_type' for column 'column_name'"
To resolve the errors, identify the table and column from the error message. Use transformation rules to exclude the column or manually migrate unsupported columns. For supported data types, see Sources for data migration and Targets for data migration.
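For example, a transformation rule similar to the following removes an unsupported column from the migration. The schema, table, and column names are placeholders for the values from your error message.

```json
{
  "rule-type": "transformation",
  "rule-id": "2",
  "rule-name": "drop-unsupported-column",
  "rule-target": "column",
  "object-locator": {
    "schema-name": "my_schema",
    "table-name": "my_table",
    "column-name": "unsupported_column"
  },
  "rule-action": "remove-column"
}
```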
Troubleshoot "Access denied" errors
If you use incorrect AWS DMS user credentials or the user doesn't have the required permissions, then you receive one of the following "Access denied" error messages:
"[SOURCE_CAPTURE ]E: Error 1045 (Access denied for user 'username'@'10.x.x.x' (using password: YES)) connecting to MySQL server 'hostname' [1020414] (mysql_endpoint_capture.c:297)"
"[SOURCE_CAPTURE ]E: RetCode: SQL_ERROR SqlState: 42501 NativeError: 7 Message: ERROR: permission denied for table table_name"
"[SOURCE_CAPTURE ]E: ORA-01031: insufficient privileges [1020414] (oracle_endpoint_capture.c:XXX)"
To resolve the "Access denied" error, take one or more of the following actions:
- Confirm that you correctly set the AWS DMS endpoint credentials.
- Test the endpoint connection to validate connectivity. If you experience connection issues, then see How do I troubleshoot AWS DMS endpoint connectivity failures?
- Grant the required permissions to the AWS DMS user based on your database engine.
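The exact permissions depend on your database engine and on whether the endpoint is a source or a target. For example, for a MySQL source with change data capture (CDC), grants similar to the following are typically required. The user name and host are placeholders for your own values.

```sql
GRANT SELECT ON *.* TO 'dms_user'@'%';
GRANT REPLICATION CLIENT ON *.* TO 'dms_user'@'%';
GRANT REPLICATION SLAVE ON *.* TO 'dms_user'@'%';
```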
Troubleshoot "Duplicate" key errors
If data already exists in the target table that conflicts with the incoming data, then you receive one of the following error messages:
"[TASK_MANAGER ]W: Table 'schema'.'table_name' was errored/suspended (subtask 0 thread 1). Failed (retcode -1) to execute statement; RetCode: SQL_ERROR SqlState: 23000 NativeError: 1062 Message: [MySQL][ODBC 8.0(w) Driver][mysqld-8.0.32]Duplicate entry '3759392' for key 'table_name.PRIMARY'"
"[TARGET_LOAD ]E: Violation of PRIMARY KEY constraint 'PK_table_name'. Cannot insert duplicate key in object 'schema.table_name'. The duplicate key value is (12345)"
To resolve the errors, modify your AWS DMS task to set the target table preparation mode to either Truncate or Drop tables on target. Use Truncate unless you must drop and recreate the table structure.
Or, you can skip duplicate errors and continue the task.
To prevent task failure, modify your task settings to log errors.
Example task settings:
```json
{
  "ErrorBehavior": {
    "DataErrorPolicy": "LOG_ERROR",
    "ApplyErrorDeletePolicy": "IGNORE_RECORD",
    "ApplyErrorInsertPolicy": "LOG_ERROR",
    "ApplyErrorUpdatePolicy": "LOG_ERROR",
    "DataErrorEscalationPolicy": "SUSPEND_TABLE",
    "DataErrorEscalationCount": 100
  }
}
```
After the migration completes, review the logs to verify data integrity.
Troubleshoot PostgreSQL logical replication errors
If you don't complete the PostgreSQL logical replication prerequisites or long-running transactions block slot creation, then you receive one of the following error messages:
"[SOURCE_CAPTURE ]E: RetCode: SQL_ERROR SqlState: XX000 NativeError: 1 Message: ERROR: pglogical is not in shared_preload_libraries; Error while executing the query [1022502] (ar_odbc_stmt.c:2738)"
"[SOURCE_CAPTURE ]E: Unable to create slot 'slot_name' (on execute(...) phase). Check if there is any long running transaction in source. [1020101] (postgres_pglogical.c:512)"
"[SOURCE_CAPTURE ]E: RetCode: SQL_ERROR SqlState: 42P01 NativeError: 1 Message: ERROR: relation "pglogical.replication_set" does not exist;"
To resolve these errors, confirm that you turned on logical replication. For Amazon RDS for PostgreSQL, set the rds.logical_replication parameter to 1 in the parameter group. Then, reboot the Amazon RDS instance.
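To verify the current settings on the source, you can run a query similar to the following:

```sql
-- wal_level must be 'logical' for AWS DMS change data capture
SELECT name, setting
FROM pg_settings
WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');
```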
To check for long-running transactions that might block replication slot creation, run the following SQL command:
```sql
SELECT pid, now() - xact_start AS duration, query, state
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
```
To stop long-running transactions, run the following SQL command:
```sql
SELECT pg_terminate_backend(pid);
```
Note: Replace pid with the process ID of the long-running transaction that you want to stop.
To clean up orphaned replication slots that might prevent new slot creation, run the following SQL command:
```sql
SELECT slot_name, active FROM pg_replication_slots;
SELECT pg_drop_replication_slot('slot_name');
```
Note: Replace slot_name with your replication slot's name.
Troubleshoot Oracle tablespace errors
If the target Oracle tablespace doesn't have enough space for the migrated data, then you receive one of the following error messages:
"[TARGET_LOAD ]W: Oracle error code is '1653' ORA-01653: unable to extend table SCHEMA.TABLE_NAME by 8192 in tablespace TABLESPACE_NAME (oracle_endpoint_load.c:1629)"
"[TARGET_LOAD ]E: ORA-01653: unable to extend table SCHEMA.TABLE_NAME by 8192 in tablespace TABLESPACE_NAME [1020436] (oracle_endpoint_load.c:1629)"
To add more space to the tablespace, run the following command:
```sql
ALTER TABLESPACE tablespace_name ADD DATAFILE '/path/to/datafile.dbf' SIZE 10G AUTOEXTEND ON;
```
Note: Replace tablespace_name with your tablespace name, /path/to/datafile.dbf with the path to your datafile, and 10G with the datafile size.
To resize the existing data file, run the following command:
```sql
ALTER DATABASE DATAFILE '/path/to/datafile.dbf' RESIZE 20G;
```
Note: Replace /path/to/datafile.dbf with the path to your datafile, and 20G with the datafile size.
Then, reload the failed table.
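To confirm that the tablespace has enough free space before you reload the table, you can run a query similar to the following:

```sql
-- Free space per tablespace, in MB
SELECT tablespace_name, ROUND(SUM(bytes) / 1024 / 1024) AS free_mb
FROM dba_free_space
GROUP BY tablespace_name;
```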
AWS OFFICIAL | Updated 6 months ago