[AuroraDB]: Simulating failures using SQL

Testing Amazon Aurora Using Fault Injection Queries

You can test the fault tolerance of your Amazon Aurora DB cluster by using fault injection queries. Fault injection queries are issued as SQL commands to an Amazon Aurora instance and they enable you to schedule a simulated occurrence of one of the following events:

  • A crash of the master instance or an Aurora Replica
  • A failure of an Aurora Replica
  • A disk failure
  • Disk congestion

Fault injection queries that specify a crash force a crash of the Amazon Aurora instance. The other fault injection queries result in simulations of failure events, but don’t cause the event to occur. When you submit a fault injection query, you also specify an amount of time for the failure event simulation to occur for.

You can submit a fault injection query to one of your Aurora Replica instances by connecting to the endpoint for the Aurora Replica

Testing an Instance Crash :

=========================

You can force a crash of an Amazon Aurora instance using the ALTER SYSTEM CRASH fault injection query.

Syntax

ALTER SYSTEM CRASH [ INSTANCE | DISPATCHER | NODE ];

Options

This fault injection query takes one of the following crash types:

INSTANCE—A crash of the MySQL-compatible database for the Amazon Aurora instance is simulated.

DISPATCHER—A crash of the dispatcher on the master instance for the Aurora DB cluster. The dispatcher writes updates to the cluster volume for an Amazon Aurora DB cluster is simulated.

NODE—A crash of both the MySQL-compatible database and the dispatcher for the Amazon Aurora instance is simulated. For this fault injection simulation, the cache is also deleted.

The default crash type is INSTANCE.

ALTER SYSTEM CRASH [{INSTANCE | DISPATCHER | NODE}];
[root@dbversity.com sysbench]# mysql -h dbversity.com -P 3306 -u MyUserName -p”MyPassw0rd” -D sbtest
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Welcome to the MySQL monitor. Commands end with ; or \g.
Your MySQL connection id is 3607
Server version: 5.6.10 MySQL Community Server (GPL)

Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type ‘help;’ or ‘\h’ for help. Type ‘\c’ to clear the current input statement.

mysql>
mysql> ALTER SYSTEM CRASH INSTANCE;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql>
mysql>
mysql> show databases;
No connection. Trying to reconnect…
ERROR 2003 (HY000): Can’t connect to MySQL server on ‘dbversity.com’ (111)
ERROR:
Can’t connect to the server

mysql>
mysql> show databases;
No connection. Trying to reconnect…
Connection id: 4
Current database: sbtest

+——————–+
| Database |
+——————–+
| information_schema |
| aurora |
| dbversit |
| MyUserName |
| mysql |
| performance_schema |
| sbtest |
+——————–+
7 rows in set (0.01 sec)

mysql>
mysql> ALTER SYSTEM CRASH DISPATCHER;
Query OK, 0 rows affected (0.04 sec)

mysql> show databases;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect…
ERROR 2003 (HY000): Can’t connect to MySQL server on ‘dbversity.com’ (111)
ERROR:
Can’t connect to the server

mysql>
mysql> show databases;
No connection. Trying to reconnect…
Connection id: 1
Current database: sbtest

+——————–+
| Database |
+——————–+
| information_schema |
| aurora |
| dbversit |
| MyUserName |
| mysql |
| performance_schema |
| sbtest |
+——————–+
7 rows in set (0.00 sec)

mysql>
mysql> ALTER SYSTEM CRASH NODE;
ERROR 2013 (HY000): Lost connection to MySQL server during query
mysql>
mysql> show databases;
No connection. Trying to reconnect…
ERROR 2003 (HY000): Can’t connect to MySQL server on ‘dbversity.com’ (111)
ERROR:
Can’t connect to the server

mysql> show databases;
No connection. Trying to reconnect…
ERROR 2003 (HY000): Can’t connect to MySQL server on ‘dbversity.com’ (111)
ERROR:
Can’t connect to the server

mysql> show databases;
No connection. Trying to reconnect…
Connection id: 10
Current database: sbtest

+——————–+
| Database |
+——————–+
| information_schema |
| aurora |
| dbversit |
| MyUserName |
| mysql |
| performance_schema |
| sbtest |
+——————–+
7 rows in set (0.01 sec)

mysql>

Testing a Disk Failure :-
======================

You can simulate a disk failure for an Aurora DB cluster using the ALTER SYSTEM SIMULATE DISK FAILURE fault injection query.

During a disk failure simulation, the Aurora DB cluster randomly marks disk segments as faulting.
Requests to those segments will be blocked for the duration of the simulation.

To simulate the failure of disks:

ALTER SYSTEM SIMULATE percentage_of_failure PERCENT DISK FAILURE [ IN DISK index | NODE index ]
FOR INTERVAL quantity [ YEAR | QUARTER | MONTH | WEEK| DAY | HOUR | MINUTE | SECOND ];
mysql> ALTER SYSTEM SIMULATE 50 PERCENT DISK FAILURE FOR INTERVAL 3 MINUTE;
Query OK, 0 rows affected (0.01 sec)

mysql> select now();
+———————+
| now() |
+———————+
| 2015-04-22 23:39:50 |
+———————+
1 row in set (0.00 sec)
Testing an Aurora Replica Failure :
=================================

You can simulate the failure of an Aurora Replica using the ALTER SYSTEM SIMULATE READ REPLICA FAILURE fault injection query.

An Aurora Replica failure will block all requests to an Aurora Replica or all Aurora Replicas in the DB cluster for a specified time interval. When the time interval completes, the affected Aurora Replicas will be automatically synced up with master instance.

Syntax

ALTER SYSTEM SIMULATE percentage_of_failure PERCENT
READ REPLICA FAILURE [ TO ALL | TO “replica name” ]
FOR INTERVAL quantity [ YEAR | QUARTER | MONTH | WEEK| DAY | HOUR | MINUTE | SECOND ];
Failure type—
The type of failure to simulate. Specify TO ALL to simulate failures for all Aurora Replicas in the DB cluster. Specify TO and the name of the Aurora Replica to simulate a failure of a single Aurora Replica. The default failure type is TO ALL.

ALTER SYSTEM SIMULATE 50 PERCENT

READ REPLICA FAILURE TO ALL
FOR INTERVAL 10 SECOND ;
mysql> ALTER SYSTEM SIMULATE 50 PERCENT READ REPLICA FAILURE TO ‘my-replica1’ FOR INTERVAL 1 SECOND;
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql>
mysql> select Server_id,Replica_lag_in_msec from mysql.ro_replica_status;
+———————+———————+
| Server_id | Replica_lag_in_msec |
+———————+———————+
| my-auroradb-cluster | 0 |
| my-replica1 | 20.009 |
| my-replica2 | 19.312 |
+———————+———————+
3 rows in set (0.00 sec)

mysql> ALTER SYSTEM SIMULATE 50 PERCENT
-> READ REPLICA FAILURE TO ALL
-> FOR INTERVAL 10 SECOND ;
Query OK, 0 rows affected (0.00 sec)

mysql>
mysql>
mysql> select Server_id,Replica_lag_in_msec from mysql.ro_replica_status;
+———————+———————+
| Server_id | Replica_lag_in_msec |
+———————+———————+
| my-auroradb-cluster | 0 |
| my-replica1 | 39.96 |
| my-replica2 | 22.393 |
+———————+———————+
3 rows in set (0.00 sec)

 
Testing Disk Congestion :
=======================

You can simulate a disk failure for an Aurora DB cluster using the ALTER SYSTEM SIMULATE DISK CONGESTION fault injection query.

During a disk congestion simulation, the Aurora DB cluster randomly marks disk segments as congested. Requests to those segments will be delayed between the specified minimum and maximum delay time for the duration of the simulation.

Syntax

ALTER SYSTEM SIMULATE percentage_of_failure PERCENT

DISK CONGESTION BETWEEN minimum AND maximum MILLISECONDS
[ IN DISK index | NODE index ]
FOR INTERVAL quantity [ YEAR | QUARTER | MONTH | WEEK| DAY | HOUR | MINUTE | SECOND ];

  • Ask Question