As a part of DBA chores we often encounter alter table requests (DDL) in MySQL production systems and the usual procedure will be creating new servers with the desired alter and tossing down the old ones. Its quicker and optimal than doing it on individual master/slaves in our environment when the table size is huge.
In this particular case, the table size is ~2TB and we need to change the datatype of a column from int unsigned to bigint unsigned. So we followed the below usual steps:
- Created a new slave and ran the DDL
- Sync replication with slave_type_conversions = ALL_NON_LOSSY
Since its row-based replication with full image we skipped data checksums and deployed few slaves into production. Voila!! The dev team started to see crazy data on their application. The new rows on the altered column appeared to have zero values. We looked into the table and mazed with the behaviour for sometime until we figured out we hit the below bug. https://bugs.mysql.com/bug.php?id=82599 (which was reported for both 5.6 and 5.7 and still in open). Lets walk through this with a simple example
To stimulate this scenario lets have a simple master-slave replication with RBR(Row-Based-Replication)
On Master:
CREATE TABLE `integer_test` (
`a` int(10) unsigned NOT NULL,
`b` int(10) unsigned NOT NULL,
`c` bigint(20) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1
mysql> insert into integer_test values(1,1,1);
Query OK, 1 row affected (0.00 sec)
On Slave:
mysql> alter table integer_test modify `a` bigint unsigned NOT NULL,modify `b` bigint(20) NOT NULL;
Query OK, 0 rows affected (0.00 sec)
Try inserting again on the master:
mysql> insert into integer_test values(2147483647,2147483647,2147483647);
Query OK, 1 row affected (0.00 sec)
Now the replication will break on the slave with below error since we are in RBR:
Last_Error: Column 1 of table 'monitor_me.integer_test' cannot be converted from type 'int' to type 'bigint(20)'
mysql> set global slave_type_conversions="ALL_NON_LOSSY";
Query OK, 0 rows affected (0.00 sec)
mysql> stop slave;start slave;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)
Lets insert few more entries on master:
mysql> insert into integer_test values(2147483648,2147483648,2147483648);
Query OK, 1 row affected (0.00 sec)
mysql> insert into integer_test values(4294967295,4294967295,4294967295);
Query OK, 1 row affected (0.01 sec)
Lets verify the data:
![](https://playwithmysql.wordpress.com/wp-content/uploads/2019/04/screen-shot-2019-04-09-at-8.54.36-pm.png)
![](https://playwithmysql.wordpress.com/wp-content/uploads/2019/04/screen-shot-2019-04-09-at-8.58.57-pm.png)
![](https://playwithmysql.wordpress.com/wp-content/uploads/2019/04/screen-shot-2019-04-08-at-9.19.23-pm.png)
Why it happen: – Lets have a look into the Binary logs:
Query executed:
insert into integer_test values(2147483647,2147483647,2147483647);
On Binlogs:
### INSERT INTO `promotion`.`integer_test`
### SET
### @1=2147483647 /* INT meta=0 nullable=0 is_null=0 */
### @2=2147483647 /* INT meta=0 nullable=0 is_null=0 */
### @3=2147483647 /* LONGINT meta=0 nullable=0 is_null=0 */
insert into integer_test values(2147483648,2147483648,2147483648);
### INSERT INTO `promotion`.`integer_test`
### SET
### @1=-2147483648 (2147483648) /* INT meta=0 nullable=0 is_null=0 */
### @2=-2147483648 (2147483648) /* INT meta=0 nullable=0 is_null=0 */
### @3=2147483648 /* LONGINT meta=0 nullable=0 is_null=0 */
As we can see, when the integer column crosses the upper bound of positive int (2147483647), mysql starts to write both the signed value and actual value into the binlog. And the MySQL doc explains this handling of integers as below.
When an integer type is promoted, its signedness is not preserved. By default, the slave treats all such values as signed. Beginning with MySQL 5.6.13, you can control this behavior using ALL_SIGNED, ALL_UNSIGNED, or both. (Bug#15831300) ALL_SIGNED tells the slave to treat all promoted integer types as signed; ALL_UNSIGNED instructs it to treat these as unsigned. Specifying both causes the slave to treat the value as signed if possible, otherwise to treat it as unsigned; the order in which they are listed is not significant. Neither ALL_SIGNED nor ALL_UNSIGNED has any effect if at least one of ALL_LOSSY or ALL_NONLOSSY is not also used.
In our above example @1=-2147483648 (2147483648) is converted into nearest unsigned value on the slave as 0 ( the column definition on the slave is`a` bigint(20) unsigned NOT NULL).
Also we have tried the other slave_type_conversions options but not able to replicate the data properly this made us to switch to SBR for few days. Our concern is either there should be a way to achieve this on RBR or there could be a caution on the MySQL documentation on this behaviour .
How to avoid this trap:
- When you are using RBR and in need to promote an interger column from signed to unsigned, go for the bigint unsigned directly.
- In-case you are already with integer unsigned and the values crossed 2147483647, change your binlog formatting on master to SRB till the time to alter on all the slaves and promote the new master.