The unsigned TRAP

As part of DBA chores we often encounter ALTER TABLE requests (DDL) on MySQL production systems, and the usual procedure is to create new servers with the desired alter and toss out the old ones. It's quicker and more efficient than running the DDL on individual masters/slaves in our environment when the table size is huge.

In this particular case, the table size was ~2TB and we needed to change the datatype of a column from int unsigned to bigint unsigned. So we followed the usual steps below:

  1. Created a new slave and ran the DDL
  2. Sync replication with slave_type_conversions = ALL_NON_LOSSY

Since it's row-based replication with full row image, we skipped data checksums and deployed a few slaves into production. Voila!! The dev team started to see crazy data in their application. The new rows in the altered column appeared to have zero values. We looked into the table and were puzzled by the behaviour for some time until we figured out we had hit the bug below: https://bugs.mysql.com/bug.php?id=82599 (which was reported for both 5.6 and 5.7 and is still open). Let's walk through this with a simple example.

To simulate this scenario, let's set up a simple master-slave replication with RBR (Row-Based Replication).

On Master:
CREATE TABLE `integer_test` (
  `a` int(10) unsigned NOT NULL,
  `b` int(10) unsigned NOT NULL,
  `c` bigint(20) unsigned NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1

mysql> insert into integer_test values(1,1,1);
Query OK, 1 row affected (0.00 sec)

On Slave:
mysql> alter table integer_test modify `a` bigint unsigned NOT NULL,modify `b` bigint(20) NOT NULL;
Query OK, 0 rows affected (0.00 sec)

Try inserting again on the master:
mysql> insert into integer_test values(2147483647,2147483647,2147483647);
Query OK, 1 row affected (0.00 sec)

Now replication will break on the slave with the below error, since we are in RBR:
                   Last_Error: Column 1 of table 'monitor_me.integer_test' cannot be converted from type 'int' to type 'bigint(20)'

mysql> set global slave_type_conversions="ALL_NON_LOSSY";
Query OK, 0 rows affected (0.00 sec)

mysql> stop slave;start slave;
Query OK, 0 rows affected (0.00 sec)
Query OK, 0 rows affected (0.00 sec)

Let's insert a few more entries on the master:
mysql> insert into integer_test values(2147483648,2147483648,2147483648);
Query OK, 1 row affected (0.00 sec)

mysql> insert into integer_test values(4294967295,4294967295,4294967295);
Query OK, 1 row affected (0.01 sec)

Let's verify the data:

[Screenshot: schema difference between master (highlighted) and slave]
[Screenshot: data difference between master (highlighted) and slave, "when the integer promotion fails"]

Why does it happen? Let's have a look into the binary logs:

Query executed: 
insert into integer_test values(2147483647,2147483647,2147483647);

On Binlogs:
### INSERT INTO `promotion`.`integer_test`
### SET
###   @1=2147483647 /* INT meta=0 nullable=0 is_null=0 */
###   @2=2147483647 /* INT meta=0 nullable=0 is_null=0 */
###   @3=2147483647 /* LONGINT meta=0 nullable=0 is_null=0 */

insert into integer_test values(2147483648,2147483648,2147483648);
### INSERT INTO `promotion`.`integer_test`
### SET
###   @1=-2147483648 (2147483648) /* INT meta=0 nullable=0 is_null=0 */
###   @2=-2147483648 (2147483648) /* INT meta=0 nullable=0 is_null=0 */
###   @3=2147483648 /* LONGINT meta=0 nullable=0 is_null=0 */

As we can see, when the integer column crosses the upper bound of a signed int (2147483647), MySQL starts to write both the signed value and the actual value into the binlog. The MySQL documentation explains this handling of integers as below.

When an integer type is promoted, its signedness is not preserved. By default, the slave treats all such values as signed. Beginning with MySQL 5.6.13, you can control this behavior using ALL_SIGNED, ALL_UNSIGNED, or both. (Bug#15831300) ALL_SIGNED tells the slave to treat all promoted integer types as signed; ALL_UNSIGNED instructs it to treat these as unsigned. Specifying both causes the slave to treat the value as signed if possible, otherwise to treat it as unsigned; the order in which they are listed is not significant. Neither ALL_SIGNED nor ALL_UNSIGNED has any effect if at least one of ALL_LOSSY or ALL_NONLOSSY is not also used.

In our example above, @1=-2147483648 (2147483648) is converted to the nearest unsigned value, 0, on the slave (the column definition on the slave is `a` bigint(20) unsigned NOT NULL).

We also tried the other slave_type_conversions options but were not able to replicate the data properly, which made us switch to SBR for a few days. Our concern is that there should either be a way to achieve this with RBR, or a caution in the MySQL documentation about this behaviour.

How to avoid this trap:

  • When you are using RBR and need to promote an integer column from signed to unsigned, go for bigint unsigned directly.
  • In case you are already on int unsigned and the values have crossed 2147483647, change the binlog format on the master to SBR until the alter has been applied on all the slaves and a new master has been promoted (see the sketch below).
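
As a rough illustration of the second point, the workaround boils down to something like this (a minimal sketch reusing the integer_test table above; note that SET GLOBAL binlog_format only affects sessions created after the change):

-- On the master: temporarily fall back to statement-based logging
SET GLOBAL binlog_format = 'STATEMENT';

-- On each slave (and finally on the promoted master), run the promotion straight to bigint unsigned
ALTER TABLE integer_test MODIFY `a` BIGINT UNSIGNED NOT NULL;

-- Once every server carries the new definition, switch back to row-based logging
SET GLOBAL binlog_format = 'ROW';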

How to add PostgreSQL Replica Lag metrics to Percona Monitoring and Management 2 (PMM2)

By default, if you add PostgreSQL to the PMM monitoring, replica lag metrics will not be collected by the postgres_exporter.

As a Database Engineer, running Postgres replicas without replication health metrics would be a disaster. So we decided to add pg_replication_lag metrics using custom queries.

How to add custom queries to postgres_exporter:

  1. Check whether postgres_exporter is running:
    The process arguments will show the custom query directories of postgres_exporter:
ps -aux | grep postgres_exporter

/usr/local/percona/pmm2/exporters/postgres_exporter --collect.custom_query.hr --collect.custom_query.hr.directory=/usr/local/percona/pmm2/collectors/custom-queries/postgresql/high-resolution --collect.custom_query.lr --collect.custom_query.lr.directory=/usr/local/percona/pmm2/collectors/custom-queries/postgresql/low-resolution --collect.custom_query.mr --collect.custom_query.mr.directory=/usr/local/percona/pmm2/collectors/custom-queries/postgresql/medium-resolution --web.listen-address=:42000

It contains multiple directories like below:

/usr/local/percona/pmm2/collectors/custom-queries/postgresql/
├── high-resolution
│   ├── example-queries-postgres.yml
│   ├── queries-postgres-replica_lag.yml
│   └── queries-postgres-uptime.yml
├── low-resolution
│   └── example-queries-postgres.yml
└── medium-resolution
    └── example-queries-postgres.yml

If you want to monitor the custom metrics frequently, add them to the high-resolution directory.

Metric Resolution:

  1. high-resolution (scrape interval : 5s)
  2. low-resolution (scrape interval : 60s)
  3. medium-resolution (scrape interval: 30s)

Note: These are the standard scrape intervals in PMM2. They can also be customised.

2. Adding custom metrics for pg_replication_lag

Here I am choosing the high-resolution directory:

cd /usr/local/percona/pmm2/collectors/custom-queries/postgresql/high-resolution

Create a yml file for adding custom metrics (queries-postgres-replica_lag.yml)

vi queries-postgres-replica_lag.yml
#################################
#Replica Lag Metrics
#################################
pg_replication:
  query: "SELECT CASE WHEN pg_last_wal_receive_lsn() = pg_last_wal_replay_lsn() THEN 0 ELSE EXTRACT (epoch from (now() - pg_last_xact_replay_timestamp())) END AS lag;"
  metrics:
    - lag:
        usage: "GAUGE"
        description: "Replication lag behind master in seconds"
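
With this in place, postgres_exporter derives the metric name from the top-level key plus the column name, so the scraped output should contain a gauge roughly like this (illustrative sample, the value will vary):

# HELP pg_replication_lag Replication lag behind master in seconds
# TYPE pg_replication_lag gauge
pg_replication_lag 0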

In the same way we can add more custom metrics for Postgres. It works for the MySQL exporter as well.

3. Restart pmm-agent to apply the changes.

 systemctl restart pmm-agent

4. Verify the changes using Grafana.
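
As an extra sanity check, you can also scrape the exporter directly and confirm the new gauge is exposed (assuming the --web.listen-address of :42000 shown in the ps output above; add the agent's basic-auth credentials if your PMM version protects the exporter endpoints):

curl -s http://localhost:42000/metrics | grep pg_replication_lag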

How to reset PMM2 admin Password

If you have forgotten the PMM admin password, there is an easy way to reset it from the command line.

On AWS Market Place PMM2 Version:

  1. Login PMM instance
  2. Run the below command to reset the admin password
[ajithdaniel@playwithmysql grafana]# grafana-cli --homepath /usr/share/grafana --configOverrides cfg:default.paths.data=/srv/grafana admin reset-admin-password newpass

Here “newpass” is the desired new password.

INFO[01-05|06:41:27] Connecting to DB                         logger=sqlstore dbtype=sqlite3
INFO[01-05|06:41:27] Starting DB migration                    logger=migrator

Admin password changed successfully ✔

On Docker PMM2 version:

  1. Login PMM Docker instance
  2. Run the below command to reset the admin password
[ajithdaniel@playwithmysql] docker exec -t pmm-server bash -c  'ln -s /srv/grafana /usr/share/grafana/data; grafana-cli --homepath /usr/share/grafana admin reset-admin-password newpass'

Here “newpass” is the desired new password.

INFO[01-05|06:45:23] Connecting to DB                         logger=sqlstore dbtype=sqlite3
INFO[01-05|06:45:23] Starting DB migration                    logger=migrator

Admin password changed successfully ✔

MySQL Partition Management

As we know, MySQL stores InnoDB data in one .ibd file per table on disk. With partitioning, the table is physically split into multiple files (.ibd files) but is still treated as a single table logically.

Each query fired on a table T with partitions p1, p2 … pN is handled as p1 UNION p2 .. UNION pN (unless partition pruning applies); hence it's good to keep the number of partitions (files) on disk optimal.

Broadly, partitions are implemented either to boost performance where horizontal sharding helps, or to keep the data size in check by archiving the cold partitions.

Building Blocks

If you are familiar with MySQL partitions, you can skip to the #tooling part.

Prerequisite:

All the unique keys (including the primary key) should have the partition column appended:

PRIMARY KEY (`id`) - > PRIMARY KEY (`id`,`created_at`) 
UNIQUE KEY `uniq_bk` (`uniq_id`,`created_at`)
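
For an existing table, that typically means rebuilding the keys before partitioning. A minimal sketch, assuming a table named furelise with the keys shown above (names are illustrative):

ALTER TABLE furelise
  DROP PRIMARY KEY,
  ADD PRIMARY KEY (`id`, `created_at`),
  DROP KEY `uniq_bk`,
  ADD UNIQUE KEY `uniq_bk` (`uniq_id`, `created_at`);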

Optimizations:

  1. Partition Pruning: rewrite the query with the partition column and the retention interval, to avoid scans on future/empty partitions:
 where created_at > (NOW() - INTERVAL 60 DAY) and created_at <= NOW() 
  2. Optimizer Hinting: explicitly choose the intended index when more than one secondary key appears in the where clause (a combined example follows this list):
force index(<idx_column1>) where column1='' and column2=''
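
Putting both together, a pruned and hinted query would look something like this (a sketch; the orders table, the idx_customer_id index and the columns are assumptions, not from the schema above):

SELECT *
FROM orders FORCE INDEX (idx_customer_id)
WHERE customer_id = 42
  AND created_at > (NOW() - INTERVAL 60 DAY)
  AND created_at <= NOW();

Running it under EXPLAIN (or EXPLAIN PARTITIONS on older versions) should list only the recent partitions in the partitions column, confirming that pruning kicked in.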

Limitations:

  1. Partitions are not supported on tables with foreign keys.
  2. The optimizer can pick a sub-optimal execution plan when
    • more than one secondary key is involved in the `where` clause, or
    • there is a range query on a time column other than the partition key.

Archival Strategies using Partitions

With partitions, we can set up an archival policy where the data is either archived to cold backups or moved to an analytical solution, and expired partitions are dropped/truncated.
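
Dropping or truncating an expired partition is far cheaper than a bulk DELETE, so the expiry step itself stays lightweight. A sketch (table and partition names are illustrative):

-- drop an expired RANGE partition entirely
ALTER TABLE furelise DROP PARTITION upto_2020_07_01;

-- or empty a LIST partition that only holds soft-deleted rows
ALTER TABLE furelise TRUNCATE PARTITION p_deleted;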

RANGE

As the name suggests, the partition expression is matched against a range of sorted integer values to determine the partition.

For data archiving it is apt to go for a column which holds the row creation time, and the retention policy is checked against it.

Note: the application's data update window should be shorter than the retention policy, to avoid record-not-found errors.

created_at timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP (values initialized by MySQL) or 
created_at datetime NOT NULL (values inserted by app)
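
For reference, a hand-written RANGE layout on a timestamp column would look roughly like this (for datetime columns TO_SECONDS is used instead, as the generator below does); table name and boundaries are illustrative:

ALTER TABLE furelise PARTITION BY RANGE (UNIX_TIMESTAMP(created_at))
( PARTITION p_min VALUES LESS THAN (1),
  PARTITION upto_2020_08_01 VALUES LESS THAN (UNIX_TIMESTAMP('2020-08-01')),
  PARTITION upto_2020_09_01 VALUES LESS THAN (UNIX_TIMESTAMP('2020-09-01')),
  PARTITION pmax VALUES LESS THAN MAXVALUE );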

LIST

by using a soft-delete column

The partition is determined by checking the value against a defined list. This comes in handy when the application has many conditions to flag records for archival.

 PARTITION BY LIST (`is_deleted`)
 (PARTITION p_active VALUES IN (0) ENGINE = InnoDB,
 PARTITION p_deleted VALUES IN (1) ENGINE = InnoDB)

MISC

It is also possible to combine the above two strategies. Other partition types such as HASH/KEY can also be implemented in suitable scenarios.

Partition Management

There are very few partition management solutions available, and it is good to have a home-brew tool for better control.

Scope:

  1. generate partitions, i.e. convert a non-partitioned table into a partitioned one
  2. manage partitions, i.e. add new and remove old partitions
  3. archival/backup strategy for the expired partitions

Let’s write some code to generate partitions for a time range using Python. I’m going to use the pandas and yamale modules, which will do the heavy lifting for us.

https://pypi.org/project/yamale/
https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html

We use YAML for the configuration management so that the tool can be triggered via a gitops/chatops stack.

partition_config.yaml

---
- database: classic
  table: furelise
  column: created_at
  type: timestamp
  start_time: 2020-07-01
  count: 7
  interval: 'MS'

- database: modern
  table: yaani
  column: created_on
  type: datetime
  start_time: 2020-07-01 12:00:00
  count: 7
  interval: 'D'
...
Here,
column - partition candidate column
type - column type of the partition candidate (in this example timestamp or datetime)
count - no. of partitions to be generated from the start time
interval - pandas offset alias for the partition frequency (refer to the pandas link above for all supported offsets)

Let's create a validation schema (schema.yaml) for the above config.
list(include('table'),)
---
table:
  database: str()
  table: str()
  column: str()
  type: enum('timestamp','datetime')
  start_time: any(day(), timestamp())
  count: int(min=1,max=30)
  interval: enum('W', 'MS', 'M', 'SMS', 'QS', 'D', 'Y')
...
Next, a Python module to generate the partition statement along with the failsafe p_min and pmax partitions:
p_min - to hold the zero-timestamp entries
pmax - to allow writes if the partition expression exceeds the upper range
partition_generate.py
import pandas as pd

def generate_partitions(database, table, column, type, start_time, count, interval):
    # TO_SECONDS for datetime columns, UNIX_TIMESTAMP for timestamp columns
    conversion_function = 'TO_SECONDS' if type == 'datetime' else 'UNIX_TIMESTAMP'
    # partition boundaries generated from the start time at the requested frequency
    dti = pd.date_range(start_time, periods=count, freq=interval)
    print(f"ALTER TABLE {database}.{table} PARTITION BY RANGE ({conversion_function}({column}))\n( PARTITION p_min VALUES LESS THAN (1),")
    for d in dti:
        print(f"PARTITION upto_{d.date().strftime('%Y_%m_%d')} VALUES LESS THAN ({conversion_function}('{d}')),")
    print("PARTITION pmax VALUES LESS THAN MAXVALUE )")

Time to run:
make_partition.py
import yamale
import partition_generate as pg

banner='-----'
data = yamale.make_data('partition_config.yaml')
schema = yamale.make_schema('schema.yaml')

try:
    yamale.validate(schema, data)
    print('Validation success! 👍')
    print(banner)
    data, _ = data[0]  # yamale.make_data returns a list of (data, path) tuples
    for d in data:
        pg.generate_partitions(**d)
        print(banner)
except Exception as e:
    print('Validation failed!\n%s' % str(e))
    exit(1)
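
For the sample config above, the first entry should produce output along these lines (illustrative, truncated):

Validation success! 👍
-----
ALTER TABLE classic.furelise PARTITION BY RANGE (UNIX_TIMESTAMP(created_at))
( PARTITION p_min VALUES LESS THAN (1),
PARTITION upto_2020_07_01 VALUES LESS THAN (UNIX_TIMESTAMP('2020-07-01 00:00:00')),
PARTITION upto_2020_08_01 VALUES LESS THAN (UNIX_TIMESTAMP('2020-08-01 00:00:00')),
...
PARTITION pmax VALUES LESS THAN MAXVALUE )
-----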

wow!

We can see the partitions are generated as expected, and now we can pass the generated SQL to a pipeline to execute it against the databases (remember to add the partition column to the PRIMARY/UNIQUE keys first).

[Screenshot: verify the partitions from information_schema]
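
A query along these lines does that verification (schema and table names follow the sample config and are illustrative):

SELECT PARTITION_NAME, PARTITION_METHOD, PARTITION_DESCRIPTION, TABLE_ROWS
FROM information_schema.PARTITIONS
WHERE TABLE_SCHEMA = 'classic' AND TABLE_NAME = 'furelise';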

Looking forward to interesting questions/ideas on MySQL partitions and tooling.

Further Reads:

https://blog.jcole.us/2013/01/03/the-basics-of-innodb-space-file-layout/

https://dev.mysql.com/doc/mysql-partitioning-excerpt/8.0/en/partitioning.html

https://github.com/yahoo/mysql_partition_manager

https://architect.readthedocs.io/features/partition/index.html

Integrating Prometheus Alert Manager with Percona Monitoring and Management (PMM)

What is Prometheus Alert Manager?

Alert Manager is an open-source tool used to route the alerts generated by Prometheus to your receiver integrations like Slack, PagerDuty, VictorOps, Email, WeChat, etc.

Alert Manager supports

  • Grouping of Alerts
  • Silencing Alerts
  • Inhibition
  • Deduplication of Alerts
  • High Availability

What is Percona Monitoring and Management (PMM)?

Percona Monitoring and Management (PMM) is an open-source platform for managing and monitoring MySQL, PostgreSQL and MongoDB performance, developed by Percona.

Basically, PMM Server is built with Prometheus and Grafana, so we planned to integrate it with AlertManager.

AlertManager Installation:

1. Download the Alert Manager:

Git Link: https://github.com/prometheus/alertmanager/releases

Download Link : https://prometheus.io/download/

$ wget https://github.com/prometheus/alertmanager/releases/download/v0.18.0/alertmanager-0.18.0.linux-amd64.tar.gz

2. Extract the Alert Manager Binary Files:

$ tar xvzf alertmanager-0.18.0.linux-amd64.tar.gz
alertmanager-0.18.0.linux-amd64/
alertmanager-0.18.0.linux-amd64/amtool
alertmanager-0.18.0.linux-amd64/alertmanager
alertmanager-0.18.0.linux-amd64/LICENSE
alertmanager-0.18.0.linux-amd64/alertmanager.yml
alertmanager-0.18.0.linux-amd64/NOTICE

3. Copy the executable files to /usr/local/bin

$ cd alertmanager-0.18.0.linux-amd64
$ cp amtool alertmanager /usr/local/bin

4. Copy the config file to /etc/

$ mkdir -p /etc/alertmanager
$ cp alertmanager.yml /etc/alertmanager

5. Create a folder for the AlertManager data directory

$ mkdir -p /data/alertmanager

6. Create a separate user for Alert Manager

$ useradd -rs /bin/false alertmanager

7. Give permissions for the binary files, config files and data folder

$ chown alertmanager:alertmanager /usr/local/bin/amtool /usr/local/bin/alertmanager
$ chown -R alertmanager:alertmanager /data/alertmanager /etc/alertmanager/*

8. Create a systemd service if your OS supports it

$ cat > /lib/systemd/system/alertmanager.service
[Unit]
Description=Alert Manager
Wants=network-online.target
After=network-online.target

[Service]
Type=simple
User=alertmanager
Group=alertmanager
ExecStart=/usr/local/bin/alertmanager \
  --config.file=/etc/alertmanager/alertmanager.yml \
  --storage.path=/data/alertmanager

Restart=always

[Install]
WantedBy=multi-user.target

9. Enable and Start the AlertManager service

$ systemctl enable alertmanager
$ systemctl start alertmanager

Now Alert Manager is started.

10. Verify whether AlertManager is running or not

$ systemctl status alertmanager
● alertmanager.service - Alertmanager
   Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: disabled)
   Active: active (running) since Sun 2020-01-26 10:12:14 UTC; 2 months 14 days ago
 Main PID: 34209 (alertmanager)
   CGroup: /system.slice/alertmanager.service
           └─34209 /usr/local/bin/alertmanager --config.file=/etc/alertmanager/alertmanager.yml --storage.path=/data/alertmanager

11. Checking AlertManager GUI.

If AlertManager is installed locally, use the below link:

Link: http://localhost:9093/#/alerts
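
On reasonably recent AlertManager versions, the /-/healthy and /-/ready endpoints are also exposed, so a quick command-line check is possible as well (assuming the default port 9093):

$ curl -s -o /dev/null -w '%{http_code}\n' http://localhost:9093/-/healthy
200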

Integrating PMM Prometheus with AlertManager:

1. Add the below configuration to the prometheus.yml file:

$ vi /etc/prometheus.yml
rule_files:
  - /etc/alert.rules.yml
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - localhost:9093

/etc/alert.rules.yml is the Prometheus rule config file; we will explain more about Prometheus rules below.

2. Restart Prometheus to apply the changes.

$ systemctl restart prometheus

3. Make sure Prometheus is working fine after the changes.

$ systemctl status prometheus
● prometheus.service - Prometheus
   Loaded: loaded (/usr/lib/systemd/system/prometheus.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/prometheus.service.d
           └─override.conf
   Active: active (running) since Thu 2020-04-09 15:23:06 UTC; 1 day 1h ago
  Process: 5235 ExecStartPre=/usr/bin/sleep 10 (code=exited, status=0/SUCCESS)
 Main PID: 5238 (prometheus)
   CGroup: /system.slice/prometheus.service
           └─5238 /usr/sbin/prometheus --web.listen-address :9090 --config.file /etc/prometheus.yml

Now we have integrated Alert Manager with PMM Prometheus.

Adding Prometheus rules:

Prometheus rules can be written based on the metrics collected by the exporters. Rules should be defined based on your requirements or use-cases.

Here are some generic rules for instance-down and MySQL-service-down alerts:

$ vi /etc/alert.rules.yml
groups:
- name: mysql
  rules:
  - alert: MySQLServiceDown
    expr: mysql_up==0
    labels:
      severity: warning
    annotations:
      description: 'MySQL is not running on the instance'
      summary: MySQL Service Down
  - alert: InstanceDown
    expr: up==0
    labels:
      severity: warning
    annotations:
      description: 'Instance stopped running'
      summary: Instance Down

We will be posting a separate post for MySQL alerts soon.

Validating the rule config:

Validate the rule before applying using promtool

$ promtool check rules /etc/alert.rules.yml
  Checking /etc/alert.rules.yml
  SUCCESS: 2 rules found

Reloading the rule config:

Whenever the rule config is modified, Prometheus should be reloaded.

$ kill -HUP $(pidof prometheus)

AlertManager Rule Config:

In the AlertManager config, we define how the alerts should be routed. Alerts can be routed based on different patterns like alert name, cluster name or label names.

In the below example, Slack is the default receiver and alerts are grouped per instance. The repeat interval is 1 hour, so an alert will be repeated every hour if it is not resolved. InstanceDown and MySQLServiceDown alerts will be routed to PagerDuty, and slow-query alerts will be routed to Slack.

global:
  slack_api_url: 'https://hooks.slack.com/services/XXXXXXXXXXXXXXX/XXXXXXXXXXXXXXXXXX'

route:
  receiver: 'slack'
  group_by: ['instance']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 1h


  routes:
  - match_re:
      alertname: ^(InstanceDown|MySQLServiceDown)$
    receiver: pagerduty
  - match_re:
      alertname: ^(Slow Queries)$
    receiver: slack

receivers:
- name: 'slack'
  slack_configs:
  - text: "<!channel> \nsummary: {{ .CommonAnnotations.summary }}\ndescription: {{ .CommonAnnotations.description }}"

- name: 'pagerduty'
  pagerduty_configs:
  - service_key: XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
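
Just as promtool validates the Prometheus rules, the bundled amtool can sanity-check this file before reloading:

$ amtool check-config /etc/alertmanager/alertmanager.yml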

Refer to the below link for more details on the AlertManager configuration:

https://prometheus.io/docs/alerting/configuration/

Reloading AlertManager Config:

To apply the new changes, AlertManager should be reloaded.

$ kill -HUP $(pidof alertmanager)

Note: This works only on the EC2 Marketplace Percona Monitoring and Management version. Soon we will be posting how to integrate Prometheus Alert Manager with the PMM Docker version.