Releases: the4thdoctor/pg_chameleon

v1.5

23 Jul 10:54

Pg_chameleon is a replication tool from MySQL to PostgreSQL, developed in Python 2.7 and Python 3.3+.
The system relies on the mysql-replication library to pull the changes from MySQL and convert them into a jsonb object.
A plpgsql function decodes the jsonb and replays the changes into the PostgreSQL database.

The tool requires an initial replica setup, which pulls the data from MySQL in read-only mode.
This is done by the tool running FLUSH TABLES WITH READ LOCK;.

The tool can pull the data from a cascading replica when the MySQL slave is configured with log-slave-updates.
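The capture-and-replay flow described above can be sketched as follows. This is only an illustration of the idea, not pg_chameleon's internal format: the field names and the helper function are assumptions made for the example.

```python
import json

# Hedged sketch: a row image pulled from the MySQL replica is packed into
# a json document, ready to be stored in a jsonb column and later decoded
# by the plpgsql replay function. The payload layout is illustrative and
# does not match pg_chameleon's actual internal format.
def row_event_to_json(schema, table, action, row, binlog_file, binlog_pos):
    payload = {
        "schema": schema,
        "table": table,
        "action": action,                      # insert / update / delete
        "binlog": [binlog_file, binlog_pos],   # position for batch tracking
        "row": row,
    }
    return json.dumps(payload)

doc = row_event_to_json("sakila", "actor", "insert",
                        {"actor_id": 1, "first_name": "PENELOPE"},
                        "mysql-bin.000003", 4711)
```

On the PostgreSQL side the document would be inserted into a jsonb column of a log table, where the replay function can pick it up in batch order.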

Changelog from 1.4

  • fix wrong evaluation of the table's consistent state (row images and DDL)
  • fix wrong dimensions when building floating point data types
  • add support for DEFAULT value on ADD COLUMN, CHANGE, MODIFY
  • add indices to the t_log_replica tables in order to speed up the batch cleanup
  • Fix for Issue #5: add a cast to char with character set in column generation in order to override the collation mix; the charset is the MySQL general character set
  • Improve logging messages to be more descriptive
  • Remove total_events and, when writing the batch, evaluate the real number of events stored from the MySQL replica
  • fix ALTER TABLE...CHANGE parsing to lowercase the data type
  • add a warning when a table is without a primary key
  • Fix Issue #9: add a configuration check for MySQL before starting the replica
  • Fix Issue #10: clarify the running replica log messages to better describe what's happening
  • Add the --nolock option as requested in Issue #13. When started with --nolock the replica doesn't create the lock file in case of error.

v1.4

08 Jul 11:03


sync_replica replaced by sync_tables

The command sync_replica is now replaced by sync_tables, as the new name better reflects the concept behind the process.
The command requires the option --table followed by a comma separated list of table names.

If a specified table is not present in the origin's schema, the table is silently skipped.
When a table is synchronised, the existing copy in the target database is dropped and recreated from scratch.
In order to get the table into a consistent state, the log coordinates are saved in the table t_replica_tables.
The replica process will ignore the table until the log position reaches the table's snapshot position,
ensuring a consistent state for the replica target.
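The gating logic can be sketched like this, assuming binlog coordinates are compared as (file, position) pairs. The snapshot store and function names are illustrative, not pg_chameleon's actual code:

```python
# Sketch of the consistency gate described above. After sync_tables saves a
# table's snapshot coordinates, the replay loop ignores events for that
# table until the replica position reaches the snapshot position.
# The dictionary and function names below are assumptions for illustration.

# snapshot coordinates saved when the table was recreated
snapshot_pos = {"mytable": ("mysql-bin.000010", 120)}

def should_apply(table, event_pos):
    """Apply an event unless the table is gated behind a newer snapshot.
    Binlog file names carry a zero-padded numeric suffix, so comparing
    the (file, position) tuples lexicographically is safe."""
    gate = snapshot_pos.get(table)
    return gate is None or event_pos >= gate
```

Events for tables without a saved snapshot pass through unchanged, so the rest of the replica is unaffected by a sync_tables run.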

Changelog from 1.3.1

  • add varbinary to the type conversion dictionary
  • fix wrong quoting when the key field is surrounded by a backtick `
  • add geometry to the supported types
  • add varbinary and geometry to hexify in config-example.yaml
  • add INDEX and UNIQUE to the excluded words when parsing alter table; this prevents the DDL replica from crashing when the alter table adds an index
  • Fix for Issue #4: add a generic exception when falling back on inserts to trap unexpected data not handled by psycopg2
  • Replace sync_replica with sync_tables. Check the release notes for the implementation.
  • Add --version to display the program version.
  • Move the documentation to pgchameleon.org

Release v1.3.1

19 Jun 19:57

Bugfix emergency release v1.3.1

I discovered a regression when running init_replica, caused by wrong handling of a missing time value in the master's coordinates. Sorry about that.


sync_replica disabled

The sync_replica command does not work as expected when running in single table mode.
As the issue requires time to be fixed, this release temporarily disables the sync_replica command.

Change in replica storage

The initial implementation for the relay data was to have two log tables t_log_replica_1 and t_log_replica_2 with the
replica process accessing one table at a time.

This approach allows autovacuum to take care of the unused partition while the other is being written.
The method worked fine with only one replica worker. However, as the flip-flop between the tables is calculated independently
for each source, this could cause unwanted bloat on the log tables if several sources are replicating together.
In this scenario autovacuum will struggle to truncate the empty space at the table's end.

pg_chameleon version 1.3 implements the log tables per source. Each source has a dedicated pair of tables, still inherited from
the root partition t_log_replica.
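The per-source flip-flop might be sketched like this. The class and the table-naming scheme are illustrative assumptions, not the actual schema:

```python
# Illustrative sketch of the per-source log tables introduced in 1.3.
# Each source alternates writes between its own pair of log tables, so
# autovacuum can reclaim the idle one without interference from other
# sources. The naming scheme below is an assumption for the example.
class SourceLogTables:
    def __init__(self, source_id):
        self.tables = (f"t_log_replica_1_src_{source_id}",
                       f"t_log_replica_2_src_{source_id}")
        self.active = 0

    def current(self):
        """Table currently receiving the replica events."""
        return self.tables[self.active]

    def flip(self):
        """Switch writes to the idle table when a batch is closed."""
        self.active = 1 - self.active
        return self.current()
```

Because each source flips its own pair, one busy source can no longer bloat the tables of another.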

The schema is migrated at the first run after the upgrade by pg_chameleon's integrated schema migrator.
The upgrade scripts are installed in the python specific site-packages directory.

For example, if you have a Python 3.6 virtualenv in the directory ~/venv, you'll find the upgrade files in
~/venv/lib/python3.6/site-packages/pg_chameleon/sql/upgrade/

The migration performs the following operations.

  • add a field v_log_table to t_sources
  • add a helper plpgsql function fn_refresh_parts() which creates the source's log tables if not present
  • with a DO block, create the new log tables for the existing sources and copy the data from the old t_log_replica_x tables to the new log tables
  • drop the old log tables
  • remove the field v_log_table from t_replica_batch

Upgrade

please read carefully before attempting any upgrade

The schema upgrade happens automatically at the first run.
Because it involves a data copy, it could take more time than usual. If the process seems frozen, do not stop it; otherwise you may lose your replica setup.

Upgrade steps

  • Stop all the replica sources. The show_status command must show all the rows in stopped status
  • Take a backup of the schema sch_chameleon with pg_dump
  • Upgrade pg_chameleon with pip install pg_chameleon --upgrade
  • Run chameleon.py upgrade_schema --config <your_config> --debug
  • When the upgrade is finished start the replica process as usual

Rollback

If something goes wrong in the upgrade process you should restore the backup of sch_chameleon,
then downgrade the installation to pg_chameleon 1.2 and start the replica as usual.

Changelog from 1.2

  • each source now uses two dedicated log tables for better performance
  • set a descriptive application_name in the postgres process to track the replica activity
  • fix race condition when two sources have a batch with the same value in t_replica_batch.ts_created
  • add the --debug switch to get the log on the console with DEBUG verbosity without needing to change the configuration file
  • fix regexp for foreign keys when the CONSTRAINT keyword is omitted
  • change the lag display in show_status from seconds to an interval for better readability
  • add the quote_col method in the sql_token class to ensure all index columns are correctly quoted
  • add a release_notes file with the details on releases 1.3+
  • fix wrong timestamp save when updating the last received event in sources
  • temporarily disable sync_replica because it is not working as expected

Release v1.3

17 Jun 08:11

Release v1.2

25 May 04:43


Changelog from 1.1

  • fix deadlock when replicating from multiple sources
  • add the source id when cleaning batches
  • add missing marker when outputting the failed mysql query in copy_tabl_data
  • fix wrong decimal handling in build_tab_ddl
  • add bool to the data dictionary
  • exclude DDL coming from schemas other than the replicated one
  • fix wrong quoting when capturing an inline primary key
  • add error handling in read_replica
  • move the version view management into the pg_engine code

Release v1.1

13 May 14:05


Changelog from 1.0

  • completed docstrings in sql_util.py
  • fix race condition when capturing queries that are not tokenised and leave the binlog position unchanged

The race condition fixed in 1.1 affects servers with low write activity that use temporary tables.
The following statements, executed on MySQL with version 1.0, will cause the read_replica function to restart from the binlog position marked by the CREATE TABLE.

DROP TABLE IF EXISTS test;
CREATE TABLE test(
  id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  value1 VARCHAR(45) NOT NULL,
  PRIMARY KEY  (id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
CREATE TEMPORARY TABLE tmp_test(
  id SMALLINT UNSIGNED NOT NULL AUTO_INCREMENT,
  value1 VARCHAR(45) NOT NULL,
  PRIMARY KEY  (id)
)ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO tmp_test (value1) VALUES ('blah'),('blah');
INSERT INTO test (value1) VALUES ('blah');
DROP TEMPORARY TABLE IF EXISTS tmp_test;

This is caused by the batch not being marked as closed when the query is received. The batch was closed only when the query was parsed. The statements CREATE TEMPORARY TABLE are not parsed (by design) and therefore the batch was not closed correctly. The subsequent row images generated by the inserts were read at each loop, causing a replay issue because of the primary key violation.
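The ordering change behind the fix can be sketched as follows. This is only an illustration of the idea, not the actual pg_chameleon code; the class and function names are assumptions:

```python
# Illustration of the ordering fix: close the batch as soon as the query
# event arrives, before any parsing, so unparsed statements such as
# CREATE TEMPORARY TABLE can no longer leave the batch open and cause
# the same row images to be replayed on every loop.
class Batch:
    def __init__(self):
        self.closed = False

def on_query_event(batch, query, tokenise):
    batch.closed = True      # 1.1 behaviour: close first, unconditionally
    return tokenise(query)   # may yield nothing for temporary tables

batch = Batch()
tokens = on_query_event(batch, "CREATE TEMPORARY TABLE tmp_test (id INT)",
                        lambda q: [])  # tokeniser skips temp tables by design
```

With the batch closed regardless of the parse result, the row images already stored are flushed exactly once and the primary key violation disappears.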

Release v1.0

07 May 11:04


Changelog from 1.0-RC.2

  • Completed docstrings in pg_lib.py

Release v1.0-RC.2

26 Apr 06:41


Changelog from 1.0-RC.1

  • Completed docstrings in global_lib.py and mysql_lib.py
  • Partial writing of docstrings in pg_lib.py
  • Restore the fallback on inserts when the batch data copy fails

Release v1.0-RC.1

15 Apr 06:34


Changelog from 1.0 Beta 2

  • add support for a primary key defined as a column constraint
  • fix regression if the null constraint is omitted in a primary key column
  • add foreign key generation to detach replica; keys are added as invalid and a validation attempt is performed
  • add support for truncate table
  • add parameter out_dir to set the output destination for the csv files during init_replica
  • add set tables_only to table_limit when streaming the mysql replica
  • force a close batch on a rotate event if the binlog changes without any row events collected
  • fixed replica regression with python 3.x and empty binary data
  • added event_update in the hexlify strategy
  • add tabulate for a nicer display of sources/status
  • logs are rotated on a daily basis
  • removed the parameter log_append
  • add the parameter log_days_keep to specify how many days to keep the logs
  • feature freeze

Release v1.0-beta.2

02 Apr 10:14


Changelog from 1.0 Beta 1

  • add detach replica with automatic sequence reset (no FK generation yet)
  • replica speed improvement by excluding BEGIN and COMMIT when capturing the queries from MySQL
  • fix the capturing regexp for primary keys and foreign keys
  • fix the version in the documentation