Extending the Tungsten Replicator Core JS Filter Functionality

Tungsten Replicator has a really cool feature in that we can filter data as it goes past on the wire.

The replicator itself is written entirely in Java, and writing filters in Java is not as straightforward as it looks, which is why the better option is to use the JavaScript mechanism and write filters that way instead. I’ll save the details of how you can write filters to process and massage data for another time; right now I want to find a good way of improving that JavaScript environment.

There are filters, for example, where I want to be able to load JSON option and configuration files, or write out JSON versions of information, and plenty more.

Mozilla’s Rhino JS environment is what is used to provide the internal JS environment for running filters. The way this is supported is that rather than creating a Rhino JS environment that can do whatever it wants, instead, we create a JS instance specifically for executing the required functions within the filter. One of these instances is created for each filter that is configured in the system (and each batch instance too).

The reason we do this is because for each filter, we want each transaction event that appears in the THL log to get executed through the JS instance where the filter() function in the JS filter is executed with a single argument, the event data.

The limitation of this model is that we don’t get the full Rhino environment, because we execute the JS function directly, so certain top-level items and functions like load() or require(), or utilities like JSON.stringify(), are not available. We could make them available by changing the way we configure the JS environment, but that could get messy quickly, while also complicating the security aspects of how we execute these components.

There are some messy ways in which we could get round this, but in the end, because I also wanted to add some general functionality into the filters system shared across all JS instances, I chose instead to just load a set of utility functions, written in JavaScript, into the JS instance for the filter. The wonderful thing about JS is that we can write all of the functions in JS, even for classes, methods and functions that aren’t provided elsewhere.

So I chose the path of least resistance, which means loading and executing a core JS file before loading and executing the main filter JS. We can then place into that core JS file all of the utility functions we want to be available to all of the filters.

So, to enable this the first thing we do is update the core Java code when we load the filter JS to load our core utility JS first. That occurs in replicator/src/java/com/continuent/tungsten/replicator/filter/JavaScriptFilter.java, within the prepare() function which is where we instantiate the JS environment based on the code.

String coreutilssrc = properties.getString("replicator.filter.coreutils");

// Import the standard JS utility script first
try
{
    // Read and compile the core script functions
    BufferedReader inbase = new BufferedReader(new FileReader(coreutilssrc));
    // Use the core script path as the source name so errors point at the right file
    script = jsContext.compileReader(inbase, coreutilssrc, 0, null);
    inbase.close();

    script.exec(jsContext, scope);
}
catch (IOException e)
{
    throw new ReplicatorException("Core utility library file not found: "
            + coreutilssrc, e);
}
catch (EvaluatorException e)
{
    throw new ReplicatorException(e);
}

This is really straightforward: we obtain the path to the core utilities script from the configuration file (we’ll look at how we define that later), and then compile and execute it within the jsContext object, where our JavaScript is being executed. We add some sensible error checking, but otherwise this is simple.

It’s important to note that this is designed to load that core file *before* the main filter file just in case we want to use anything in there.

Next, we can add that configuration line into the default config by creating a suitable ‘template’ file for tpm, which we do by creating the file replicator/samples/conf/filters/default/coreutils.tpl. I’ve put it into the filters section because it only applies to filter environments.

The content is simple, it’s the line with the location of our core utility script:

# Defines the core utility script location
replicator.filter.coreutils=${replicator.home.dir}/support/filters-javascript/coreutils.js

And lastly, we need the script itself, replicator/support/filters-javascript/coreutils.js :

// Core utility JavaScript and functions for use in filters
//
// Author: MC Brown (9af05337@opayq.com)


// Simulate the load() function to load additional external JS scripts

function load(filename) {
    var file = new java.io.BufferedReader(new java.io.FileReader(new java.io.File(filename)));
    var newline = java.lang.System.getProperty("line.separator");

    // Declare line locally so it does not leak into the global scope
    var line;
    var sb = "";
    while((line = file.readLine()) != null)
        {
            sb = sb + line + newline;
        }
    file.close();

    eval(sb);
}

// Read a file and evaluate it as JSON, returning the evaluated portion

function readJSONFile(path)
{
    var file = new java.io.BufferedReader(new java.io.FileReader(new java.io.File(path)));
    var newline = java.lang.System.getProperty("line.separator");

    // Declare line locally so it does not leak into the global scope
    var line;
    var sb = "";
    while((line = file.readLine()) != null)
        {
            sb = sb + line + newline;
        }
    file.close();

    // Wrap in parentheses so the text is evaluated as an expression
    return eval("(" + sb + ")");
}

// Class for reconstituting objects into JSON

JSON = {
    parse: function(sJSON) { return eval('(' + sJSON + ')'); },
    stringify: (function () {
      var toString = Object.prototype.toString;
      var isArray = Array.isArray || function (a) { return toString.call(a) === '[object Array]'; };
      var escMap = {'"': '\\"', '\\': '\\\\', '\b': '\\b', '\f': '\\f', '\n': '\\n', '\r': '\\r', '\t': '\\t'};
      // Escape quotes, backslashes and control characters so strings produce valid JSON
      var escFunc = function (m) { return escMap[m] || '\\u' + (m.charCodeAt(0) + 0x10000).toString(16).substr(1); };
      var escRE = /[\\"\u0000-\u001F\u2028\u2029]/g;
      return function stringify(value) {
        if (value == null) {
          return 'null';
        } else if (typeof value === 'number') {
          return isFinite(value) ? value.toString() : 'null';
        } else if (typeof value === 'boolean') {
          return value.toString();
        } else if (typeof value === 'object') {
          if (typeof value.toJSON === 'function') {
            return stringify(value.toJSON());
          } else if (isArray(value)) {
            var res = '[';
            for (var i = 0; i < value.length; i++)
              res += (i ? ', ' : '') + stringify(value[i]);
            return res + ']';
          } else if (toString.call(value) === '[object Object]') {
            var tmp = [];
            for (var k in value) {
              if (value.hasOwnProperty(k))
                tmp.push(stringify(k) + ': ' + stringify(value[k]));
            }
            return '{' + tmp.join(', ') + '}';
          }
        }
        // String() gives a JS string even if value is a wrapped Java object
        return '"' + String(value).replace(escRE, escFunc) + '"';
      };
    })()
  };

For the purposes of validating my process, there are three functions:

  • load() – which loads an external JS file and executes it, so that we can load other JS scripts and libraries.
  • readJSONFile() – which loads a JSON file and returns it as a JS object.
  • JSON class – which does two things: it provides a JSON.parse() method for parsing JSON strings into JS objects, and a JSON.stringify() method which will turn a JS object back into JSON.
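
As a rough sketch of how a filter might use these helpers, the following parses a JSON options string and uses it to decide whether a schema/table pair should be kept. The option names (schema, tables), the shouldReplicate() function and the sample values are all hypothetical and not part of the replicator; inside a real filter the options would more likely come from readJSONFile().

```javascript
// Hypothetical filter options; in a filter these might come from
// readJSONFile("/path/to/options.json") instead of an inline string.
var optionsText = '{"schema": "sales", "tables": ["orders", "invoices"]}';
var options = JSON.parse(optionsText);

// Hypothetical helper: decide whether a schema/table pair should be kept
function shouldReplicate(options, schema, table) {
    if (schema !== options.schema) {
        return false;
    }
    for (var i = 0; i < options.tables.length; i++) {
        if (options.tables[i] === table) {
            return true;
        }
    }
    return false;
}

// Round-trip the options back to JSON, for example to write them to a log
var optionsAsJson = JSON.stringify(options);
```

Keeping the options in a JSON file means the same filter script can be reused with different settings without editing the JavaScript itself.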

Putting all of this together gives you a replicator where we now have some useful functions to make writing JavaScript filters easier. I’ve pushed all of this up into my fork of the Tungsten Replicator code here: https://github.com/mcmcslp/tungsten-replicator/tree/jsfilter-enhance

Now, one final note. Because of the way load() works, in terms of running an eval() on the code to import it, it does mean that there is one final step to make functions useful. To explain what I mean, let’s say you’ve written a new JS filter using the above version of the replicator.

In your filter you include the line:

load("/opt/continuent/share/myreallyusefulfunctions.js");

Within that file, you define a function called runme():

function runme()
{
     logger.info("I'm a bit of text");
}

Now within myreallyusefulfunctions.js I can call that function fine:

runme();

But from within the JS filter, runme() will raise an unknown function error. The reason is that we eval()’d the source file within the load() function, so runme() is defined in the scope of load() rather than in the scope of the filter.

We can fix that within myreallyusefulfunctions.js by exporting the name explicitly:

if (runme.name) this[runme.name] = runme;

This points the parent namespace to the runme() in this context; put that line at the end of the myreallyusefulfunctions.js script and everything is fine.

I’m lazy, and I haven’t written a convenient function for it, but I will in a future blog.
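
In the meantime, a convenience function along those lines could look something like the sketch below. The names exportFunctions(), loadSource() and globalScope are made up for illustration, and loadSource() is a stand-in for load() that takes the source text directly so that the example runs without the replicator.

```javascript
// Grab the global object in a way that works regardless of where this runs
var globalScope = Function('return this')();

// Hypothetical helper: publish each named function into the target scope
function exportFunctions(target, funcs) {
    for (var i = 0; i < funcs.length; i++) {
        if (funcs[i].name) {
            target[funcs[i].name] = funcs[i];
        }
    }
}

// Stand-in for load(): takes the source text directly instead of reading a file.
// Functions declared by the eval()'d source are not visible to outside callers.
function loadSource(source) {
    eval(source);
}

// What the end of myreallyusefulfunctions.js might look like
var librarySource =
    'function runme() { return "I\'m a bit of text"; }\n' +
    'exportFunctions(globalScope, [runme]);';

loadSource(librarySource);

// runme() is now reachable through the global scope
var message = globalScope.runme();
```

The loaded script still has to list the functions it wants to share, but a single call at the end of the file replaces one export line per function.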

Now we’ve got this far, let’s start building some useful JS functions and functionality to make it all work nicely…

How to Buffer posts+hashtags from your Blog using Zapier

I try to automate as much of my life as possible, particularly when it comes to computers.

I’ve been using the automated ‘Social Sharing’ on WordPress.com (and indeed, my blogs in general) for years. However, I’m also a keen Buffer user and WordPress.com does not offer a Buffer connection. Because I also use Buffer to handle my Patreon posts, concentrating them all in one place would make things a lot easier.

What I wanted to do was something quite straightforward: I wanted to turn a blog post entry into a post to Twitter (and others) that turned the list of tags I created on the post into #hashtags. This actually doesn’t seem like a particularly complex or uncommon request, but apparently it’s not a standard offering. What I was even more surprised at was that nobody else seemed to have done the same, which has me confused…

Now there are many options for doing this kind of automated posting, I could have used IFTTT, but IFTTT while incredibly useful (I have about 60 recipes on there) is also incredibly simplistic and your options are limited. That means I can’t post from WordPress to Buffer with the required hashtags.

Zapier is very similar to IFTTT, but also has the option of running multistep Zaps that do more than one thing (IFTTT is limited to one target), but better than that you can include a step that runs information through a JavaScript (or Python) script to do some additional processing.

And this is the key that enables me to do precisely what I need, take a blog post from one of my blogs, process the list of tags into a list of (de-duplicated) hashtags, and then post it into my Buffer queues.

So, here’s how to get Zapier to do what you need, there are going to be five steps to this:

  1. Identify when a new post appears on a WordPress blog
  2. Run a short JavaScript program to turn the list of tags (actually Terms) from the Blog post into a de-duplicated and hashtagged version
  3. Add it to my Twitter Buffer
  4. Add it to my Facebook Buffer
  5. Add it to my LinkedIn Buffer

Here’s how to get it set up. I’m going to assume you know Zapier and can follow the onscreen instructions; it’s not that complex.

Step 1

  • Register for a Zapier account, if you don’t already have one.
  • Connect your Zapier account to your WordPress blog
  • Connect your Zapier account to your Buffer account

Step 2

Create a new Zap on Zapier.

Select ‘Wordpress’ as your trigger app.


Now configure how you want the trigger to occur. I basically trigger on every post in every category, but if you want to add specific categories or other filtering, feel free.

Step 3

For the Action select ‘Code </>’


Now Select ‘Javascript’


When it gets to the Edit Template, you’ll need to specify the input variable to the JavaScript, in this case, create one called ‘tags’ and then select the ‘Terms Name’ from WordPress Step 1 and you’ll be ready to go.


The variables that you select here are placed into a hash (associative array) in the JavaScript context called ‘input’, so in this case, we’ll have the item ‘input.tags’ to parse in our JavaScript code. The actual list of terms will come through as a comma-separated string.

The code itself is quite straightforward:

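var hashlist = {};

// input.tags arrives as a comma-separated string of terms
(input.tags || '').split(',').forEach(function(item)
{
  // Strip all spaces and lower-case the term
  var res = item.replace(/ /g,'').toLowerCase();
  // Skip empty terms so we never emit a bare '#'
  if (res === '') return;
  // Using the hashtag as a key removes duplicates
  hashlist['#' + res] = 1;
});
return({'hashlist' : Object.keys(hashlist).join(' ')});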

We iterate over the terms by using ‘split’ to separate by a comma, then we replace any spaces with nothing (so we turn things like ‘data migration’ into ‘datamigration’), convert it to lower case, add the # prefix and add that all to a new associative array. The reason for this is to get rid of duplicates, so even if we have ‘data migration’ and ‘datamigration’ in the input, we only get one in the output. This is particularly useful because the ‘Terms’ list from WordPress is actually composed of both the tags and the categories for each post.

Finally, we return all of that as a string with all the keys of the hash (ie. our nicely formatted tags) separated by a space. However, just like the input value, we return this as an Object with the string assigned to the field ‘hashlist’. We’ll need this when creating the Buffer post.

I recommend you test this thoroughly and make sure you check the output.
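
One way to check the logic before pasting it into Zapier is to wrap it in a function and call it with sample input. This is only a sketch: tagsToHashtags() is a name made up for this example, the sample terms are invented, and empty terms are skipped so that a stray comma never produces a bare ‘#’.

```javascript
// Hypothetical wrapper around the Zap code so it can be run outside Zapier
function tagsToHashtags(tags) {
  var hashlist = {};
  (tags || '').split(',').forEach(function (item) {
    // Remove spaces and lower-case, as in the Zap code
    var res = item.replace(/ /g, '').toLowerCase();
    if (res !== '') {
      hashlist['#' + res] = 1;
    }
  });
  return { hashlist: Object.keys(hashlist).join(' ') };
}

// Sample terms: two of them collapse into one once spaces and case are normalised
var sample = tagsToHashtags('Data Migration,datamigration,MySQL');
console.log(sample.hashlist);  // prints "#datamigration #mysql"
```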

Step 4

Choose your target Buffer.

The Buffer API only allows you to post to one queue at a time, but brilliantly, Zapier lets us add multiple steps and so we can do one for each Buffer queue, in my case, the three. The benefit of this is that I can customise and tune the text and format for each. So, for example, I could omit the tags on Facebook, or, as I do, give a nice intro to the message ‘Please read my new blog post on…’ on FB because I’m not character (or attention span) limited.

Now for each Buffer queue, create your message, and when it comes to choosing the output, make sure you select your JavaScript output (which will be Step 2) and the ‘hashlist’ value.

Step 5

That’s it! Test it, make sure your posts are appearing, and check your Buffer queue (deleting the entries if required so you don’t double-post items).

You can duplicate and use this as many times as you like, in fact I’ve done this across my two blogs and am now looking into where else I can use the same method.

2015 is that way, 2016 is this way

The last year has been something of a change in direction in my life. Not only was it a year of a large number of ‘firsts’ for me, in all sorts of ways, I also changed a lot of what I was doing to better suit me. Actually that’s really important.

2015 turned out to be a really significant year for me, not because of any huge life changes, but because so many different and interesting things happened to me.

What did I change?

‘Official’ Studying – I have for many years been doing a degree in Psychology with the Open University. I was actually on my last year – well, 20 months as it was part time. I had my final two modules to go, and although I was hugely enjoying the course, it was a major sap on my personal time; what little I have of it after work and other obligations (see below). I also reached a crunch point; due to the way the course worked, changes in the rules, and the duration of the work (I started studying back in 2007), I had to finish the course by June 2016, and that meant there were no opportunities for retakes or doing the entire course all over again. I either had to get it right, first time, for each remaining course, or I would have to start again. That put massive pressure on me to get good marks while holding down a very busy day job, and it got harder to dedicate the required time. In the end I decided that having the piece of paper was less important than having the personal interest in the topic. And that was the other part of the problem. I’d already stopped reading, I stopped playing games, I stopped going out, all to complete a course. I realised that my interest in Psychology won’t disappear just because I stop studying. I can still read the books, magazines and articles that interest me without feeling pressured to do so.

Book/Article Writing – Given the above, and the lack of activity on here, it won’t surprise you that writing books and articles was something else I stopped. I deliberately changed my focus to the Psychology degree. But I also stopped doing anything outside work in any of the areas I’m interested in, despite some offers. I was working on a book, actually two books, but ultimately dropped them due to other pressures. Hopefully I’ll be converting some of that material into posts here over the course of the year.

Working Hours – I have very strange sleep patterns; I sleep very little, and have done since the day I was born. As such I normally get up very early (2am is not unusual) having gone to bed at 10 or 11pm the previous night. However, last year I spent even more time up late on the phone with meetings and phone calls to people in California. That would make for a long day, so I switched my day entirely so that I now start working later and finish later, doing most of my personal stuff in the early morning. It’s nice and quiet then too.

2015 Firsts

  • First time staying in a B&B – I know, this seems like an odd one, but I have honestly never stayed in a B&B before. But I did, three times, while on a wonderful touring holiday of the North of Scotland, taking in Inverness, Skye, Loch Ness and many other places.
  • First touring holiday (road trip) – See above. For the first time ever, I didn’t go to one place, stay there, and travel around the area. We drove miles. In fact, I did about 2,800 miles over the course of a week.
  • First time to the very north of Scotland – Part of the same road trip. I’ve done Dunbar, North Berwick, the borders, Edinburgh.
  • First music concert (in ages) – I went to two, in fact. One in Malaga and one in San Francisco about two weeks later. Enjoyed both. Want to do more.
  • First time driving in the US – I’ve been regularly going to the US since 2003, when I first started working for Microsoft, and even for companies in Silicon Valley, but I’ve always taken rides from friends, or taxis. In April, I hired a car and drove around. A lot. I did about 600 miles over the course of two weeks.
  • First Spanish train journey – I flew to Madrid on business, and then took the train from there down to see a friend in Malaga. The AVE train is lovely, and a beautiful way to travel, especially at 302km/h.
  • First Cruise – I’ve wanted to go on a cruise to see the Fjords of Norway since I was a teenager. I love the cold, I love the idea of being relatively isolated on a boat with lots of time to myself. In the end, I spent way more time interacting with other people than I expected, and did so little on my own, but I wouldn’t have changed it for the world. I went from Bergen to Kirkenes in the Arctic circle and back on the Hurtigruten and it was one of the most amazing trips of my life.
  • First time travelling on my own not for business – I travel so much for work (I did 16 journeys in 2015, most to California) that it made a nice, if weird, change to do a full trip on my own. I enjoyed it immensely and recommend it to everybody.

What’s planned for 2016?

I’m starting to publish my fictional work on Patreon with the express intention of getting book content that I’ve been working on for many many years out there in front of other people. I’ve got detailed notes and outlines on about nine different fictional titles, crossing a range of different genres. I’ve started with two of my larger ‘worlds’ – NAPE and Kings Courier and will be following up with regular chapters and content over the coming months.

I’ve also created a new blog to capture all of my travel. Not the work stuff, but things like the Scotland tour and the Norwegian Cruise, plus whatever else comes up this year and beyond. Current thoughts are Antarctica, Alaska or Iceland, work and personal commitments permitting. Plus I’m in Spain in August with my family and friends.

Converting my unfinished technical books to blog posts. I’ve worked on a number of books, some of which contain fresh, brand new material I’d like to share with other people, including the book content I was working on last year. I’m still trying to reformat it for the blog so that it looks good, but I will get there.

Office 365 Activation Won’t Accept Password

So today I signed up for Office 365, since it seemed to be the easiest way to get hold of Office; although I have a license and subscription, I also have more machines.

To say I was frustrated when I tried to activate Office 365 was an understatement. Each time I went through the process, it would reject the password saying there was a problem with my account.

I could log in with my email and password online, but through the activation, no dice. Some internet searches, including with the ludicrously bad Windows support search, didn’t elicit anything useful.

Then it hit me. Office 2011 for Mac through an Office 365 subscription probably doesn’t know about secondary authentication.

Sure enough, I created an application-specific password, logged in with that, and yay, I now have a running Office 365 subscription.

If you are experiencing the same problem, using an application-specific password might just help you out.

Passion for Newspaper Comics? Watch Stripped

I’m a big fan of comics – and although I am a fan of Spiderman, Superman, and my personal favourite, 2000AD – what I’m really talking about is the newspaper comics featuring stars like Garfield, Dilbert, and Calvin and Hobbes.

Unfortunately, being in the UK before the internet existed in its current form, finding these comics, particularly those from the US, was difficult. We don’t have many US comics in UK newspapers, and to be honest, very few papers in the UK have a good variety of any comic. That made feeding the habit difficult, as I would trawl, literally, around bookstores in the humour section to find the books I needed.

Garfield was my first foray into the market, and I bought one of the first books not long after it came out. Then, as I started looking around a little more I came across others, like Luann, For Better or For Worse, before finding the absolute joy that was Calvin and Hobbes before ultimately getting hold of Foxtrot, Sherman’s Lagoon and many many more.

Of course, the Internet has made these hugely accessible, and indeed not only do I read many comics every day, but I very deliberately subscribe (and by that, I mean pay money) to both Comics Kingdom (43 daily comics in my subscription) and GoComics.com (72 daily comics). I also continue to buy the books. Why?

Because at the end of the day, after looking at screens and taxing the brain, what I really want to do is chill and read some still intelligent, but not mentally taxing, content, and that means reading my comic books. They give me a break and a giggle, and I find that a nice way to go to sleep.

The more important reason, though, is because I enjoy these comics and believe these people should be rewarded for their efforts. Honestly, these guys work their laughter muscles harder than most people I know, creating new jokes, every day, that make me laugh. They don’t just do this regularly, or even frequently. They do it *every day*.

As a writer I know how hard it is to create new content every day, and keep it interesting. I cannot imagine how hard it is to keep doing it, and making it funny and enjoyable for people to read.

Over the years, I’ve also bought a variety of interesting things, including the massive Dilbert, Calvin & Hobbes and Far Side collectibles. I own complete collections of all the books for my favourite authors, and I’ve even contacted the authors directly when I haven’t been able to get them from the mighty Amazon. To people like Hilary B Price (Rhymes with Orange), Tony Carillo (F-Minus), Scott Hilburn (The Argyle Sweater), Leigh Rubin (Rubes) and Dave Blazek (Loose Parts) I thank you for your help in feeding my addiction. To Mark Leiknes (the now defunct Cow & Boy), I thank you for the drawings from your drawing board and notebook, and I’m sorry it didn’t work out.

But to Dave Kellett & Fred Schroeder I owe a debt of special gratitude. Of course Dave Kellett writes the excellent Sheldon, and not only do I have the full set, Dave signed them first. I’ve also got one of the limited editions Arthur’s…

But together, they produced the wonderful Stripped! which I funded through Kickstarter along with so many others (you can even see my name in the credits!). If you have any interest in how comics are drawn, where the ideas come from, and how difficult the whole process is, you should watch it. Even more, you should watch it if you want to know what these people look like.

Comic artists are people whose names, in some cases, we don’t even know; for others we might know the name, but there are probably very few whose faces we ever get to see. Yet these people are superstars. Really. Think about it: they write the screenplay, direct it, produce it, provide all the special effects, act all the parts, and do all the voices. And despite wearing all of these different hats, every day, they can still be funny, and, like all good comedy, thought provoking.

For me there is one poignant moment in the film too: understanding how, in a world where newspapers and comic syndication are dwindling fast, these people expect to make a living. The Internet is a great way for comic artists to get exposure to an ever growing army of fans, but I think there’s going to be an interesting crossover period for those comics that started out in the papers.

The film itself is great. Not only do you get to see these comic artist gods, but you get to understand their passion and interest, and why they do what they do. That goes a long way to helping you empathise with them and their passion in line with you and your passion – reading them.

If you like comics, find a way of giving some money back to these people, whether it’s a subscription, buying their books or buying merchandise.

 

Revisiting ZFS and MySQL

While at Percona Live this year I was reminded about ZFS and running MySQL on top of a ZFS-based storage platform.

Now I’m a big fan of ZFS (although sadly I don’t get to use it as much as I used to after I shut down my home server farm), and I did a lot of different testing back while at MySQL to ensure that MySQL, InnoDB and ZFS worked correctly together.

Of course today we have a completely new range of ZFS compatible environments, not least of which are FreeBSD and ZFS on Linux, so I think it’s time to revisit some of my original advice on using this combination.

Unfortunately the presentations and MySQL University sessions back then have all been taken down. But that doesn’t mean the advice is any less valid.

Some of the core advice for using InnoDB on ZFS:

  • Configure a single InnoDB tablespace, rather than configuring multiple tablespaces across different disks, and then let ZFS manage the underlying disk using stripes or mirrors or whatever configuration you want. This avoids you having to restart or reconfigure your tablespaces as your data grows, and moves that out to ZFS which can do it much more easily and while the filesystem and database remain online. That means we can do:
innodb_data_file_path = /zpool/data/ibdatafile:10G:autoextend
  • While we’re talking about the InnoDB data files, the best optimisation you can do is to set the ZFS record size to match the InnoDB page size. You should do this *before* you start writing data. That means creating the filesystem and then setting the record size; InnoDB’s default page size is 16K, so unless you have changed innodb_page_size:
zfs set recordsize=16K zpool/data
  • What you can also do is configure a separate filesystem for the InnoDB logs that has a ZFS record size of 128K. That’s less relevant in later versions of ZFS, but actually it does no harm.
  • Switch on I/O compression. Within ZFS this improves overall I/O times, because less data is physically read from and written to disk. The compression is lightweight enough to handle the load while still reducing the overall time.
  • Disable the double-write buffer. The transactional nature of ZFS helps to ensure the validity of data written down to disk, so we don’t need two copies of the data to be written to ensure valid recovery from the failures that are normally caused by partial writes of the record data. The performance gain is small, but worth it.
  • Using direct IO (innodb_flush_method=O_DIRECT in your my.cnf) also improves performance for similar reasons. We can be sure with direct writes in ZFS that the information is written down to the right place. EDIT: Thanks to Yves, this is not currently supported on Linux/ZFS.
  • Limit the Adaptive Replacement Cache (ARC); without doing this you can end up with ZFS using a lot of cache memory that would be better used at the database level for caching record information. We don’t need the block data cache as well.
  • Configure a separate ZFS Intent Log (ZIL), really a Separate Intent Log (SLOG) – if you are not using SSD throughout, this is a great place to use SSD to speed up your overall disk I/O performance. Using SLOG stores immediate writes out to SSD, enabling ZFS to manage the more effective block writes of information out to slower spinning disks. The real difference is that this lowers disk writes, lowers latency, and lowers overall spinning disk activity, meaning the disks will probably last longer, not to mention making your system quieter in the process. For the sake of $200 of SSD, you could double your performance and get an extra year or so out of the disks.

Surprisingly, not much has changed in these key rules; perhaps the biggest difference is the change in the price of SSD between when I wrote these original rules and today. SSD is cheap(er) today, so many people can afford SSD as their main disk, rather than their storage format, especially if you are building serious machines.

Tungsten Replicator 3.0 is Cloudera Enterprise 5 Certified

One of the key platforms I’ve been testing on for the MySQL to Hadoop replication has been Cloudera, largely driven by customer requirements, but it’s also one of the easiest ways to get started with Hadoop.


What I’m even more pleased about is that we can announce that Tungsten Replicator 3.0 is certified for use on the new Cloudera Enterprise 5 platform. That means we’re sure that you can replicate your data from MySQL to Cloudera 5 and have it work without causing problems or difficulties on the Hadoop loading and materialisation.

Cloudera is a great product, and we’re very happy to be working so effectively with the new Cloudera Enterprise 5. Cloudera certainly makes the core operation of managing and monitoring your Hadoop cluster so much easier, while still providing core functionality from the Hadoop family like Hive, HBase and Impala.

What I’m really interested in is the support for Spark, which will allow much easier live-querying and access to data.  That should make some data processing and live data views much easier to build and query further down the line.