Uncategorized

Elasticsearch in AWS

If you’re using AWS and you need Elasticsearch, you face a choice:

  1. Use the managed service: https://aws.amazon.com/elasticsearch-service/
  2. Just stand up some nodes and run your own.

Which one should you choose? Here are some thoughts on the matter. I’ve been using, running and migrating data on Elasticsearch for the last 2 years, so this is based on hands-on experience.

Firstly, let’s consider cost. The pure hardware cost is very similar: in AWS managed mode you’re still paying for the same hardware as you would when running your own. I haven’t seen any premium for running ES on that kit.

Access controls. 

In standalone mode you have the usual security groups or VPC and all the richness of the usual AWS controls. In managed mode you don’t. This was surprising, but basically you can either grant permissions to individual credentials or whitelist IPs.

In the first case you need to sign each request. It’s rather inconvenient, although it works with the Python aws-requests-auth package.

In the second case you literally whitelist the IPs that will have access to your cluster. This sounds insane, but it’s the only sensible way to interact with ES, because you can whitelist a small number of IPs running nginx that forwards your requests to ES. A very good howto can be found here.

And this is the option I would recommend: nginx, protected by the usual AWS controls, forwarding your requests to ES. But of course you lose the load-balancing built into most clients.

Monitoring and management.

In standalone mode you have a variety of tools and plugins for monitoring and managing your data and your cluster. For me those are separate things. Both vitally important.

In managed mode you get some cluster management. I say some because AWS has done an excellent job of distilling all the good practices of running ES and capturing the essence in a handful of choices. I loved it. It also has some metrics around the cluster, which was nice.

The management options I liked included:

  1. Separate data and master nodes
  2. Field data cache limits
  3. Automated periodic snapshots

These are all the things I would set on my own cluster straight away.

BUT, and this is a biggie: in managed mode you have zero, and I mean no, tools to help you manage your data. You’re stuck with plain REST. kopf or similar is invaluable for looking at your indices and changing your replicas, settings, templates, etc. All of that is missing. You can’t even run kopf someplace else, because the REST calls it uses to configure itself query the cluster state and are disabled. So it doesn’t even start :(

So you’re at the mercy of AWS to manage your data, upgrade your cluster, do many other things, which can be quite tricky. From my experience all of these bits require great care. For me that was a “walk-away” in negotiation terms. That killed all the niceness of bringing up a cluster in a few easy choices and a few clicks.
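
To give a sense of what "plain REST" means for a routine task such as changing replicas, here is a minimal sketch in Java 11+ that only builds the request for the index settings endpoint (PUT /<index>/_settings). The class name, the endpoint host and the index name are made up for illustration, and sending and signing the request are deliberately left out.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper, not part of any Elasticsearch or AWS client library
class ReplicaSettingsRequest {

    // Builds (but does not send) a request that sets number_of_replicas on one index
    static HttpRequest build(String endpoint, String index, int replicas) {
        String body = "{\"index\":{\"number_of_replicas\":" + replicas + "}}";
        return HttpRequest.newBuilder()
                .uri(URI.create(endpoint + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        // The host and index name are placeholders; substitute your own endpoint
        HttpRequest request = build("https://search-example.invalid", "logs", 1);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Behind an IP-whitelisted nginx proxy a request like this could be sent as is with java.net.http.HttpClient; with credential-based access it would additionally need to be signed.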

Clients.

In managed mode you’re only allowed REST. In standalone it’s whatever you like :)

Conclusion.

I like AWS managed elasticsearch. It’s a great product. Unfortunately it’s a bit raw. It’s also VERY slow at taking in configuration changes, including entitlements.

Too much is restricted without adequate alternatives being added. So if you want a quick cluster to prototype something and you don’t want to manage it, it’s right for you. Otherwise I’d wait a bit and see where this goes.

Standard
Java/Maven

CopyOnWriteArrayList in Java

I am reading Java Concurrency in Practice. It is priceless if you already know a thing or two about concurrency but would like to brush up on the theory and learn one or two new developments. The book is extremely well written.

And here is a little something I would like to share. In the book Brian Goetz glides over this, but this is so amazing, that it deserves a deeper look.

// CopyOnWriteTest.java

import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class CopyOnWriteTest {

	// Modifies the list inside a function; the caller sees the change
	private static void f( List<Integer> l ) {
		l.add( 2 );
	}

	/**
	 * @param args
	 */
	public static void main(String[] args) {
		List<Integer> copyOnWriteList = new CopyOnWriteArrayList<Integer>();
		List<Integer> simpleList = new ArrayList<Integer>();
		copyOnWriteList.add( 1 );
		simpleList.add( 1 );
		// Iterators created before the modification
		Iterator<Integer> copyOnWriteIterator1 = copyOnWriteList.iterator();
		Iterator<Integer> simpleListIterator1 = simpleList.iterator();
		f( copyOnWriteList );
		// Iterators created after the modification
		Iterator<Integer> copyOnWriteIterator2 = copyOnWriteList.iterator();
		Iterator<Integer> simpleListIterator2 = simpleList.iterator();
		// debug point on the output line just below
		System.out.println(copyOnWriteList);
	}

}

And when we debug into it we see this amazing picture (comments in red on the right):

CopyOnWriteArrayList debug screen

As you see, the normal non-thread-safe ArrayList iterator holds a reference to the collection itself, whereas the clever little CopyOnWrite iterator holds a reference to the underlying array as it was when the iterator was created. The list creates a fresh copy of that array every time a modification is made, so whenever an iterator is requested it just gets the latest copy, which doesn’t change throughout the life of the iterator. New copies are created instead.

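Now there is another way of doing copy-on-write, where the reference to the collection is not abstracted away and a modification produces a whole new collection object. In that case, when the collection is modified in a function, we don’t see the change in the caller. That is not the case here: the fresh array is swapped in inside the list object, so the caller’s reference stays valid. If you let the little example run, you will see it print [1, 2], which is brilliant! The 2 is added in the function f.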
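
The snapshot behaviour can also be seen without a debugger. The following is a minimal sketch (the class name SnapshotIteratorDemo and the helper iterateAfterAdd are made up for illustration): it creates an iterator, adds an element, and only then drains the iterator.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class SnapshotIteratorDemo {

    // Adds 1, creates an iterator, adds 2, then returns what the iterator yields
    static List<Integer> iterateAfterAdd(List<Integer> list) {
        list.add(1);
        Iterator<Integer> it = list.iterator();
        list.add(2); // modification after the iterator was created
        List<Integer> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    public static void main(String[] args) {
        List<Integer> cow = new CopyOnWriteArrayList<>();
        // The iterator walks its own snapshot and never sees the 2
        System.out.println(iterateAfterAdd(cow)); // prints [1]
        // The list itself does contain both elements
        System.out.println(cow); // prints [1, 2]

        try {
            iterateAfterAdd(new ArrayList<>());
        } catch (ConcurrentModificationException e) {
            // The ArrayList iterator notices the modification and fails fast
            System.out.println("ArrayList iterator threw " + e.getClass().getSimpleName());
        }
    }
}
```

The old iterator keeps working on its snapshot, while the same sequence on a plain ArrayList ends in a ConcurrentModificationException.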

I should point out that Vector, which is thread safe, looks exactly like the ArrayList here.

For those searching for uniqueness guarantees there is CopyOnWriteArraySet, which uses CopyOnWriteArrayList underneath, so all of the same features hold.

Java/Maven, Programming

Testing threads with JUnit

Problem
Sometimes there is a need to test a simple piece of multi-threaded code. For example, how will your code behave when two threads read and then modify one of the variables? If you know your threads, you can devise and code the scenario so that thread access is exactly right, for example by using a mutex or semaphore. But how do you assert that your test succeeds? I mean besides sitting next to it with a debug session and checking things at every step.

JUnit is a very nice framework, but it does not handle assertions within other threads as well as one would hope it could.

Solution
I will use JUnit 4.0, but this applies to earlier versions as well.

In the sample below we have two test cases. testCaseNaive() tries to do the thing that should work out of the box: it fails an assertion inside a thread. However, as far as JUnit is concerned the test succeeds with flashing lights and shooting ribbons. The second test case, testCase2Threads(), uses a somewhat hackish trick, and this trick does involve creating the two runnables right there in the test.

The idea is quite simple. Assertions in JUnit propagate up the stack by throwing an exception, which is normally caught by the test runner. An exception thrown in another thread never reaches the runner, so each runnable records its completion as its last statement. If we get to execute that last statement as planned, all is well and no assertions were violated; if not, something has gone wrong. In the case of a failure we rely on the JVM’s default handling of uncaught exceptions to print the stack trace.

// TestMyClass.java

import java.util.ArrayList;
import java.util.Vector;
import junit.framework.Assert;
import org.junit.Test;

public class TestMyClass {

	@Test public void testCaseNaive() throws InterruptedException{
		Runnable runnable1 = new Runnable() {
			@Override public void run() {
				Assert.fail();
			}			
		};
		Thread t1 = new Thread( runnable1 );
		t1.start();
		t1.join();
	}
	
	@Test public void testCase2Threads() throws InterruptedException {
		final Vector< Integer > threadsCompleted = new Vector< Integer >(); // synchronized, since both threads add to it
		Runnable runnable1 = new Runnable() {
			@Override public void run() {
				Assert.fail();
				threadsCompleted.add(1);
			}
		};
		
		Runnable runnable2 = new Runnable() {
			@Override public void run() {
				Assert.assertTrue( true );
				threadsCompleted.add(2);
			}
		};
		
		Thread t1 = new Thread( runnable1 );
		Thread t2 = new Thread( runnable2 );
		
		t1.start();
		t2.start();
		t1.join();
		t2.join();
		
		System.out.println( "Threads completed: " + threadsCompleted );
		Assert.assertEquals(2, threadsCompleted.size());
	}
}
// testCaseNaive() passes, but the output still contains this
Exception in thread "Thread-0" junit.framework.AssertionFailedError: null
	at junit.framework.Assert.fail(Assert.java:47)
	at junit.framework.Assert.fail(Assert.java:53)
	at TestMyClass$1.run(TestMyClass.java:13)
	at java.lang.Thread.run(Unknown Source)
// testCase2Threads() fails and the output is:
Exception in thread "Thread-1" junit.framework.AssertionFailedError: null
	at junit.framework.Assert.fail(Assert.java:47)
	at junit.framework.Assert.fail(Assert.java:53)
	at TestMyClass$2.run(TestMyClass.java:25)
	at java.lang.Thread.run(Unknown Source)
Threads completed: [2]

Generally speaking, I have not found a nice way to assert fine detail inside a thread. The exceptions are cut off from the main thread and therefore never reach the runner. So the only way to look at these tests is to test the side effects after the test is complete or in the middle of the execution, if you can stop your threads reliably.
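
One way to make that side effect explicit is to record whatever each thread throws and inspect it after the threads have been joined. The following is only a sketch of that idea in plain Java; the class ThreadFailureCollector and its runAll method are hypothetical names, not part of JUnit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class ThreadFailureCollector {

    // Runs every task in its own thread and returns whatever the tasks threw
    static List<Throwable> runAll(Runnable... tasks) {
        List<Throwable> failures = new CopyOnWriteArrayList<>(); // written by several threads
        List<Thread> threads = new ArrayList<>();
        for (Runnable task : tasks) {
            Thread t = new Thread(() -> {
                try {
                    task.run();
                } catch (Throwable e) {
                    // Includes AssertionError, which is what a failed assertion throws
                    failures.add(e);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for threads", e);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        List<Throwable> failures = runAll(
                () -> { throw new AssertionError("failed inside a thread"); },
                () -> { /* this task passes */ });
        System.out.println("Failures: " + failures.size()); // prints Failures: 1
    }
}
```

Inside a test method, asserting in the main thread that the returned list is empty makes the runner see the failure, at the cost of wrapping every runnable.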

Applications

Touch Screen Tablet Browsing

Touch has become the leading new way to control our online experience. We have it in most new smartphones, tablets and netbooks.

Most of those have purpose built software. But what if you are running a conventional PC with a touchscreen?

At first it seems you are a bit stuck… with trying to catch that small scroll bar with your fat fingers. It’s obviously possible to have a good aim when concentrating… The question is, do you really want to concentrate on the scroll bar when reading a news article?

Although poorly advertised, the 2 browsers I use every day do support dragging the page: Opera and Firefox. As a small side-step ramble, Chrome is nice, but Google is dominating my web experience already, so that would be too much, and IE is out of the question because it has historically lagged on ALL of the sensible browser features.

On to the technical bit. How do you enable the most basic yet useful feature: dragging the page with the pointer, something PDF/PS viewers have been doing since their creation?

In Firefox you can use the Grab and Drag add-on.

In Opera you should drag a new button, Text Selection On, from deep in Appearance (Shift + F12) > Buttons > Browser View. I would never have found it if not for this post.

And there you go. Now you can stick your finger anywhere on the screen and drag it anywhere you like :)

Java/Maven

Accessing Java Private Attributes and Methods

Java reflection is a very powerful framework.

Sometimes its power seems outside the usual Java methodology.

For example, I needed to manually change a private variable in one of the classes I was testing.

Java reflection has an API that allows this. Of course, the API does not know if one is in a test or production source code. It works anywhere!

Below is an example of three classes.

  • IntWrapper is the class under test.
  • IntWrapperTest is the JUnit class.
  • RegularMainClass is a class with a main method, almost a copy of IntWrapperTest.

It is worth noting that the testers and the testee are in separate packages. So one can change the private variable or call a private method of any class in any package, unless a security manager or (since Java 9) module encapsulation forbids it!

// IntWrapper.java
package com.otherexample;

public class IntWrapper {

	private int i;
	
	public IntWrapper(int i) {
		this.i = i;
	}

	private void incrementI() {
		i++;
	}
	
	public int getI() {
		return i;
	}
	
}

// IntWrapperTest.java
package com.example;

import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

import org.junit.Test;

import com.otherexample.IntWrapper;

import static org.junit.Assert.*;

public class IntWrapperTest {

	@Test
	public void testPrivateMembers() throws Exception {
		// Reflection exceptions propagate, so a broken lookup fails the test
		// instead of being printed and silently ignored
		IntWrapper iw = new IntWrapper( 15 );
		assertEquals(15, iw.getI() );

		// Overwrite the private field
		Field f = iw.getClass().getDeclaredField("i");
		f.setAccessible(true);
		f.setInt(iw, 18);
		assertEquals(18, iw.getI() );

		// Call the private method
		Method m = iw.getClass().getDeclaredMethod("incrementI");
		m.setAccessible(true);
		m.invoke(iw);
		assertEquals(19, iw.getI() );
	}
	
}

// RegularMainClass.java
package com.example;

import java.lang.reflect.Field;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

import com.otherexample.IntWrapper;


public class RegularMainClass {

	public static void main(String[] args) {
		try {
			IntWrapper iw = new IntWrapper( 15 );
			if( 15 != iw.getI() ) System.err.println("Assertion failed");
			else System.out.println("all is well 1");
			Field f = iw.getClass().getDeclaredField("i");
		
			f.setAccessible(true);
			f.setInt(iw, 18);
			if( 18 != iw.getI() ) System.err.println("Assertion failed");
			else System.out.println("all is well 2");
			
			Method m = iw.getClass().getDeclaredMethod("incrementI");
			m.setAccessible(true);
			m.invoke(iw);
			if( 19 != iw.getI() ) System.err.println("Assertion failed");
			else System.out.println("all is well 3");

		} catch (SecurityException e) {
			e.printStackTrace();
		} catch (NoSuchFieldException e) {
			e.printStackTrace();
		} catch (IllegalArgumentException e) {
			e.printStackTrace();
		} catch (IllegalAccessException e) {
			e.printStackTrace();
		} catch (NoSuchMethodException e) {
			e.printStackTrace();
		} catch (InvocationTargetException e) {
			e.printStackTrace();
		}
	}

}
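
The setAccessible(true) calls above are what makes this work. As a self-contained sketch of the difference, the following single-file variant uses a hypothetical Counter class standing in for IntWrapper (the class and method names are made up for illustration) and assumes no security manager or module restrictions are in play.

```java
import java.lang.reflect.Field;

class PrivateAccessDemo {

    // Returns true if reading the private field without setAccessible(true) is rejected
    static boolean readWithoutSetAccessibleFails() {
        try {
            Field f = Counter.class.getDeclaredField("value");
            f.getInt(new Counter()); // no setAccessible call on purpose
            return false;
        } catch (IllegalAccessException e) {
            return true; // the normal access check still applies
        } catch (NoSuchFieldException e) {
            throw new IllegalStateException(e);
        }
    }

    // Overwrites the private field after switching the access check off
    static int writeWithSetAccessible(int newValue) {
        try {
            Counter c = new Counter();
            Field f = Counter.class.getDeclaredField("value");
            f.setAccessible(true);
            f.setInt(c, newValue);
            return c.getValue();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithoutSetAccessibleFails()); // prints true
        System.out.println(writeWithSetAccessible(18));      // prints 18
    }
}

// Hypothetical class with a private field, playing the role of IntWrapper
class Counter {
    private int value = 15;

    int getValue() {
        return value;
    }
}
```

Counter is a separate top-level class on purpose: a nested class would share private access with its outer class on recent Java versions, and the first check would not fail.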