Unit Testing

Injecting data into Obj-C readonly properties

<tl;dr>

If you want to inject data into an object that only has read-only properties, swizzle the synthesised getter function so you can inject the data you need at the point it’s accessed.

For example

// Interface we want to override
@interface SKPaymentTransaction : NSObject
     @property(nonatomic, readonly) NSData *transactionReceipt;
@end

//
// Returns an invalid receipt
//
NSData* swizzled_transactionReceipt(id self, SEL _cmd)
{
     return [@"my receipt" dataUsingEncoding:NSUTF8StringEncoding];
}

//
// Test our receipt
//
- (void)test_InvalidTransactionSentViaAPI_VerificationFails
{
     // Create an invalid transaction
     SKPaymentTransaction* invalidTransaction = [[SKPaymentTransaction alloc] init];

     // Replace transactionReceipt with our own
     Method originalMethod = class_getInstanceMethod([invalidTransaction class], @selector(transactionReceipt));
     method_setImplementation(originalMethod, (IMP)swizzled_transactionReceipt);
}

</tl;dr>

 

I recently needed to set up some client-side unit tests for our iOS receipt verification server. This server takes a payment object and verifies it with Apple to check that it’s actually a legal receipt; if it is, the content is awarded to the player. Server side, this is pretty simple, but it’s an important step and should anything go wrong it’s possible for people to be locked out of their content.

So it’s important we have tests that send legal, invalid and corrupt data to the server, and we need to test through the client API, otherwise it’s not a test we can rely on 100%.

Our verify API looks something like the following

+(BOOL) queueVerificationRequest:(NSString*)productId withTransaction:(SKPaymentTransaction*) transaction;

It takes an SKPaymentTransaction because that’s the object the client deals with. Internally we’re going to use data such as NSStrings or NSData, but it shouldn’t be the client’s responsibility to query the SKPaymentTransaction to get the relevant information. Should the structure change, our API would also break, and that’s not acceptable.

So we’re stuck with the SKPaymentTransaction and its API looks something like this

@interface SKPaymentTransaction : NSObject
     @property(nonatomic, readonly) NSError *error;
     @property(nonatomic, readonly) SKPaymentTransaction *originalTransaction;
     @property(nonatomic, readonly) SKPayment *payment;
     @property(nonatomic, readonly) NSArray *downloads;
     @property(nonatomic, readonly) NSDate *transactionDate;
     @property(nonatomic, readonly) NSString *transactionIdentifier;
     @property(nonatomic, readonly) NSData *transactionReceipt;
     @property(nonatomic, readonly) SKPaymentTransactionState transactionState;
@end

Now in our case, we’re interested in using [SKPaymentTransaction transactionIdentifier] and [SKPaymentTransaction transactionReceipt], both of which need to be managed inside our verification call, so we’ll need SKPaymentTransaction objects that return different values depending on what we’re testing.

And since they’re readonly, we can’t just set them manually.

Initially, I tried to use class extensions, just to add the behaviour I was looking for, as you’ll often add internal extensions to present readonly properties externally but support readwrite properties internally.

@interface SKPaymentTransaction()
     @property(nonatomic, readwrite) NSString *transactionIdentifier;
     @property(nonatomic, readwrite) NSData *transactionReceipt;
@end

It compiles fine, but at runtime it generates an unrecognised selector error. This is because the read-only property has already been synthesised, and while the compiler now thinks it can see a setter, at runtime it isn’t present.

So, my second attempt was to derive from SKPaymentTransaction and add the functionality there, passing through the base SKPaymentTransaction but using the derived type to set the value.

@interface SKPaymentTransactionDerived : SKPaymentTransaction
     @property(nonatomic, readwrite) NSString *transactionIdentifier;
     @property(nonatomic, readwrite) NSData *transactionReceipt;
@end

Fortunately, the compiler’s a bit smarter this time and warns me before I even start

error: auto property synthesis will not synthesize property 'transactionIdentifier' because it is 'readwrite' but it will be synthesized 'readonly' via another property [-Werror,-Wobjc-property-synthesis]

At this point I was a bit stuck, until @pmjordan suggested I swizzle the transaction identifier and receipt getters to return the values I’m interested in testing, rather than trying to set the values directly.

But what needs to be swizzled when we’re attempting to override an Obj-C property defined as follows?

@interface SKPaymentTransaction : NSObject
     @property(nonatomic, readonly) NSData *transactionReceipt;
@end

Properties are automatically synthesised (unless you implement the accessors explicitly), so we’re actually looking at the following selectors

@interface SKPaymentTransaction : NSObject
     // Selector used to return the data
     -(NSData*)  transactionReceipt;

     // This would be defined if we’d specified the property as readwrite
     // -(void)     setTransactionReceipt:(NSData*)receipt;
@end

So at this point, our tests look like the following

// The runtime functions used below live in <objc/runtime.h>
#import <objc/runtime.h>

//
// Returns a unique invalid receipt
//
NSData* replaced_getTransactionReceipt_Invalid(id self, SEL _cmd)
{
     static int runningId = 0;
     ++runningId;

     NSString* receiptString = [NSString stringWithFormat:@"replaced_getTransactionReceipt_Invalid %d", runningId];
     return [receiptString dataUsingEncoding:NSUTF8StringEncoding];
}

// Checks we fail through the API
- (void)test_InvalidTransactionSentViaAPI_VerificationFails
{
     //
     // Set up ...
     // 

     // Create an invalid transaction
     SKPaymentTransaction* invalidTransaction = [[SKPaymentTransaction alloc] init];
     
     // Inject our invalid receipt method instead of using the default [SKPaymentTransaction transactionReceipt]
     Method originalMethod = class_getInstanceMethod([invalidTransaction class], @selector(transactionReceipt));
     method_setImplementation(originalMethod, (IMP)replaced_getTransactionReceipt_Invalid);

     // Test our verification
     [HLSReceipts queueVerificationRequest:@"test_InvalidTransactionSentViaAPI_VerificationFails" withTransaction:invalidTransaction];

     //
     // Test validation ...
     //
}

As a result of this call, when the library calls [SKPaymentTransaction transactionReceipt] it will instead call replaced_getTransactionReceipt_Invalid, which gives us the ability to pass through any receipt we want, including real, invalid and corrupt ones, as our tests dictate.

It’s worth noting here that I’m using method_setImplementation rather than the usual method_exchangeImplementations, which I’ll explain in a following post.

ACCU 2014 Conference Notes

I had the chance to go to ACCU 2014 the other week (the full conference schedule is here) and I have to say it was one of the best conferences I’ve had the pleasure to attend. And while it did confirm my feeling that C++ is getting to the point of saturation and ridiculous excess (C++11 was needed and, as a result, so was C++14, but C++17… just use a more suitable language if these are the features you need), the talks I went to were, on the whole, great.

So I thought I may as well post up the notes I made from each talk – and while they might be a bit of a brain dump of the time, if there’s something that sparks your interest, I’m sure the full presentations will be posted up at some point soon.

 

Git Archaeology
Charles Bailey, Bloomberg LP

Interesting talk looking at some of the more esoteric ways of presenting and searching for information within an existing Git repository. Unfortunately, time was short so the speaker had to rush through the last part, which was, for me, the most relevant and interesting part.

Talk notes

 

Designing C++ Headers Best Practices
Alan Griffiths

For most experienced C++ developers the content of this talk is probably quite basic, as you’d have a large portion of this covered already through sheer trial and error over the years. But clarifying some good practices and interesting side effects was enjoyable.

Talk notes

 

Version Control – Patterns and Practices
Chris Oldwood, chrisoldwood.blogspot.com

High-level overview of the various patterns we use when working with version control (focused mostly on DVCS), some useful examples and some interesting discussion about trust issues…

Talk notes

 

Performance Choices
Dietmar Kuhl, Bloomberg LP

Proof that if you want to know about optimisation and performance, stick to asking people in the games industry.

I’m not posting these notes.

 

Crafting More Effective Technical Presentations
Dirk Haun, http://www.themobilepresenter.com

Really good presentation on how to craft good presentations – some interesting discussion of the make-up of the human brain, why certain techniques work and why the vast majority of technical talks (or just talks in general, to be honest) do what they do.

Talk notes

 

The Evolution of Good Code
Arjan van Leeuwen, Opera Software

Great talk, not telling us what good code is, but examining a few in-vogue books from the last decade to see where they sit on various contentious topics. Note that when the notes say “no-one argued for/against” it’s just referencing the books being discussed!

Talk notes

 

Software Quality Dashboard for Agile Teams
Alexander Bogush

Brilliant talk about the various metrics Alexander uses to measure the quality of their code base. If you’re sick of agile, don’t let the title put you off, this is valuable reading regardless of your development methodology.

Talk notes

 

Automated Test Hell (or There and Back Again)
Wojciech Seiga, JIRA product manager

Another great talk, this time discussing how the JIRA team took a (very) legacy project with a (very) large and fragile test structure into something much more suitable to quick iteration and testing. When someone says some of their tests took 8 days to complete, you have to wonder how they didn’t just throw it all away!

Talk notes

 

Why Agile Doesn’t Scale – and what you can do about it
Dan North, http://dannorth.net

Interesting talk, arguing that Agile is simply not designed to, nor was it ever imagined to, be a scalable development methodology (where scale is defined as bigger teams and wider problems). It excellently covered why and how agile adoption fails and how this can be avoided, to the point where agile principles can be used on much larger and less flexible teams.

Talk notes

 

Biggest Mistakes in C++11
Nicolai M Josuttis, IT-communications.com

Entertaining talk in which Nicolai, a member of the C++ Standards Committee library working group, covers various features of C++11 that simply didn’t work or shouldn’t have been included in the standard. When it gets to the point that Standards Committee members in the audience are arguing about how something should or does work, you know they’ve taken it too far.

Talk notes

 

Everything You Ever Wanted to Know About Move Semantics (and then some…)
Howard Hinnant, Ripple Labs

Detailed talk on move semantics, which are never as complicated as they’re made out to be (even by the speaker). There are some dumb issues, which seem to be increasing as the C++ standards committee doesn’t seem to grasp the scale of the changes being made, but nevertheless it was a good overview of what is a key improvement to C++.

Talk notes

‘Unit Testing’ Complex Systems

I was e-mailed by a co-worker the other week asking for advice on how to test a new feature that was implemented solely using private member functions and inaccessible objects within an existing system. It’s an interesting question, and one people often struggle with when trying to ‘unit test’ complex or high-level systems after being used to testing low-level systems or classes where everything is accessible.

I use quotes around ‘unit testing’ because I think at times using the phrase unit testing becomes unhelpful when talking about system testing. I don’t know if it’s just the books I’ve read or the outlook of people I’ve discussed this with, but on the whole when someone says “I need to write a unit test to…” they almost always mean “I’ve just written something and now I must test the individual functions to make sure they all work”.

When testing a high level system, especially one that is being tested after the implementation is finished, people can struggle to take the processes they used to test lower level systems and apply them to more complex, interdependent objects.

Take the following unit test which checks that objects are being correctly destructed when a vector is cleared.

TEST(DestructorCalledOnClear)
{
  ftl::vector<TestClass> theVector(10);  // Reserve more than we’ll push
  TestClass tempObject;

  TestClass::m_destructorCall = 0;
  
  theVector.push_back(tempObject);
  theVector.push_back(tempObject);
  theVector.push_back(tempObject);

  theVector.clear();

  CHECK_EQUAL(3, TestClass::m_destructorCall);
}

This is a pretty standard looking unit test. We have some input and we’re testing the behaviour of the function we’re calling and its effect on the data passed to it. At this point we’re working at a pretty low level, so we have the opportunity to test components almost independently. Sort of.

These functions might be low level, but they are still built from individual (and often inaccessible) components. All these functions call other functions we don’t have access to, and we’re obviously not testing the individual lines of code, which will each generate multiple lines of assembly. We can accept this because we’re both used to it and we simply have to. We don’t try to test the internal implementation because we know it works based on the input and generated output of the tested functions.

We have other tests which check the individual elements of this test (push_back() adding elements and clear() reducing its size to 0), and basing our more complex test on these allows us to concentrate on the more complex behaviour (the destruction of objects inside the vector). It’s this mindset of testing independent elements, and basing more complex tests on simpler ones, that needs to be expanded as systems become more complex and more inaccessible.
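
For reference, those simpler building-block tests might look something like the following. This is only a sketch, reusing the ftl::vector and TestClass names from the example above; the checks in the real suite will differ.

// push_back() adds elements...
TEST(PushBackIncreasesSize)
{
  ftl::vector<TestClass> theVector(10);
  TestClass tempObject;

  theVector.push_back(tempObject);

  CHECK_EQUAL(1, theVector.size());
}

// ... and clear() reduces the size to 0. The more complex
// DestructorCalledOnClear test builds on both of these.
TEST(ClearReducesSizeToZero)
{
  ftl::vector<TestClass> theVector(10);
  TestClass tempObject;

  theVector.push_back(tempObject);
  theVector.push_back(tempObject);

  theVector.clear();

  CHECK_EQUAL(0, theVector.size());
}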

Take the following use case as an example.

A title can only communicate with the game save system using messages. It has no access to the components of the game save system at all, and can only request processes (like asking for data to be saved) and wait for a result message (like the result of a save operation). The system has the following behaviour, which we want to test: when a user sends some data to be saved, if the data is the same as what was previously saved, the save doesn’t take place at all.

Implementation wise, this is done entirely through private functions, so nothing other than the object itself has access to the data and how it behaves. So what’s the approach here? We certainly don’t have the opportunity to run what people would call ‘standard unit tests’ on this system as it stands.

The first question to ask is whether the system itself is structured in the best possible way. For example, we have an object which is both carrying out the logic of saving and managing the data itself. As a result, the data management aspect is hidden deep inside the system, making it both impossible to access and impossible to test, yet knowing the data is managed correctly is crucial to testing the whole system.

So the first approach would be to extract the data management element into its own structure. Having an independent object responsible for caching the data and performing any operations on it (in this case checking whether the new data is the same as the old) not only improves maintainability (under the single responsibility principle), it also allows us to test the specific data management step of any save process independently before plugging it into the system as a whole.
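
As a rough sketch of what that extraction could look like (SaveDataCache and its methods are illustrative names, not the real system, and the test uses the same UnitTest++ macros as the rest of these posts), the same-data check becomes something we can unit test directly:

#include <vector>

// Hypothetical extraction of the data management step: the cache holds
// the last saved payload and can tell us whether a new payload differs.
class SaveDataCache
{
public:
  // Returns true if the new data matches what was last saved,
  // meaning the save can be skipped entirely.
  bool IsSameAsLastSave(const std::vector<unsigned char>& newData) const
  {
    return newData == m_lastSavedData;
  }

  // Called once a save has actually completed successfully.
  void StoreLastSave(const std::vector<unsigned char>& newData)
  {
    m_lastSavedData = newData;
  }

private:
  std::vector<unsigned char> m_lastSavedData;
};

// The cache can now be tested in isolation, before being plugged
// back into the save system.
TEST(SameDataIsDetectedSoTheSaveCanBeSkipped)
{
  SaveDataCache cache;
  std::vector<unsigned char> saveData(16, 0xAB);

  cache.StoreLastSave(saveData);

  CHECK(cache.IsSameAsLastSave(saveData));
}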

Now we’ve extracted what should be an independent element of the system (and directly tested the same-data check), we can look at the rest of the system. We already have tests for the low-level save API, so all we need to test now is the user-side behaviour.

We know the following behaviour should exist for the user
* When a user requests a save, the data, if different, is saved
* If the data is the same the system reports ‘success’ at the point the save was requested
* If the data is saved the system reports success after the save has completed

So based on this we can now test the system as a whole, knowing that we’ve extracted the independent components into their own lower-level unit tests, with these tests concentrating on the high-level behaviour of the save process as a whole.
* Requesting a save doesn’t return a success until the data has been saved. We can tell this via the content of the result message and the time it takes to arrive
* When requesting another save with the same data, the result should be instantaneous – also indicating success – which will only happen if saving doesn’t occur
* When following that with changed data, the save completes successfully and the result doesn’t return instantaneously

Because we’ve extracted the data management structure into its own test, we can safely assume that element of the system works, and it’s the high-level, end-user experience that we’re testing, in exactly the same way a user would use it.

#define protected public

Opening up the system is often suggested as a way of testing the ‘internals’ of an object, but the more testing deviates from how people use a system, the less the tests can be relied on in a real-life situation. It’s all well and good for a test to pass independently of everything else, but if the system falls over when used by a client then testing has failed.

Changing protected or private to public (with a method of your choosing) only increases the difference between test usage and real-world usage, and quite often internal structures or algorithms may well not work independently, leaving you to replicate what the object would be doing anyway, which can easily lead to over-mocking your tests.

So it opens up the question of how much your system is doing and how much of that work could be extracted into more independent objects that are both easier to test and (more importantly) easier to maintain. In these cases making the system more modular or changing how the data is handled will improve both testing and code use without compromising encapsulation and use case testing.

Testing Objects with the Same Behaviour

One of the most difficult things when it comes to unit testing is cutting down on the number of duplicated tests. In the past I’ve often developed objects which have very different implementations but externally produce very similar results (for example when developing a vector compared to a fixed vector, or implementing different types of iterators), and one thing I don’t want to do is have a set of tests which are almost identical to another set of tests I wrote last week.

One method I’ve used is to write a common set of tests independent of a specific type, with the type defined in another file along with other flags which indicate which tests should or shouldn’t be run.

So for example, when developing a ring buffer style iterator, it was useful to know that the const and non-const versions produced the same results in a lot of different situations without having to have comparison tests or duplicating a large amount of test code.

The following examples are written using UnitTest++ but will work with a number of different unit testing frameworks.

// In the file RingBufferIterator_Tests.inl

// Tests are defined in a separate file assuming certain
// settings have been defined
TEST(Blah)
{
  RINGBUFFER_ITERATOR_TYPE myItor;

  // ... Doing tests with this type
}

Now we simply need to define the type of iterator and indicate if some tests should or shouldn’t be run

// In the file RingBufferIterator_TestTypes.cpp

SUITE(RingBufferItor)
{
// Define the type of iterator we want to test
#define RINGBUFFER_ITERATOR_TYPE ftl::itor::ringbuffer

// Define a couple of settings which the tests file uses to make
// sure some type specific tests are run or excluded

// This one simply indicates that the tests checking the ability
// to alter the content of the iterator will be run
#define RINGBUFFER_ITERATOR_NONCONST_CONTENT

// Now we simply need to include the file used to
// implement all the unit tests for this kind of iterator
#include "RingBufferIterator_Tests.inl"
}

We do the same for the const version of the iterator we are testing

// In the file RingBufferConstIterator_TestTypes.cpp

SUITE(RingBufferConstItor)
{
// Define the type of iterator we want to test
#define RINGBUFFER_ITERATOR_TYPE ftl::itor::ringbuffer_const

// In this case we don't have any additional settings so we
// simply include the test files and let the tests run
#include "RingBufferIterator_Tests.inl"
}

You can take this as far as you want, but there is obviously going to be a point where the number of flags you’re defining starts to make the tests hard to read or unmanageable. Limiting it to about two, where you can easily group the tests within those defines, generally works quite well.

You don’t need to limit it to a single file. For example, some types might be similar enough to share 50% of the tests but not the rest. Splitting up the common tests is easy enough.

// In the file VectorContainer_TestTypes.cpp

SUITE(Vector)
{
// Define our type
#define VECTOR_TYPE ftl::vector

// Include our common tests
#include "Vector_CommonTests.inl"

// Include only the tests for this type
#include "Vector_DynamicTests.inl"
}

When tests are not written like this there is a big gap in our ability to check that comparable types behave the same (fixed and non-fixed containers, for example), especially when simple things like human error get in the way of creating duplicate behaviour tests. By changing over to this style of testing we can automatically test the behaviour of similar types without creating additional work.

It does have its drawbacks, the main one being that you can get multiple test failures (if a test is used by more than one type and they both fail at the same time) or, worse, a single failure where you’re not sure which type caused the problem. An easy solution would be to allow the failure message to have an optional component, letting you identify the type in the message, but this isn’t supported by any test framework that I know of.

Occasionally my compiler’s dependency checker doesn’t cope very well, compiling only one of the type files rather than all of them when a test changes, but this has been remarkably rare, and these drawbacks don’t overshadow the ability to confirm that your types are functionally the same even if their implementations are vastly different.

Unit Testing an Object with Dependent Behaviour

It would be a good idea to start with a bit of history. In previous posts I’ve talked about our custom STL implementation, which contains a basic vector but also a fixed::vector, which takes a fixed size, allocates all its memory within the container (usually on the stack) and cannot increase or decrease in size. This comes with a lot of benefits and provides game teams with a more consistent memory model when using containers.

But it comes with a lot of duplication. Those two containers are classes in their own right, each with over 100 tests (written with UnitTest++), which is a lot to maintain and (more importantly) keep consistent. A change to one usually means a change to the other, and additional tests and checks need to be rolled out. I wasn’t happy about this, so I moved all the vector code into a base class; the fixed and normal vectors are now types based on that, with a different allocator being the only difference (I didn’t want to go for inheritance when a templated structure could do all the work).
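
As a sketch of the shape this took (the names here are illustrative, not the actual ftl types), the shared implementation is parameterised on an allocator policy and the two public containers are simply different instantiations of it:

#include <cstddef>

// Shared implementation: all memory requests go through the Allocator
// policy; everything else (push_back, clear, iteration, ...) is common.
template <typename T, typename Allocator>
class vector_base
{
  // ... common vector implementation ...
};

// Grows by allocating from the heap.
template <typename T>
struct heap_allocator { /* ... */ };

// Owns a fixed block (typically on the stack) and never grows.
template <typename T, std::size_t Capacity>
struct fixed_allocator { /* ... */ };

// The two public containers are then the same implementation with a
// different allocator policy; no inheritance between them is needed.
typedef vector_base<int, heap_allocator<int> >      IntVector;
typedef vector_base<int, fixed_allocator<int, 32> > FixedIntVector;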

That was the easy bit. But now, how do you unit test an object (in this case the vector) whose behaviour greatly depends on an internal object (in this case the different allocator models)? The allocators have been unit tested, so we know their behaviour is correct, but the behaviour of the vector will depend on the allocator it is using (in some cases it will allocate more memory, so its capacity varies; in others it won’t allocate memory at all, so its ability to grow is different). And this behaviour may be obvious or it may be much more subtle.

And it’s this which causes the problem. The behaviour should be more or less the same, but in some cases it will be different: push_back/insert may fail, reserve may reserve more memory than was requested, and construction behaviour will greatly depend on the allocator being used.

So to start with the simple and easy option: create all the unit tests for one type and then duplicate them for each different allocator we come up against. But we went down this road to reduce the amount of duplication, so there would still be a large number of tests that are the same and need to be maintained. In fact you could say it would be harder to maintain, because the behaviour in some tests would be subtly different rather than clearly the same.

So I started to look for a more intelligent approach to this.

The first thing was to go back to basics: what was unit testing for? Such basics are often lost in the grand scheme of things, but in this case the point of the tests was to make sure the behaviour was correct or, more importantly, what was expected. There are obviously other aspects to unit tests (which are outside the scope of this post) but this is my primary concern.

When thought about from this level, the conflicting behaviour becomes quite clear – or at least it should. We can split the various methods into fixed and dependent calls: ones which will be the same regardless of the allocator and ones which depend on what is being used. If we cannot split the behaviour then we have a problem with either our interface or our implementation. Luckily the allocation code is quite contained, but there were little tendrils of code that assumed things about the allocation of memory that they shouldn’t. Not a problem, as the tests will easily pick these out – I hope.

So I may as well start with the easy stuff to test, the stuff that I now know is common across a vector regardless of the allocator: iteration, removal (since the vector doesn’t de-allocate on removal), access, equality checks (vectors are equal based on their contents) – in effect anything that reads, accesses or removes content, and nothing else. And since I have tests for these already, this is quickly done.

This is testing against the vector’s behaviour, as I know this behaviour is constant and will never change.

The hard part is now a little bit easier. Looking at what I have left, I can easily see what I need to test. Since the changes in behaviour are based on nothing other than the allocator, we can easily extract that and create a couple of mocks specifying particular allocator behaviour: one which allocates the memory as you request it, one which allocates a different amount of memory and one which returns no memory at all.
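
A sketch of those mocks might look like the following. The allocator interface here is deliberately simplified and purely illustrative; the real ftl allocator concept will look different.

#include <cstddef>
#include <new>

// Allocates exactly what was asked for.
struct exact_mock_allocator
{
  void* allocate(std::size_t bytes, std::size_t& allocatedBytes)
  {
    allocatedBytes = bytes;
    return ::operator new(bytes);
  }
  void deallocate(void* ptr) { ::operator delete(ptr); }
};

// Allocates more than was asked for, so the vector's capacity ends up
// larger than the requested reserve.
struct generous_mock_allocator
{
  void* allocate(std::size_t bytes, std::size_t& allocatedBytes)
  {
    allocatedBytes = bytes * 2;
    return ::operator new(allocatedBytes);
  }
  void deallocate(void* ptr) { ::operator delete(ptr); }
};

// Refuses to allocate anything, so any attempt to grow
// (push_back, insert, reserve) should fail gracefully.
struct null_mock_allocator
{
  void* allocate(std::size_t /*bytes*/, std::size_t& allocatedBytes)
  {
    allocatedBytes = 0;
    return 0;
  }
  void deallocate(void* /*ptr*/) {}
};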

But now I’m not testing the behaviour of the vector, I’m testing the implementation of the vector – knowing how it works internally and how that affects its external state.

At first I tried to be a bit clever with this. Each test was extracted into a small templated helper function, with the checks being done in those helpers. That way I only had one implementation of a test and simply passed in the vector, the source data and the expected values. I had to make a few tweaks to the UT++ source to enable checks to be done outside the TEST(…) call, but this was pretty simple.
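
The helpers looked roughly like this. It’s only a sketch: the names are made up, ftl::fixed::vector stands in for whatever the real fixed container is called, and rather than reproducing the UT++ tweaks mentioned above the helper simply returns a bool so the actual CHECK stays inside the TEST body.

#include <cstddef>
#include <vector>

// Shared, type-independent check: push the source data into whichever
// vector type we're testing and compare size/capacity against the
// values expected for that vector/allocator combination.
template <typename VectorType, typename SourceType>
bool PushBackBehavesAsExpected(VectorType& vec,
                               const SourceType& sourceData,
                               std::size_t expectedSize,
                               std::size_t expectedCapacity)
{
  for (typename SourceType::const_iterator it = sourceData.begin();
       it != sourceData.end();
       ++it)
  {
    vec.push_back(*it);
  }

  return vec.size() == expectedSize
      && vec.capacity() == expectedCapacity;
}

// Each TEST then only sets up the vector for its allocator and states
// the expected outcome for that combination.
TEST(PushBackWithinFixedCapacity)
{
  // Hypothetical fixed vector with room for exactly four elements.
  ftl::fixed::vector<int, 4> fixedVector;
  std::vector<int> source(4, 7);

  CHECK(PushBackBehavesAsExpected(fixedVector, source, 4, 4));
}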

At first this worked well. As long as the set of tests was the same across all objects using our various mocks it was fine: the same number of expected results, the same or similar input values and the same general behaviour. But it started to get tricky when the tests needed to deviate. For example, when testing a vector that uses a fixed-size allocator, I need to test that the vector works fine on the boundaries, boundaries which the other vectors I’m testing don’t have. So we either end up with a large number of superficial tests or, much worse, I forget to extract the specific test cases for the specific method of implementation.

But since there is nothing wrong with a bit of duplication, especially when each ‘duplicated’ test is testing a specific thing, I don’t need to be so clever. Having very discrete tests based on the implementation of the vector means I am free to deviate the test suite as I see fit when I know that the behaviour of the vector will be different based on the type of allocator it is using. It would be beautiful to be able to use the same tests for each type, but frankly that’s just not possible when you have to understand the internal behaviour of the object to test it correctly.

So the final solution: duplicate some tests, but only those that you know have different behaviour based on an implementation or an expected outcome. There might be a bit more code, but it produces the best results, the best output (it’s much easier to see what has failed and why) and the easiest tests to understand.

If the object depended on multiple internal objects (a policy-based smart pointer is an excellent example) then I don’t believe this solution would work. The more variations you have, the blurrier the lines between fixed and dependent behaviour become. In those cases I would probably expect to write many more tests based on the final object, simply for peace of mind.

You can only go so far with this, and unit tests are designed to aid you and give you a security blanket. In an ideal world (one where TDD rules all) you don’t need to know what the internal implementation is doing, only the final outcome. But if you are spending more time figuring out your tests (let alone writing them) than the code they’re testing, you need to take a step back and reassess what you are doing.

Title image by fisserman.