Picking the right abstraction

Recently I had to adapt some older Java code to support a new requirement: an existing CSV report needed to include a user’s email address, translated from the user id. Pretty simple, but how does the CSV report generator translate the user id to an email address?

The obvious implementation is to simply pass the UserDao directly to the report class¹:

package com.example.fizzbuzz.reports;

import com.example.fizzbuzz.dao.UserDao;
import com.example.fizzbuzz.domain.User;

public class FizzBuzzCsvReport extends CsvReport {

    private final UserDao userDao;

    public FizzBuzzCsvReport(UserDao userDao) {
        this.userDao = userDao;
    }

    public String[] headers() {
        return new String[] { "Fizzy", "Buzzy", "Email" };
    }

    public String[] row(FizzBuzz data) {
        User user = userDao.findById(data.getUserId());
        return new String[] {
            data.getFizz(),
            data.getBuzz(),
            user.getEmail()
        };
    }
}

The UserDao is injected into the constructor of the FizzBuzzCsvReport and used in the row method to translate the user id into an email address. Simple and probably what most Java code would look like. Unfortunately, this is not the right solution². Let’s write a unit test:

package com.example.fizzbuzz.reports;

import static org.junit.Assert.*;
import static org.mockito.Mockito.*;
import org.junit.Test;

import com.example.fizzbuzz.dao.UserDao;
import com.example.fizzbuzz.domain.Role;
import com.example.fizzbuzz.domain.User;

public class FizzBuzzCsvReportTest {
    @Test
    public void test_row_generation() {
        UserDao userDao = mock(UserDao.class);
        when(userDao.findById("userid")).thenReturn(
            new User("userid", "user name", "user@example.com", new Role[] { Role.ADMINISTRATOR }));
        FizzBuzzCsvReport subject = new FizzBuzzCsvReport(userDao);

        String[] result = subject.row(new FizzBuzz("fizz", "buzz", "userid"));

        assertArrayEquals(
            new String[] { "fizz", "buzz", "user@example.com" },
            result);
    }
}

That’s quite a bit of overhead just to perform a simple check! Why is it so hard to write the test? We’re even programming to an interface, not an implementation!

The main problem is not that UserDao fails to be an abstraction (it is one), but that it is the wrong abstraction for this usage. UserDao abstracts over how users are stored³, and by passing the UserDao to the report we unnecessarily couple the report to the details of how users are represented and managed within the rest of our system. Note that using a dynamic language doesn’t really help either. The coupling would be looser (no need to agree on the exact type), but the report would still require an object that responds to the findById message with an object that responds to the getEmail message.

So what would be the right abstraction? Let’s go back to the new requirement: “given the user id, add the user’s email address”. What’s the simplest abstraction that could work here? Let’s just use a function:

package com.example.fizzbuzz.reports;

import com.google.common.base.Function;

public class FizzBuzzCsvReport extends CsvReport {

    private final Function<String, String> userIdToEmail;

    public FizzBuzzCsvReport(Function<String, String> userIdToEmail) {
        this.userIdToEmail = userIdToEmail;
    }

    public String[] headers() {
        return new String[] { "Fizzy", "Buzzy", "Email" };
    }

    public String[] row(FizzBuzz data) {
        return new String[] {
            data.getFizz(),
            data.getBuzz(),
            userIdToEmail.apply(data.getUserId())
        };
    }
}

Now the report is fully decoupled from our application’s user infrastructure and is much easier to test:

package com.example.fizzbuzz.reports;

import static org.junit.Assert.*;
import java.util.Collections;
import org.junit.Test;
import com.google.common.base.Functions;

public class FizzBuzzCsvReportTest {
    @Test
    public void test_row_generation() {
        FizzBuzzCsvReport subject = new FizzBuzzCsvReport(Functions.forMap(
            Collections.singletonMap("userid", "user@example.com")));

        String[] result = subject.row(new FizzBuzz("fizz", "buzz", "userid"));

        assertArrayEquals(
            new String[] { "fizz", "buzz", "user@example.com" },
            result);
    }
}

We use the handy Functions.forMap method to create a function from a map⁴ and use this in our test. Compared to the previous version, setup boilerplate has been reduced by 57.1%⁵.

Obviously, the actual production code (for example, the controller that lets the user download the CSV) will still have to adapt the UserDao to the Function interface to make use of the FizzBuzzCsvReport. This is straightforward and only needs to be defined once. With Java 8’s upcoming lambda support this will be even easier. To summarize:

Prefer simple, well-understood abstractions over home-grown variants (in this case a Function<String, String> versus a UserDao).
By defining the FizzBuzzCsvReport in terms of a Function<String, String> userIdToEmail we make clear what the report needs and also limit what it can do (principle of least privilege). With the UserDao approach we wouldn’t know what exactly the report is using that DAO for; it could even be deleting users!
Using an abstraction like Function gives you a huge library of pre-defined tools: adapting maps as functions, using memoization or caching, function composition, etc. Compare this to having to write your own caching adapter for a UserDao!
Function is just the start. There are many others that are simple and widely applicable.
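
To make the production-side adapter and the caching point concrete, here is a minimal sketch. It is not the code from the real application: User and UserDao below are hypothetical stand-ins reduced to what the example needs, UserFunctions and memoize are names made up for this illustration, and it uses java.util.function.Function from Java 8 rather than Guava's Function so that it compiles without any dependencies. It also assumes findById always finds a user.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical stand-in for the domain class, reduced to two fields.
final class User {
    private final String id;
    private final String email;

    User(String id, String email) {
        this.id = id;
        this.email = email;
    }

    String getId() { return id; }
    String getEmail() { return email; }
}

// Hypothetical stand-in for the DAO; only the lookup used by the report.
interface UserDao {
    User findById(String id);
}

// Illustrative helper class; the name is an assumption, not part of the article's code base.
final class UserFunctions {
    private UserFunctions() {}

    // Adapts the DAO to a plain function, defined once.
    // Assumes the user exists; a null result from findById would throw here.
    static Function<String, String> userIdToEmail(UserDao userDao) {
        return userId -> userDao.findById(userId).getEmail();
    }

    // Generic memoization: works for any function, not just user lookups.
    // ConcurrentHashMap rejects null keys and does not cache null results.
    static <K, V> Function<K, V> memoize(Function<K, V> delegate) {
        Map<K, V> cache = new ConcurrentHashMap<>();
        return key -> cache.computeIfAbsent(key, delegate);
    }
}
```

A controller could then build the lookup with UserFunctions.memoize(UserFunctions.userIdToEmail(userDao)). Since the report above declares Guava's Function, the result would need to be bridged (for example with a method reference such as lookup::apply) or the constructor parameter changed to the Java 8 type. The point of the sketch is that memoize knows nothing about users, so the same few lines serve every other function in the system.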

¹ Code simplified for explanatory reasons.
² There are many other possible solutions for this example¹.
³ Abstracting over how users are stored was very useful when we replaced LDAP with a web-based user management system.
⁴ In many languages collections are automatically functions. Scala’s Map and other collections already extend scala.Function1, so you can pass a Map whenever a function is expected. Ruby 1.9’s Array, Hash, Proc (Ruby’s function class), and String classes all respond to the [] method, etc.
⁵ 98% of all statistics are made up.