Filterer Pattern in 10 Steps (10 min read)

illustration: a blue funnel with text 10 and an icon of steps (source: Pixabay)

Filterer is a pattern that should be applied only in certain cases. In the original post, I presented a very simple example intended to show how to apply it. In this post, I present a much more detailed example that’s intended to also explain when and why to apply it.

Introduction

The post consists of the following 10 short steps. In each step, I introduce requirements of the following two types:

  • B-*: business requirements (given by the product owner → indisputable)
  • S-*: solution requirements (resulting from the choice of solutions → disputable)

and I present a Java model meeting the requirements introduced so far. I do this until Filterer emerges as the preferable solution.

So, let me take you on this journey…

Step 1: Issue Detector

Requirements #1

Let’s assume business asks for an algorithm to detect grammatical and spelling issues in English texts.

For example:

  • text: You migth know it. → issues to detect:
    1. migth (type: spelling)
  • text: I have noting to loose. → issues to detect:
    1. noting (type: spelling)
    2. to loose (type: grammar)
  • text: I kept noting it’s loose. → issues to detect: ∅

This is our first business requirement (B-1).


The simplest model meeting B-1 could be:

  • input: plain text
  • output: a list of issues, where each issue provides:
    • offsets within the input text
    • a type (grammar / spelling)

This is our first solution requirement (S-1).

Java Model #1

We can model S-1 as:

interface IssueDetector {
  // e.g. text: "You migth know it."
  List<Issue> detect(String text);
}

where:

interface Issue {
  int startOffset(); // e.g. 4 (start of "migth")
  int endOffset(); // e.g. 9 (end of "migth")
  IssueType type(); // e.g. SPELLING
}
enum IssueType { GRAMMAR, SPELLING }

It’s commit 1.
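To make the interface concrete, here is a minimal sketch of one possible implementation. The names SimpleIssue and KnownMisspellingDetector are hypothetical (they are not part of the model above), the detector only looks words up in a fixed set of known misspellings, and the end offset is assumed to be exclusive, which matches the 4/9 example for "migth". Records require Java 16 or newer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

enum IssueType { GRAMMAR, SPELLING }

interface Issue {
  int startOffset(); // inclusive
  int endOffset();   // assumed exclusive, as in the "migth" example (4..9)
  IssueType type();
}

interface IssueDetector {
  List<Issue> detect(String text);
}

// Hypothetical immutable Issue implementation.
record SimpleIssue(int startOffset, int endOffset, IssueType type) implements Issue {}

// Hypothetical toy detector: flags every word listed as a known misspelling.
final class KnownMisspellingDetector implements IssueDetector {
  private static final Pattern WORD = Pattern.compile("[A-Za-z]+");
  private final Set<String> knownMisspellings;

  KnownMisspellingDetector(Set<String> knownMisspellings) {
    this.knownMisspellings = Set.copyOf(knownMisspellings);
  }

  @Override
  public List<Issue> detect(String text) {
    List<Issue> issues = new ArrayList<>();
    Matcher word = WORD.matcher(text);
    while (word.find()) {
      // Case-insensitive lookup of the matched word.
      if (knownMisspellings.contains(word.group().toLowerCase(Locale.ROOT))) {
        issues.add(new SimpleIssue(word.start(), word.end(), IssueType.SPELLING));
      }
    }
    return issues;
  }
}
```

A lookup like this cannot tell the misspelled "noting" in "I have noting to loose." from the correct "noting" in "I kept noting it’s loose.", so it is only an illustration of the contract, not a realistic detector.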

Step 2: Probability

Requirements #2

However, it’d be rather hard to implement a real IssueDetector that worked in such a deterministic way:

  • issue (probability P=100%)
  • non-issue (probability P=0%)

Instead, IssueDetector should rather be probabilistic:

  • probable issue (probability P=?)

We can keep the issue/non-issue distinction by introducing a probability threshold (PT):

  • issue (probability P ≥ PT),
  • non-issue (probability P < PT).

Still, it’s worth adapting the model to keep the probability (P), which is useful e.g. in rendering (higher probability → more prominent rendering).

To sum up, our extra solution requirements are:

  • S-2: Support issue probability (P);
  • S-3: Support probability threshold (PT).

Java Model #2

We can meet S-2 by adding probability() to Issue:

interface Issue {
  // ...
  double probability();
}

We can meet S-3 by adding probabilityThreshold to IssueDetector:

interface IssueDetector {
  List<Issue> detect(String text, double probabilityThreshold);
}

It’s commit 2.
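How a detector applies the threshold internally is not specified here. As a rough sketch, assuming the detector first produces scored candidates and then keeps those with P ≥ PT, the last step could look like this. ScoredIssue and Thresholds are hypothetical names used only for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

enum IssueType { GRAMMAR, SPELLING }

// Issue as of Model #2: it now carries a probability.
interface Issue {
  int startOffset();
  int endOffset();
  IssueType type();
  double probability();
}

// Hypothetical immutable Issue implementation (records need Java 16+).
record ScoredIssue(int startOffset, int endOffset, IssueType type, double probability)
    implements Issue {}

// Hypothetical helper showing the issue/non-issue split from S-3.
final class Thresholds {
  private Thresholds() {}

  // Keeps candidates with P >= PT; everything below PT counts as a non-issue.
  static List<Issue> atLeast(List<? extends Issue> candidates, double probabilityThreshold) {
    return candidates.stream()
        .filter(candidate -> candidate.probability() >= probabilityThreshold)
        .collect(Collectors.toList());
  }
}
```

With this reading, PT=0 keeps every candidate, which is the setting Step 8 relies on.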

Step 3: Probable Issue

Requirements #3

Assume business requires:

  • B-3: Test all issue detectors using texts proofread by an English linguist (= no probabilities).

Such a proofread text (or: a test case) can be defined as:

  • text, e.g. You shuold know it.
  • expected issues, e.g.
    1. shuold (type: spelling)

So, our solution requirement is:

  • S-4: Support expected issues (= no probability).

Java Model #3

We can meet S-4 by extracting a subinterface (ProbableIssue):

interface ProbableIssue extends Issue {
  double probability();
}

and by returning ProbableIssues from IssueDetector:

interface IssueDetector {
  List<ProbableIssue> detect(...);
}

It’s commit 3.
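One consequence of this split is that an expected issue and a detected issue can be compared through the common Issue supertype, ignoring the probability. Below is a minimal sketch of such a comparison; ExpectedIssue, DetectedIssue and IssueMatching are hypothetical names, and "same offsets and same type" is only one possible definition of a match.

```java
enum IssueType { GRAMMAR, SPELLING }

interface Issue {
  int startOffset();
  int endOffset();
  IssueType type();
}

interface ProbableIssue extends Issue {
  double probability();
}

// Hypothetical implementations (records need Java 16+).
record ExpectedIssue(int startOffset, int endOffset, IssueType type) implements Issue {}

record DetectedIssue(int startOffset, int endOffset, IssueType type, double probability)
    implements ProbableIssue {}

// Hypothetical helper for tests against proofread texts.
final class IssueMatching {
  private IssueMatching() {}

  // Compares only what both sides share: offsets and type. Probability is ignored.
  static boolean sameSpanAndType(Issue expected, Issue detected) {
    return expected.startOffset() == detected.startOffset()
        && expected.endOffset() == detected.endOffset()
        && expected.type() == detected.type();
  }
}
```

For "You shuold know it.", an expected issue at offsets 4..9 would match a detected issue at the same offsets regardless of the probability the detector assigned to it.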

Step 4: Issue-wise Text

Requirements #4

Assume that:

  1. All test cases are defined externally (e.g. in XML files);
  2. We want to create a parametrized JUnit test where parameters are test cases provided as a Stream.

Generally, a test case represents something we could call an issue-wise text (a text + its issues).

In order to avoid modeling issue-wise text as Map.Entry<String, List<Issue>> (which is vague, and signifies insufficient abstraction), let’s introduce another solution requirement:

  • S-5: Support issue-wise texts.

Java Model #4

We can model S-5 as:

interface IssueWiseText {
  String text(); // e.g. "You migth know it."
  List<Issue> issues(); // e.g. ["migth"]
}

This lets us define a Stream of test cases simply as

  • Stream<IssueWiseText>

instead of

  • Stream<Map.Entry<String, List<Issue>>>.

It’s commit 4.

Step 5: Expected Coverage

Requirements #5

Assume business requires:

  • B-4: Report expected issue coverage for a stream of test cases;

where issue coverage — for the sake of simplicity — is defined as:

issue coverage = total issue length / total text length

In reality, issue coverage could represent some very complex business logic.

Java Model #5

We can handle B-4 with a Collector-based method:

static double issueCoverage(Stream<? extends IssueWiseText> textStream) {
  return textStream.collect(IssueCoverage.collector());
}

The Collector is based on an Accumulator having two mutable fields:

int totalIssueLength = 0;
int totalTextLength = 0;

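which, for each IssueWiseText, we increment (Issue::length is assumed here to be a helper returning endOffset() - startOffset(); it isn’t part of the Issue interface shown earlier):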

totalIssueLength += issueWiseText.issues().stream().mapToInt(Issue::length).sum();
totalTextLength += issueWiseText.text().length();

and then we calculate issue coverage as:

(double) totalIssueLength / totalTextLength

It’s commit 5.
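Put together, the pieces above could be assembled roughly as follows. This is a sketch under a few assumptions: Issue gets a default length() method (end offset minus start offset), the collector is built with Collector.of, an empty stream yields a coverage of 0.0 instead of NaN, and SimpleIssue and SimpleIssueWiseText are hypothetical implementations.

```java
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Stream;

enum IssueType { GRAMMAR, SPELLING }

interface Issue {
  int startOffset();
  int endOffset();
  IssueType type();

  // Assumed helper, not part of the model shown so far.
  default int length() {
    return endOffset() - startOffset();
  }
}

interface IssueWiseText {
  String text();
  List<Issue> issues();
}

// Hypothetical implementations (records need Java 16+).
record SimpleIssue(int startOffset, int endOffset, IssueType type) implements Issue {}

record SimpleIssueWiseText(String text, List<Issue> issues) implements IssueWiseText {}

final class IssueCoverage {
  private IssueCoverage() {}

  static Collector<IssueWiseText, ?, Double> collector() {
    return Collector.of(
        Accumulator::new, Accumulator::add, Accumulator::merge, Accumulator::coverage);
  }

  static double issueCoverage(Stream<? extends IssueWiseText> textStream) {
    return textStream.collect(collector());
  }

  // Mutable container holding the two running totals.
  private static final class Accumulator {
    int totalIssueLength = 0;
    int totalTextLength = 0;

    void add(IssueWiseText issueWiseText) {
      totalIssueLength += issueWiseText.issues().stream().mapToInt(Issue::length).sum();
      totalTextLength += issueWiseText.text().length();
    }

    // Needed for parallel streams: folds another partial result into this one.
    Accumulator merge(Accumulator other) {
      totalIssueLength += other.totalIssueLength;
      totalTextLength += other.totalTextLength;
      return this;
    }

    double coverage() {
      // Assumption: report 0.0 for an empty stream rather than dividing by zero.
      return totalTextLength == 0 ? 0.0 : (double) totalIssueLength / totalTextLength;
    }
  }
}
```

For the two example texts from Step 1 ("You migth know it." with one 5-character issue, and "I have noting to loose." with issues of 6 and 8 characters), this gives 19 issue characters over 41 text characters.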

Step 6: Obtained Coverage

Requirements #6

Assume business requires:

  • B-5: Report obtained issue coverage for the entire test set;

where “obtained” means “calculated using detected issues”. Now things start to get interesting!

First of all, since IssueCoverage represents business logic, we shouldn’t duplicate it:

  • S-6: Reuse issue coverage code.

Secondly, since the method takes a Stream<? extends IssueWiseText>, we need to model an IssueWiseText for ProbableIssues:

  • S-7: Support probabilistic issue-wise texts.

I see only two choices here:

  1. Parametrization: IssueWiseText<I extends Issue>;
  2. Subtyping: ProbabilisticIssueWiseText extends IssueWiseText.

Parametric Java Model #6

The parametric model of S-7 is simple — we need <I extends Issue> (a bounded type parameter) in IssueWiseText:

interface IssueWiseText<I extends Issue> {
  String text();
  List<I> issues();
}

This model has drawbacks (like type erasure), but it’s concise.

We can also adapt IssueDetector to return IssueWiseText<ProbableIssue>.

What’s more, our Stream of test cases may turn into Stream<IssueWiseText<Issue>> (although IssueWiseText<Issue> is somewhat controversial).

It’s commit 6a.

Subtyping Java Model #6

The other option is to choose subtyping (which has its own drawbacks, the greatest of which is perhaps duplication).

A subtyping model of S-7 employs return type covariance:

interface ProbabilisticIssueWiseText extends IssueWiseText {
  @Override
  List<? extends ProbableIssue> issues();
}

where issues() in IssueWiseText has to become upper bounded (List<? extends Issue>).

We can also adapt IssueDetector to return ProbabilisticIssueWiseText.

It’s commit 6b.
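The point of the covariant issues() is that code written against IssueWiseText keeps working for the probabilistic subtype, while code that needs probabilities can ask for the subtype explicitly. Here is a small sketch of both situations; all record and helper names (ExpectedText, DetectedText, IssueStats, and so on) are hypothetical.

```java
import java.util.List;

enum IssueType { GRAMMAR, SPELLING }

interface Issue {
  int startOffset();
  int endOffset();
  IssueType type();
}

interface ProbableIssue extends Issue {
  double probability();
}

interface IssueWiseText {
  String text();
  List<? extends Issue> issues(); // upper bounded, so subtypes can narrow it
}

interface ProbabilisticIssueWiseText extends IssueWiseText {
  @Override
  List<? extends ProbableIssue> issues(); // covariant return type
}

// Hypothetical implementations (records need Java 16+).
record ExpectedIssue(int startOffset, int endOffset, IssueType type) implements Issue {}

record DetectedIssue(int startOffset, int endOffset, IssueType type, double probability)
    implements ProbableIssue {}

record ExpectedText(String text, List<ExpectedIssue> issues) implements IssueWiseText {}

record DetectedText(String text, List<DetectedIssue> issues)
    implements ProbabilisticIssueWiseText {}

// Hypothetical helpers.
final class IssueStats {
  private IssueStats() {}

  // Written once against the supertype; accepts expected and detected texts alike.
  static int totalIssueLength(IssueWiseText text) {
    return text.issues().stream()
        .mapToInt(issue -> issue.endOffset() - issue.startOffset())
        .sum();
  }

  // Needs probabilities, so it asks for the subtype; returns 0.0 if there are no issues.
  static double maxProbability(ProbabilisticIssueWiseText text) {
    return text.issues().stream().mapToDouble(ProbableIssue::probability).max().orElse(0.0);
  }
}
```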

Step 7: Filtering by Issue Type

Requirements #7

Assume business requires:

  • B-6: Report issue coverage per issue type.

We could support it by accepting an extra parameter of type Predicate<? super Issue> (IssueType parameter would be too narrow, in general).

However, supporting it directly in IssueCoverage would complicate business logic (commit 7a’). Instead, we’d rather feed the filtered instances of IssueWiseText to IssueCoverage.

How do we do the filtering? Doing it “manually” (calling new ourselves) would introduce unnecessary coupling to the implementations (we don’t even know them yet). That’s why we’ll let IssueWiseText do the filtering (I feel this logic belongs there):

  • S-8: Support filtering by Issue in IssueWiseText.

In other words, we want to be able to say:

Hey IssueWiseText, filter yourself by Issue!

Parametric Java Model #7

In the parametric model, we add the following filtered method to IssueWiseText<I>:

IssueWiseText<I> filtered(Predicate<? super I> issueFilter);

This lets us meet B-6 as:

return textStream
        .map(text -> text.filtered(issue -> issue.type() == issueType))
        .collect(IssueCoverage.collector());

It’s commit 7a.

Subtyping Java Model #7

In the subtyping model, we also add a filtered method (very similar to the one above):

IssueWiseText filtered(Predicate<? super Issue> issueFilter);

This lets us meet B-6 in the same way as above.

It’s commit 7b.

Step 8: Filtering by Probability

Requirements #8

Assume business requires:

  • B-7: Report issue coverage per minimum probability.

In other words, business wants to know how the probability distribution affects issue coverage.

Now, we don’t want to run IssueDetector with many different probability thresholds (PT), because it’d be very inefficient. Instead, we’ll run it just once (with PT=0), and then keep discarding issues with the lowest probability to recalculate issue coverage.

Yet, in order to be able to filter by probabilities, we need to:

  • S-9: Support filtering by ProbableIssue in probabilistic issue-wise text.

Parametric Java Model #8

In the parametric model, we don’t need to change anything. We can meet B-7 as:

return textStream
        .map(text -> text.filtered(issue -> issue.probability() >= minProbability))
        .collect(IssueCoverage.collector());

It’s commit 8a.
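The reason nothing changes is that the predicate type follows the type parameter I. A minimal sketch of how filtered could be implemented in the parametric model is shown below; SimpleIssueWiseText and DetectedIssue are hypothetical names, and a real implementation may well differ.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

enum IssueType { GRAMMAR, SPELLING }

interface Issue {
  int startOffset();
  int endOffset();
  IssueType type();
}

interface ProbableIssue extends Issue {
  double probability();
}

interface IssueWiseText<I extends Issue> {
  String text();
  List<I> issues();
  IssueWiseText<I> filtered(Predicate<? super I> issueFilter);
}

// Hypothetical implementations (records need Java 16+).
record DetectedIssue(int startOffset, int endOffset, IssueType type, double probability)
    implements ProbableIssue {}

record SimpleIssueWiseText<I extends Issue>(String text, List<I> issues)
    implements IssueWiseText<I> {

  @Override
  public IssueWiseText<I> filtered(Predicate<? super I> issueFilter) {
    // Same text, fewer issues; the element type I is preserved.
    List<I> kept = issues.stream().filter(issueFilter).collect(Collectors.toList());
    return new SimpleIssueWiseText<>(text, kept);
  }
}
```

For an IssueWiseText<ProbableIssue>, the lambda parameter is a ProbableIssue, so both issue.type() and issue.probability() are available in the same filtered call.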

Subtyping Java Model #8

In the subtyping model, it’s harder, because we need an extra method in ProbabilisticIssueWiseText:

ProbabilisticIssueWiseText filteredProbabilistic(Predicate<? super ProbableIssue> issueFilter);

which lets us meet B-7 as:

return textStream
        .map(text -> text.filteredProbabilistic(issue -> issue.probability() >= minProbability))
        .collect(IssueCoverage.collector());

It’s commit 8b.


To me, this extra method in ProbabilisticIssueWiseText is quite disturbing, though (see here). That’s why I propose…

Step 9: Filterer

Requirements #9

Since regular filtering in the subtyping model is so “non-uniform”, let’s make it uniform:

  • S-10: Support uniform filtering in the subtyping model of issue-wise text.

In other words, we want to be able to say:

Hey ProbabilisticIssueWiseText, filter yourself by ProbableIssue (but in the same way as IssueWiseText filters itself by Issue)!

To the best of my knowledge, this can be achieved only with the Filterer Pattern.

Subtyping Java Model #9

So we apply a generic Filterer to IssueWiseText:

Filterer<? extends IssueWiseText, ? extends Issue> filtered();

and to ProbabilisticIssueWiseText:

@Override
Filterer<? extends ProbabilisticIssueWiseText, ? extends ProbableIssue> filtered();

Now, we can filter uniformly by calling:

text.filtered().by(issue -> ...)

It’s commit 9.
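The Filterer type itself is not shown above, so the following sketch assumes its shape: a functional interface with a single by method, where G is the type of the filtered result and E is the element type the predicate sees. DetectedIssue, DetectedText and CoverageReports are hypothetical names as well.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

enum IssueType { GRAMMAR, SPELLING }

interface Issue {
  int startOffset();
  int endOffset();
  IssueType type();
}

interface ProbableIssue extends Issue {
  double probability();
}

// Assumed shape of Filterer: G = filtered result type, E = element type.
@FunctionalInterface
interface Filterer<G, E> {
  G by(Predicate<? super E> predicate);
}

interface IssueWiseText {
  String text();
  List<? extends Issue> issues();
  Filterer<? extends IssueWiseText, ? extends Issue> filtered();
}

interface ProbabilisticIssueWiseText extends IssueWiseText {
  @Override
  List<? extends ProbableIssue> issues();

  @Override
  Filterer<? extends ProbabilisticIssueWiseText, ? extends ProbableIssue> filtered();
}

// Hypothetical implementations (records need Java 16+).
record DetectedIssue(int startOffset, int endOffset, IssueType type, double probability)
    implements ProbableIssue {}

record DetectedText(String text, List<ProbableIssue> issues)
    implements ProbabilisticIssueWiseText {

  @Override
  public Filterer<ProbabilisticIssueWiseText, ProbableIssue> filtered() {
    // The Filterer is just a lambda that rebuilds the text with the surviving issues.
    return predicate -> new DetectedText(
        text, issues.stream().filter(predicate).collect(Collectors.toList()));
  }
}

// Hypothetical callers: the same filtered().by(...) call at both levels of the hierarchy.
final class CoverageReports {
  private CoverageReports() {}

  static IssueWiseText ofType(IssueWiseText text, IssueType issueType) {
    return text.filtered().by(issue -> issue.type() == issueType);
  }

  static ProbabilisticIssueWiseText atLeast(ProbabilisticIssueWiseText text, double minProbability) {
    return text.filtered().by(issue -> issue.probability() >= minProbability);
  }
}
```

Note that atLeast gets a ProbabilisticIssueWiseText back, so the result can be filtered again by probability, whereas ofType only promises an IssueWiseText.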

Step 10: Detection Time

By now, you may be wondering why I bother with the subtyping model if the parametric one is so much easier.

So, for the last time, let’s assume that business requires:

  • B-8: Report detection time (= time it takes to detect all issues in a given text).

Parametric Java Model #10

I see only two ways of incorporating B-8 into the parametric model: 1) composition, 2) subtyping.

Composition for Parametric Java Model #10

Applying composition is easy. We introduce IssueDetectionResult:

interface IssueDetectionResult {
  IssueWiseText<ProbableIssue> probabilisticIssueWiseText();
  Duration detectionTime();
}

and modify IssueDetector to return it.

It’s commit 10a.
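As an illustration of the composition variant, here is a sketch of how such a result could be produced. It assumes the detection itself can be passed in as a plain function from text to probable issues, and it measures wall-clock time with System.nanoTime(); TimedDetection and the Simple* records are hypothetical names.

```java
import java.time.Duration;
import java.util.List;
import java.util.function.Function;

enum IssueType { GRAMMAR, SPELLING }

interface Issue {
  int startOffset();
  int endOffset();
  IssueType type();
}

interface ProbableIssue extends Issue {
  double probability();
}

interface IssueWiseText<I extends Issue> {
  String text();
  List<I> issues();
}

interface IssueDetectionResult {
  IssueWiseText<ProbableIssue> probabilisticIssueWiseText();
  Duration detectionTime();
}

// Hypothetical implementations (records need Java 16+).
record SimpleIssueWiseText<I extends Issue>(String text, List<I> issues)
    implements IssueWiseText<I> {}

record SimpleIssueDetectionResult(
    IssueWiseText<ProbableIssue> probabilisticIssueWiseText, Duration detectionTime)
    implements IssueDetectionResult {}

// Hypothetical helper: wraps any detection function and records how long it took.
final class TimedDetection {
  private TimedDetection() {}

  static IssueDetectionResult detect(
      String text, Function<String, List<ProbableIssue>> detection) {
    long start = System.nanoTime();
    List<ProbableIssue> issues = detection.apply(text);
    Duration elapsed = Duration.ofNanos(System.nanoTime() - start);
    return new SimpleIssueDetectionResult(new SimpleIssueWiseText<>(text, issues), elapsed);
  }
}
```

Note that IssueDetectionResult is not an IssueWiseText itself: code that works on issue-wise texts has to unwrap it via probabilisticIssueWiseText() first.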

Subtyping for Parametric Java Model #10

Applying subtyping requires a bit more work. We need to add ProbabilisticIssueWiseText<I>*:

interface ProbabilisticIssueWiseText<I extends ProbableIssue> extends IssueWiseText<I> {
  Duration detectionTime();
  // ...
}

and modify IssueDetector to return ProbabilisticIssueWiseText<?>.

It’s commit 10a’.

* Note that I left <I> on ProbabilisticIssueWiseText in order not to correlate parametrization with subtyping in a dangerous way.

Subtyping Java Model #10

With the purely subtyping model, incorporating B-8 is very easy. We just add detectionTime() to ProbabilisticIssueWiseText:

interface ProbabilisticIssueWiseText extends IssueWiseText {
  Duration detectionTime();
  // ...
}

It’s commit 10b.

Conclusions

There’s no time left to go into details (the post is already way longer than I expected).

However, I prefer pure subtyping (and hence Filterer) over other solutions because:

  1. Parametrization with composition leaves me without a common supertype (in certain cases, it’s a problem);
  2. Parametrization with subtyping has too many degrees of freedom.

By “too many degrees of freedom”, I mean I only need:

  • IssueWiseText<?>
  • ProbabilisticIssueWiseText<?>
  • IssueWiseText<Issue> (controversial)

but in code, I’ll also encounter (speaking from experience!):

  • IssueWiseText<? extends Issue> (redundant upper bound)
  • IssueWiseText<ProbableIssue>
  • IssueWiseText<? extends ProbableIssue> (why not ProbabilisticIssueWiseText<?>?)
  • ProbabilisticIssueWiseText<? extends ProbableIssue> (redundant upper bound)
  • ProbabilisticIssueWiseText<ProbableIssue>

so it’s just too confusing for me. But if you’re really interested in this topic, check out Complex Subtyping vs. Parametrization (be warned, though — it’s even longer than this post!).

Thank you for reading!
