Accumulative is an interface proposed for the intermediate accumulation type A of Collector<T, A, R> in order to make defining custom Java Collectors easier.
Introduction
If you’ve ever used Java Streams, you most likely used some Collectors, e.g.:
But have you ever used…
- A composed Collector?
  - It takes another Collector as a parameter, e.g.: Collectors.collectingAndThen.
- A custom Collector?
  - Its functions are specified explicitly in Collector.of.
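To make the distinction concrete, here is a minimal sketch of both kinds. The class name CollectorKinds, the method names, and the choice of what is collected are assumptions made purely for illustration; only Collectors.collectingAndThen, Collectors.toList, and Collector.of come from the JDK.

```java
import java.util.Collections;
import java.util.List;
import java.util.stream.Collector;
import java.util.stream.Collectors;

// Hypothetical holder class, used only for this illustration.
class CollectorKinds {

    // Composed: wraps another Collector (toList) and post-processes its result.
    static <T> Collector<T, ?, List<T>> toUnmodifiableListComposed() {
        return Collectors.collectingAndThen(Collectors.toList(), Collections::unmodifiableList);
    }

    // Custom: all four functions are passed explicitly to Collector.of.
    static Collector<String, ?, Integer> toTotalLength() {
        return Collector.of(
                () -> new int[1],                       // supplier: one-element "box"
                (acc, s) -> acc[0] += s.length(),       // accumulator
                (a, b) -> { a[0] += b[0]; return a; },  // combiner
                acc -> acc[0]                           // finisher
        );
    }
}
```

Both are used the same way at the call site, e.g. stream.collect(CollectorKinds.toTotalLength()); the difference lies only in how the Collector is built.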
This post is about custom Collectors.
Collector
Let’s recall the essence of the Collector contract (comments mine) :
/**
* @param <T> (input) element type
* @param <A> (intermediate) mutable accumulation type (container)
* @param <R> (output) result type
*/
public interface Collector<T, A, R> {
Supplier<A> supplier(); // create a container
BiConsumer<A, T> accumulator(); // add to the container
BinaryOperator<A> combiner(); // combine two containers
Function<A, R> finisher(); // get the final result from the container
Set<Characteristics> characteristics(); // irrelevant here
}
The above contract is functional in nature, and that’s very good! This lets us create Collectors using arbitrary accumulation types (A), e.g.:
- A: StringBuilder (Collectors.joining)
- A: OptionalBox (Collectors.reducing)
- A: long[] (Collectors.averagingLong)
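As an illustration of an arbitrary accumulation type, the sketch below uses StringBuilder as A. It is a simplified take on the idea behind Collectors.joining (no delimiter, prefix, or suffix), not the JDK's actual implementation; the class and method names are hypothetical.

```java
import java.util.stream.Collector;

// Hypothetical example class; the names are illustrative only.
class ConcatenatingCollector {

    // T = CharSequence, A = StringBuilder, R = String
    static Collector<CharSequence, StringBuilder, String> concatenating() {
        return Collector.of(
                StringBuilder::new,                    // supplier: create a container
                StringBuilder::append,                 // accumulator: add to the container
                (left, right) -> left.append(right),   // combiner: merge two containers
                StringBuilder::toString                // finisher: container -> result
        );
    }
}
```

Note that nothing in StringBuilder knows about Collector: the four functions alone adapt it to the contract.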
Proposal
Before I provide any rationale, I’ll present the proposal, because it’s brief. Full source code of this proposal is available as a GitHub gist.
Accumulative Interface
I propose to add the following interface dubbed Accumulative (name to be discussed) to the JDK:
public interface Accumulative<T, A extends Accumulative<T, A, R>, R> {
void accumulate(T t); // target for Collector.accumulator()
A combine(A other); // target for Collector.combiner()
R finish(); // target for Collector.finisher()
}
This interface, as opposed to Collector, is object-oriented in nature, and classes implementing it must represent some mutable state.
Collector.of Overload
Having Accumulative, we can add the following Collector.of overload:
public static <T, A extends Accumulative<T, A, R>, R> Collector<T, ?, R> of(
Supplier<A> supplier, Collector.Characteristics... characteristics) {
return Collector.of(supplier, A::accumulate, A::combine, A::finish, characteristics);
}
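Since user code cannot add an overload to the JDK's Collector interface, the sketch below puts an equivalent factory in a hypothetical utility class named AccumulativeCollectors. The LongestString accumulation type is also made up for this example. The factory uses lambdas that are equivalent to the A::accumulate, A::combine, and A::finish references in the proposal.

```java
import java.util.function.Supplier;
import java.util.stream.Collector;

// The proposed interface, repeated so that the sketch is self-contained.
interface Accumulative<T, A extends Accumulative<T, A, R>, R> {
    void accumulate(T t);
    A combine(A other);
    R finish();
}

// Hypothetical stand-in for the proposed Collector.of overload.
final class AccumulativeCollectors {
    private AccumulativeCollectors() { }

    static <T, A extends Accumulative<T, A, R>, R> Collector<T, ?, R> of(
            Supplier<A> supplier, Collector.Characteristics... characteristics) {
        return Collector.of(
                supplier,
                (a, t) -> a.accumulate(t),   // same as A::accumulate
                (a, b) -> a.combine(b),      // same as A::combine
                a -> a.finish(),             // same as A::finish
                characteristics);
    }
}

// Hypothetical accumulation type: remembers the longest string seen so far.
final class LongestString implements Accumulative<String, LongestString, String> {
    private String longest = "";

    @Override
    public void accumulate(String s) {
        if (s.length() > longest.length()) {
            longest = s;
        }
    }

    @Override
    public LongestString combine(LongestString other) {
        accumulate(other.longest);
        return this;
    }

    @Override
    public String finish() {
        return longest;
    }
}
```

With these in place, a call site would look like stream.collect(AccumulativeCollectors.of(LongestString::new)).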
Average-Developer Story
In this section, I show how the proposal may impact an average developer, who knows only the basics of the Collector API. If you know this API well, please do your best to imagine you don’t before reading on…
Example
Let’s reuse the example from my latest post (simplified even further). Assume that we have a Stream of:
interface IssueWiseText {
int issueLength();
int textLength();
}
and that we need to calculate issue coverage:
total issue length
─────────────
total text length
This requirement translates to the following signature:
Collector<IssueWiseText, ?, Double> toIssueCoverage();
Solution
An average developer may decide to use a custom accumulation type A to solve this (other solutions are possible, though). Let’s say the developer names it CoverageContainer so that:
- T: IssueWiseText
- A: CoverageContainer
- R: Double
Below, I’ll show how such a developer may arrive at the structure of CoverageContainer.
Structure Without Accumulative
Note: This section is long to illustrate how complex the procedure may be for a developer inexperienced with Collectors. You may skip it if you realize this already 🙂
Without Accumulative, the developer will look at Collector.of, and see four main parameters:
- Supplier<A> supplier
- BiConsumer<A, T> accumulator
- BinaryOperator<A> combiner
- Function<A, R> finisher
To handle Supplier<A> supplier, the developer should:
- mentally substitute A in Supplier<A> to get Supplier<CoverageContainer>
- mentally resolve the signature to CoverageContainer get()
- recall the JavaDoc for Collector.supplier()
- recall method reference of the 4th kind (reference to a constructor)
- realize that supplier = CoverageContainer::new
To handle BiConsumer<A, T> accumulator, the developer should:
- mentally substitute to get BiConsumer<CoverageContainer, IssueWiseText>
- mentally resolve the signature to void accept(CoverageContainer a, IssueWiseText t)
- mentally transform the signature to an instance-method one: void accumulate(IssueWiseText t)
- recall method reference of the 3rd kind (reference to an instance method of an arbitrary object of a particular type)
- realize that accumulator = CoverageContainer::accumulate
To handle BinaryOperator<A> combiner:
- BinaryOperator<CoverageContainer>
- CoverageContainer apply(CoverageContainer a, CoverageContainer b)
- CoverageContainer combine(CoverageContainer other)
- combiner = CoverageContainer::combine
To handle Function<A, R> finisher:
- Function<CoverageContainer, Double>
- Double apply(CoverageContainer a)
- double issueCoverage()
- finisher = CoverageContainer::issueCoverage
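The mental steps above boil down to two kinds of method references. The sketch below spells out each reference next to its equivalent lambda, using a hypothetical LengthSum container (not part of the original example) so that it stays self-contained.

```java
import java.util.function.BiConsumer;
import java.util.function.BinaryOperator;
import java.util.function.Function;
import java.util.function.Supplier;

// Hypothetical container used only to illustrate the method-reference kinds.
class LengthSum {
    private int total;

    void accumulate(String s) { total += s.length(); }

    LengthSum combine(LengthSum other) { total += other.total; return this; }

    int total() { return total; }
}

class MethodReferenceKinds {
    // Reference to a constructor: same as () -> new LengthSum()
    static final Supplier<LengthSum> SUPPLIER = LengthSum::new;

    // References to an instance method of an arbitrary object of a particular type:
    // the first argument of the functional interface becomes the receiver of the call.
    static final BiConsumer<LengthSum, String> ACCUMULATOR = LengthSum::accumulate; // (a, s) -> a.accumulate(s)
    static final BinaryOperator<LengthSum> COMBINER = LengthSum::combine;           // (a, b) -> a.combine(b)
    static final Function<LengthSum, Integer> FINISHER = LengthSum::total;          // a -> a.total()
}
```

The "transform the signature to an instance-method one" step is exactly this shift of the first parameter into the receiver position.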
This long procedure results in:
class CoverageContainer {
void accumulate(IssueWiseText t) { }
CoverageContainer combine(CoverageContainer other) { }
double issueCoverage() { }
}
And the developer can define toIssueCoverage() (having to provide the arguments in proper order):
Collector<IssueWiseText, ?, Double> toIssueCoverage() {
return Collector.of(
CoverageContainer::new, CoverageContainer::accumulate,
CoverageContainer::combine, CoverageContainer::issueCoverage
);
}
Structure With Accumulative
Now, with Accumulative, the developer will look at the new Collector.of overload and will see only one main parameter:
- Supplier<A> supplier
and one bounded type parameter:
- A extends Accumulative<T, A, R>
So the developer will start with the natural thing — implementing Accumulative<T, A, R> and resolving T, A, R for the first and last time:
class CoverageContainer implements Accumulative<IssueWiseText, CoverageContainer, Double> {
}
At this point, a decent IDE will complain that the class must implement all abstract methods. What’s more — and that’s the most beautiful part — it will offer a quick fix. In IntelliJ, you hit “Alt+Enter” → “Implement methods”, and… you’re done! 😁
class CoverageContainer implements Accumulative<IssueWiseText, CoverageContainer, Double> {
@Override
public void accumulate(IssueWiseText issueWiseText) {
}
@Override
public CoverageContainer combine(CoverageContainer other) {
return null;
}
@Override
public Double finish() {
return null;
}
}
So… you don’t have to juggle the types, write anything manually, or name anything!
Oh, yes — you still need to define toIssueCoverage(), but it’s simple now:
Collector<IssueWiseText, ?, Double> toIssueCoverage() {
return Collector.of(CoverageContainer::new);
}
Isn’t that nice? 😃
Implementation
The implementation isn’t relevant here, as it’s nearly the same for both cases (diff).
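For completeness, here is a minimal sketch of how the Accumulative-based variant might be filled in. It is not the code from the linked diff. AccumulativeCollectors is a hypothetical stand-in for the proposed Collector.of overload, the field names are my own choice, and the behavior for an empty stream (NaN) is an assumption rather than a stated requirement.

```java
import java.util.function.Supplier;
import java.util.stream.Collector;

interface IssueWiseText {
    int issueLength();
    int textLength();
}

interface Accumulative<T, A extends Accumulative<T, A, R>, R> {
    void accumulate(T t);
    A combine(A other);
    R finish();
}

// Hypothetical stand-in for the proposed Collector.of overload.
final class AccumulativeCollectors {
    private AccumulativeCollectors() { }

    static <T, A extends Accumulative<T, A, R>, R> Collector<T, ?, R> of(Supplier<A> supplier) {
        return Collector.of(supplier, (a, t) -> a.accumulate(t), (a, b) -> a.combine(b), a -> a.finish());
    }
}

class CoverageContainer implements Accumulative<IssueWiseText, CoverageContainer, Double> {
    private int totalIssueLength;
    private int totalTextLength;

    @Override
    public void accumulate(IssueWiseText t) {
        totalIssueLength += t.issueLength();
        totalTextLength += t.textLength();
    }

    @Override
    public CoverageContainer combine(CoverageContainer other) {
        totalIssueLength += other.totalIssueLength;
        totalTextLength += other.totalTextLength;
        return this;
    }

    @Override
    public Double finish() {
        // Assumption: an empty stream yields NaN (0.0 / 0) instead of throwing.
        return (double) totalIssueLength / totalTextLength;
    }
}

class IssueCoverage {
    static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
        return AccumulativeCollectors.of(CoverageContainer::new);
    }
}
```

The variant without Accumulative would contain the same three method bodies; only the implements clause, the method names, and the Collector.of call would differ.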
Rationale
Too Complex Procedure
I hope I’ve demonstrated how defining a custom Collector can be a challenge. I must say that even I always feel reluctant about defining one. However, I also feel that — with Accumulative — this reluctance would go away, because the procedure would shrink to two steps:
- Implement Accumulative<T, A, R>
- Call Collector.of(YourContainer::new)
Drive to Implement
JetBrains coined “the drive to develop”, and I’d like to twist it to “the drive to implement”.
Since a Collector is simply a box of functions, there’s usually no point (as far as I can tell) in implementing it (there are exceptions). However, a Google search for “implements Collector” shows (~5000 results) that people do it.
And it’s natural, because to create a “custom” TYPE in Java, one usually extends/implements TYPE. In fact, it’s so natural that even experienced developers (like Tomasz Nurkiewicz, a Java Champion) may do it.
To sum up, people feel the drive to implement, but — in this case — JDK provides them with nothing to implement. And Accumulative could fill this gap…
Relevant Examples
Finally, I searched for examples where it’d be straightforward to implement Accumulative.
In OpenJDK (which is not the target place, though), I found two:
On Stack Overflow, though, I found plenty: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53.
I also found a few array-based examples that could be refactored to Accumulative for better readability: a, b, c.
Naming
Accumulative is not the best name, mainly because it’s an adjective. However, I chose it because:
- I wanted the name to start with A (as in <T, A, R>),
- my best candidate (Accumulator) was already taken by BiConsumer<A, T> accumulator(),
- AccumulativeContainer seemed too long.
In OpenJDK, A is called:
which prompts the following alternatives:
- AccumulatingBox
- AccumulationState
- Collector.Container
- MutableResultContainer
Of course, if the idea were accepted, the name would go through the “traditional” name bikeshedding 😉
Summary
In this post, I proposed adding the Accumulative interface and a new Collector.of overload to the JDK. With them, developers would no longer associate creating a custom Collector with a lot of effort. Instead, it’d simply become “implement the contract” & “reference the constructor”.
In other words, this proposal aims at lowering the bar of entering the custom-Collector world!
Appendix
Optional reading below.
Example Solution: JDK 12+
In JDK 12+, we’ll be able to define toIssueCoverage() as a composed Collector, thanks to Collectors.teeing (JDK-8209685):
static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
return Collectors.teeing(
Collectors.summingInt(IssueWiseText::issueLength),
Collectors.summingInt(IssueWiseText::textLength),
(totalIssueLength, totalTextLength) -> (double) totalIssueLength / totalTextLength
);
}
The above is concise, but it may be somewhat hard to follow for a Collector API newbie.
Example Solution: the JDK Way
Alternatively, toIssueCoverage() could be defined as:
static Collector<IssueWiseText, ?, Double> toIssueCoverage() {
return Collector.of(
() -> new int[2],
(a, t) -> { a[0] += t.issueLength(); a[1] += t.textLength(); },
(a, b) -> { a[0] += b[0]; a[1] += b[1]; return a; },
a -> (double) a[0] / a[1]
);
}
I dubbed this the “JDK way”, because some Collectors are implemented like that in OpenJDK (e.g. Collectors.averagingInt).
Yet, while such terse code may be suitable for OpenJDK, it’s certainly not suitable for business logic because of its readability (which is low to the point that I’d call it cryptic).
2 thoughts on “Accumulative: Custom Java Collectors Made Easy”
Following the Twitter discussion, I assembled the summary below.
1-arg Collector.of:
- … Collector.of calls (but not all)
- … Accumulative)
- … Accumulative implementation
Accumulative interface:
- … Collector (not a replacement for it)
- … A in Collector<T, A, R>
- … Collector.of)
- … Collectors.filtering/mapping/teeing)
Key concerns:
- … Collectors popular enough to justify this addition? (by Richard Warburton)
- … Collector composition) to a low-level one (OOP-like Accumulative implementation)?
In other words, 5 key takeaways (InfoQ-like):
- … Collector (Java Stream API) backed by a private dedicated accumulation type is fairly complex now.
- … Collector.of overload is syntax sugar for the four-argument Collector.of calls that reference a private dedicated accumulation type. The overload takes three fewer parameters at the cost of introducing an extra interface (Accumulative) to be implemented by this dedicated type.
- … Accumulative we “get” the implementation structure “for free” (in other words, the IDE generates all the method stubs for us).
- Accumulative<T, A, R> is not an alternative or replacement for Collector<T, A, R>.
- Accumulative simply serves as a bound for A in Collector<T, A, R>.