Disclaimer
Any of this information may or may not be correct. If it's incorrect, I am sorry, but if it is correct then woohoo, this stuff is really very interesting, so I hope it's the latter!
This world of stuff
The term BigData gets bandied about so much that it's almost as irritating as when people refer to the cloud without an iota of knowledge of what that actually means. Anyway, for the purpose of this blog let's just assume that data is stuff and stuff is data, and when there's lots of it we're going to call it BigData. This definition makes no claims about whether the data is useful, required, requested or simply just noise, as I suspect about 99% of data online is. I mean really, does anyone care that you're eating a cinnamon bun at 12pm? Why tweet that?
Why am I starting a blog about Streams by talking about BigData? It should be fairly obvious that when the amount of data you're looking at is large, but the amount you're actually interested in is a small subset of it, the gains from parallelism could be huge, and it's this parallelism that Streams make so simple.
Now it's not that Streams and even map-reduce are earth-shatteringly new, because they're not. We could always have achieved the same goal before Java 8. However, the Streams API forces you, the developer, to consider writing your algorithm in a not immediately obvious way. Because that way isn't immediately obvious, most algorithms written before wouldn't have been so easily parallelisable (is that a word?).
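To make the "small subset of lots of data" idea a bit more concrete, here's a minimal sketch using a Java 8 parallel stream. The class name InterestingSubset and the isInteresting rule (multiples of 1000) are made up purely for illustration. The only point is that a filter with no shared state can be spread over several threads by adding a single parallel() call.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class InterestingSubset {

    // Hypothetical notion of "interesting": here, multiples of 1000.
    static boolean isInteresting(int value) {
        return value % 1000 == 0;
    }

    // Keeps only the interesting values from 1..upperBound.
    // The pipeline touches no shared mutable state, so the runtime
    // is free to split the range across the available threads.
    static List<Integer> interesting(int upperBound) {
        return IntStream.rangeClosed(1, upperBound)
                .parallel()
                .filter(InterestingSubset::isInteresting)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // One million values in, a small subset out.
        System.out.println(interesting(1_000_000).size());
    }
}
```

Whether the parallel version is actually faster depends on the data size and the cost of the filter, so treat this as a shape to measure rather than a guaranteed win.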
That's not right, is it?
I like Martin Fowler, I consider him one of the good guys, which is why it pains me a little to have issues with one of his blog posts, http://martinfowler.com/articles/replaceThrowWithNotification.html. The article seems to imply that using the Notification class/object to report back a collection of errors is always the preferred way of working. It's that lack of context that I hate about many new ideas in programming. Most of the time the new ideas are in fact old ideas in a new language, and they didn't take off in the past because they can't and shouldn't be applied to every situation.
My main issue is that I rather dislike mutating methods, those that change the underlying state of the system. Why, I hear you ask?
Code like this
private void validateDate(Notification note) {
to me doesn't do what the name suggests, as it's validating and storing the result. The fact that the word "and" is required to sum up what the method does indicates that it should be at least two methods anyway, but that's by the by. Imagine there are lots of requests coming in that all require validation in a number of different ways. For example, you have tens of thousands of requests, each needing to do this:
private Notification note;
...
validateDate(note);
validateAge(note);
validateName(note);
validateComment(note);
validateForm(note);
Also note that these validation methods may themselves use other validation methods, for example:
private void validateForm(Notification note) {
    // Each nested validator writes its errors into the same shared Notification.
    validateName(note);
    validateAge(note);
}
As I was writing this example I also realised something else: the validators themselves aren't easy to re-use. For example, validateAge must be tied to the same field to verify. I'd write it more like this:
private FormToValidate form;
private List<Function<FieldToValidate, ValidationResult>> validators;
...
validators = new ArrayList<>();
validators.add( (
validateDate(Notification note)
validateAge(Notification note)
validateName(Notification note)
validateComment(Notification note)
validateForm(Notification note)
Damn, I've run out of time, leaving a partial brain-dump. OK, for anyone reading this, the point I am driving at is that the mutating method prevents the validation from being parallelised. Switching to methods that are supplied with everything they need and return a single result means they can form part of the map-reduce style of operating. In other words, validation can now run on any available threads at the same time, rather than sequentially validating one field after another. That's something that wouldn't be noticed on a single form submission, but over millions it would be.
I think there may be some memory saving to be had in there too, since there'd be a single validateXXX method regardless of the number of submissions that had been made, because the now non-mutating method is reused and won't need its own stackframe. Alright, so that's the wrong word, but I can't remember the term and my lunchtime has ended already. I'll credit the reader with enough intelligence to work that lot out :)
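For what it's worth, here's a rough sketch of where that brain-dump might have been heading. Everything in it is an assumption on my part: the Form class, its two fields, and the use of Optional<String> as a stand-in for a validation result are all hypothetical, not the FormToValidate or ValidationResult types mentioned earlier. Each validator is handed what it needs and returns its result, so the "map" step (apply every validator to every form) can run on a parallel stream and the "reduce" step simply collects the errors.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ParallelValidation {

    // Hypothetical immutable form; the fields are illustrative only.
    static final class Form {
        final String name;
        final int age;

        Form(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    // Each validator returns an error message, or empty when the form passes.
    // Nothing outside the method is modified.
    static Optional<String> validateName(Form form) {
        return form.name == null || form.name.trim().isEmpty()
                ? Optional.of("name is missing")
                : Optional.empty();
    }

    static Optional<String> validateAge(Form form) {
        return form.age < 0 || form.age > 150
                ? Optional.of("age out of range: " + form.age)
                : Optional.empty();
    }

    // The reusable list of validators, built once and shared by every form.
    static List<Function<Form, Optional<String>>> validators() {
        List<Function<Form, Optional<String>>> validators = new ArrayList<>();
        validators.add(ParallelValidation::validateName);
        validators.add(ParallelValidation::validateAge);
        return validators;
    }

    // Map: apply every validator to every form, potentially on many threads.
    // Reduce: keep only the failures and collect them into one list.
    static List<String> validateAll(List<Form> forms) {
        List<Function<Form, Optional<String>>> validators = validators();
        return forms.parallelStream()
                .flatMap(form -> validators.stream().map(v -> v.apply(form)))
                .filter(Optional::isPresent)
                .map(Optional::get)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Form> forms = Arrays.asList(new Form("Ada", 36), new Form("", 200));
        validateAll(forms).forEach(System.out::println);
    }
}
```

Because no validator writes into a shared Notification, there's nothing to synchronise, and the same validator list can be reused for any number of submissions.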