Tuesday, September 6, 2011

Accumulo incubator proposal: Statement of Concern

I am writing to state my concerns with the Incubator proposal for Accumulo, the HBase copy/clone.

1) "Accumulo has been in development since spring 2008."

I don't fault anybody for being scared of HBase in 2008 – you'd have to be pretty brave to use it then. HBase 0.20 was the first release to get wide adoption and that came out in the fall of 2009. That said, the fall of 2009 was two years ago.

2) "Some of the desired features of Accumulo could be incorporated into HBase, however the most important of these may be unlikely to be adopted (see cell-level access labels and iterators below)"

The proposal claims that the most important features "may be unlikely to be adopted" by HBase. Really?? How do the Accumulo developers know this?

Not a single request was made either in dist-list or Jira form to the HBase community regarding these requested features. Why is open communication such a problem? Remember that Accumulo had 2 years to put together such a request. For a project trying to achieve the exact same goals as HBase, this is not a minor issue.

The past is unfortunately the best predictor of future performance, and while excuses have been made about sharing code and communication being "hard" for the employer of the majority of the Accumulo developers, a lack of openness is a non-starter for an ASF project. For example, the HBase team recently received a finger-wag when the project committers voted on a new logo (i.e., instead of letting the entire community vote). This somewhat humorous infraction does prove the point: openness is required at all levels.

HBase has, for years, demonstrated this principle through public feature requests, public bug reports, public code reviews, and public dist-list conversations on every conceivable issue. Based on past performance, I don't see Accumulo, or the developers behind Accumulo, being able to make the cut in this respect.

3) From the "Apache Brand" section: "Our interest in releasing this code as an Apache incubator project is due to its strong relationship with other Apache projects, i.e. Hadoop, Zookeeper, and HBase"

Regarding "strong relationship," see point #2 about non-communication over the last few years.

Side-bar conversations with a Hadoop developer or two do not count as "community communication."

4) In Summary

If Accumulo wishes to be an open-source project, so be it, but put it on Google Code, SourceForge, or GitHub. There are plenty of places. I don't think it belongs in the ASF.

I'm sure that other developers may have some comments about copied HBase and Hadoop code, but I'll leave that to them.


Doug Meil
