Friday, September 9, 2011

Re: [VOTE] Accumulo to join the Incubator

Qualifying: +1 (non-binding).

I would also like to repeat what Marvin Humphrey said:

"I've been impressed by how the Accumulo representatives have conducted
themselves during this week of discussion, and I believe that they will
become valuable and productive participants within Apache."

- milind

---
Milind Bhandarkar
Greenplum Labs, EMC
(Disclaimer: Opinions expressed in this email are those of the author, and
do not necessarily represent the views of any organization, past or
present, the author might be affiliated with.)



On 9/9/11 9:33 AM, "Bhandarkar, Milind" <Milind.Bhandarkar@emc.com> wrote:

>+1 !
>
>- milind
>
>On 9/9/11 9:22 AM, "Doug Cutting" <cutting@apache.org> wrote:
>
>>It's been a week since the Accumulo proposal was submitted for
>>discussion. A few questions were asked, and the proposal was clarified
>>in response. Sufficient mentors have volunteered. I thus feel we are
>>now ready for a vote.
>>
>>The latest proposal can be found at the end of this email and at:
>>
>> http://wiki.apache.org/incubator/AccumuloProposal
>>
>>The discussion regarding the proposal can be found at:
>>
>> http://s.apache.org/oi
>>
>>Please cast your votes:
>>
>>[ ] +1 Accept Accumulo for incubation
>>[ ] +0 Indifferent to Accumulo incubation
>>[ ] -1 Reject Accumulo for incubation
>>
>>This vote will close 72 hours from now.
>>
>>Thanks,
>>
>>Doug
>>
>>-----------------------
>>
>>= Accumulo Proposal =
>>
>>== Abstract ==
>>Accumulo is a distributed key/value store that provides expressive,
>>cell-level access labels.
>>
>>== Proposal ==
>>Accumulo is a sorted, distributed key/value store based on Google's
>>BigTable design. It is built on top of Apache Hadoop, Zookeeper, and
>>Thrift. It features a few novel improvements on the BigTable design in
>>the form of cell-level access labels and a server-side programming
>>mechanism that can modify key/value pairs at various points in the data
>>management process.
>>
>>== Background ==
>>Google published the design of BigTable in 2006. Several other open
>>source projects have implemented aspects of this design including HBase,
>>CloudStore, and Cassandra. Accumulo began its development in 2008.
>>
>>== Rationale ==
>>There is a need for a flexible, high performance distributed key/value
>>store that provides expressive, fine-grained access labels. The
>>communities we expect to be most interested in such a project are
>>government, health care, and other industries where privacy is a
>>concern. We have made much progress in developing this project over the
>>past 3 years and believe both the project and the interested communities
>>would benefit from this work being openly available and having open
>>development.
>>
>>== Current Status ==
>>
>>=== Meritocracy ===
>>We intend to strongly encourage the community to help with and
>>contribute to the code. We will actively seek potential committers and
>>help them become familiar with the codebase.
>>
>>=== Community ===
>>A strong government community has developed around Accumulo and training
>>classes have been ongoing for about a year. Hundreds of developers use
>>Accumulo.
>>
>>=== Core Developers ===
>>The developers are mainly employed by the National Security Agency, but
>>we anticipate interest developing among other companies.
>>
>>=== Alignment ===
>>Accumulo is built on top of Hadoop, Zookeeper, and Thrift. It builds
>>with Maven. Due to the strong relationship with these Apache projects,
>>the incubator is a good match for Accumulo.
>>
>>== Known Risks ==
>>=== Orphaned Products ===
>>There is only a small risk of being orphaned. The community is
>>committed to improving the codebase of the project due to its fulfilling
>>needs not addressed by any other software.
>>
>>=== Inexperience with Open Source ===
>>The codebase has been treated internally as an open source project since
>>its beginning, and the initial Apache committers have been involved with
>>the code for multiple years. While our experience with public open
>>source is limited, we do not anticipate difficulty in operating under
>>Apache's development process.
>>
>>=== Homogeneous Developers ===
>>The committers have multiple employers and it is expected that
>>committers from different companies will be recruited.
>>
>>=== Reliance on Salaried Developers ===
>>The initial committers are all paid by their employers to work on
>>Accumulo and we expect such employment to continue. Some of the initial
>>committers would continue as volunteers even if no longer employed to do
>>so.
>>
>>=== Relationships with Other Apache Products ===
>>Accumulo uses Hadoop, Zookeeper, Thrift, Maven, log4j, commons-lang,
>>-net, -io, -jci, -collections, -configuration, -logging, and -codec.
>>
>>=== Relationship to HBase ===
>>Accumulo and HBase are both based on the design of Google's BigTable, so
>>there is a danger that potential users will have difficulty
>>distinguishing the two. Some of the key areas in which Accumulo differs
>>from HBase are discussed below. It may be possible to incorporate the
>>desired features of Accumulo into HBase. However, the amount of work
>>required would slow development of HBase and Accumulo considerably. We
>>believe this warrants a podling for Accumulo at the current time. We
>>expect active cross-pollination will occur between HBase and podling
>>Accumulo and it is possible that the codebases and projects will
>>ultimately converge.
>>
>>==== Access Labels ====
>>Accumulo has an additional portion of its key that sorts after the
>>column qualifier and before the timestamp. It is called column
>>visibility and enables expressive cell-level access control.
>>Authorizations are passed with each query to control what data is
>>returned to the user. The column visibilities are boolean AND and OR
>>combinations of arbitrary strings (such as "(A&B)|C") and authorizations
>>are sets of strings (such as {C,D}).
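
To make the visibility check concrete, here is a minimal Python sketch of how an expression such as (A&B)|C could be evaluated against a set of authorizations. It is not Accumulo's implementation: the names `is_visible` and `_tokenize`, the tokenizer rules, and the left-to-right handling of operators that are not parenthesized are all assumptions made for illustration.

```python
def _tokenize(expression):
    """Split a visibility expression into labels, operators, and parentheses."""
    specials = "()&|"
    tokens = []
    i = 0
    while i < len(expression):
        ch = expression[i]
        if ch in specials:
            tokens.append(ch)
            i += 1
        elif ch.isspace():
            i += 1
        else:
            j = i
            while (j < len(expression)
                   and expression[j] not in specials
                   and not expression[j].isspace()):
                j += 1
            tokens.append(expression[i:j])
            i = j
    return tokens


def is_visible(expression, authorizations):
    """Return True if the authorizations satisfy the visibility expression.

    Hypothetical helper: '&' means AND, '|' means OR, and a bare label is
    satisfied when it is a member of the authorization set.
    """
    tokens = _tokenize(expression)
    pos = 0

    def parse_term():
        nonlocal pos
        if pos >= len(tokens):
            raise ValueError("unexpected end of expression")
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            value = parse_expr()
            if pos >= len(tokens) or tokens[pos] != ")":
                raise ValueError("missing closing parenthesis")
            pos += 1
            return value
        if tok in (")", "&", "|"):
            raise ValueError(f"unexpected token {tok!r}")
        return tok in authorizations

    def parse_expr():
        nonlocal pos
        value = parse_term()
        # Assumption of this sketch: operators are applied left to right.
        while pos < len(tokens) and tokens[pos] in ("&", "|"):
            op = tokens[pos]
            pos += 1
            rhs = parse_term()
            value = (value and rhs) if op == "&" else (value or rhs)
        return value

    result = parse_expr()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result
```

With the example from the proposal, `is_visible("(A&B)|C", {"C", "D"})` returns True because the C label alone satisfies the expression, while the authorization set {A, D} would not, since it has neither both A and B nor C.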
>>
>>==== Iterators ====
>>Accumulo has a novel server-side programming mechanism that can modify
>>the data written to disk or returned to the user. This mechanism can be
>>configured for any of the scopes where data is read from or written to
>>disk. It can be used to perform joins on data within a single tablet.
>>
>>==== Flexibility ====
>>HBase requires the user to specify the set of column families to be used
>>up front. Accumulo places no restrictions on the column families.
>>Also, each column family in HBase is stored separately on disk.
>>Accumulo allows column families to be grouped together on disk, as does
>>BigTable. This enables users to configure how their data is stored,
>>potentially providing improvements in compression and lookup speeds. It
>>gives Accumulo a row/column hybrid nature, while HBase is currently
>>column-oriented.
>>
>>==== Testing ====
>>Accumulo has testing frameworks that have resulted in its achieving a
>>high level of correctness and performance. We have observed that under
>>some configurations and conditions Accumulo will outperform HBase and
>>provide greater data integrity.
>>
>>==== Logging ====
>>HBase uses a write-ahead log on the Hadoop Distributed File System.
>>Accumulo has its own logging service that does not depend on
>>communication with the HDFS NameNode.
>>
>>==== Storage ====
>>Accumulo has a relative key file format that improves compression.
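
The proposal does not describe the file format itself, so the following Python sketch only illustrates the general idea behind relative keys: in a sorted file, consecutive keys often share most of their fields, so each key can be stored as the fields that differ from the previous key. The field names, the dict-based encoding, and the function names `encode_relative` and `decode_relative` are hypothetical and do not reflect Accumulo's actual on-disk layout.

```python
# Key fields in sort order, following the key layout described in the proposal.
FIELDS = ("row", "family", "qualifier", "visibility", "timestamp")


def encode_relative(keys):
    """Encode sorted keys as deltas against the preceding key.

    The first key is stored in full; every later key stores only the
    fields whose values changed.
    """
    encoded = []
    previous = None
    for key in keys:
        if previous is None:
            encoded.append(dict(zip(FIELDS, key)))
        else:
            encoded.append({
                name: value
                for name, value, old in zip(FIELDS, key, previous)
                if value != old
            })
        previous = key
    return encoded


def decode_relative(encoded):
    """Rebuild the full keys by applying each delta to the running key."""
    keys = []
    current = {}
    for delta in encoded:
        current = {**current, **delta}
        keys.append(tuple(current[name] for name in FIELDS))
    return keys
```

For two keys that differ only in column qualifier, the second encoded entry contains just the qualifier, which is where the space saving in this sketch comes from.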
>>
>>==== Areas in which HBase features improvements over Accumulo ====
>>In-memory tables, upserts, coprocessors, and connections to other
>>projects such as Cascading and Pig.
>>
>>=== Expectations ===
>>There is a risk that Accumulo will be criticized for not providing
>>adequate security. The access labels in Accumulo do not in themselves
>>provide a complete security solution, but are a mechanism for labeling
>>each piece of data with the authorizations that are necessary to see it.
>>
>>=== Apache Brand ===
>>Our interest in releasing this code as an Apache incubator project is
>>due to its strong relationship with other Apache projects, i.e. Accumulo
>>has dependencies on Hadoop, Zookeeper, and Thrift and has complementary
>>goals to HBase.
>>
>>== Documentation ==
>>There is not currently documentation about Accumulo on the web, but a
>>fair amount of documentation and training materials exists and will be
>>provided on the Accumulo wiki at apache.org. Also, a paper discussing
>>YCSB results for Accumulo will be presented at the 2011 Symposium on
>>Cloud Computing.
>>
>>== Initial Source ==
>>Accumulo has been in development since spring 2008. There are hundreds
>>of developers using it and tens of developers have contributed to it.
>>The core codebase consists of 200,000 lines of code (mainly Java) and
>>100s of pages of documentation. There are also a few projects built on
>>top of Accumulo that may be added to its contrib in the future. These
>>include support for Hive, Matlab, YCSB, and graph processing.
>>
>>== Source and Intellectual Property Submission Plan ==
>>Accumulo core code, examples, documentation, and training materials will
>>be submitted by the National Security Agency.
>>
>>We will also be soliciting contributions of further plugins from MIT
>>Lincoln Labs, Carnegie Mellon University, and others.
>>
>>Accumulo has been developed by a mix of government employees and private
>>companies under government contract. Material developed by government
>>employees is in the public domain and no U.S. copyright exists in works
>>of the federal government. For the contractor developed material in the
>>initial submission, the U.S. Government has sufficient authority per the
>>ICLA from the copyright owner to contribute the Accumulo code to the
>>incubator.
>>
>>There has been some discussion regarding accepting contributions from US
>>Government sources on https://issues.apache.org/jira/browse/LEGAL-93. We
>>propose that the NSA will sign an ICLA/CCLA if that document could be
>>slightly modified to explicitly address copyright in works of government
>>employees. Specifically, we propose that the definition of "You" be
>>modified to include "the copyright owner, the owner of a Contribution
>>not subject to copyright, or legal entity authorized by the copyright
>>owner that is making this Agreement." In addition, we propose that
>>section 2, the copyright license grant, be modified by adding after
>>"You hereby grant" either "to the extent authorized by law" or "to the
>>extent copyright exists in the Contribution." These changes will permit
>>work developed by US Government employees to be included.
>>
>>One proposed solution is to form a Collaborative Research and
>>Development Agreement (CRADA) between the Apache Software Foundation and
>>the US Government, but this will not solve the underlying problem that
>>U.S. law does not grant copyright to works of government employees. At
>>this time a CRADA is not necessary, but should it be determined that a
>>CRADA is necessary, we would like to work through that process during
>>the incubation phase of Accumulo rather than before acceptance, as
>>entering into such an agreement may take time.
>>
>>== External Dependencies ==
>>jetty (Apache and EPL), jline (BSD), jfreechart (LGPL), jcommon (LGPL),
>>slf4j (MIT), junit (CPL)
>>
>>== Cryptography ==
>>none
>>
>>== Required Resources ==
>> * Mailing Lists
>>   * accumulo-private
>>   * accumulo-dev
>>   * accumulo-commits
>>   * accumulo-user
>>
>> * Subversion Directory
>>   * https://svn.apache.org/repos/asf/incubator/accumulo
>>
>> * Issue Tracking
>>   * JIRA Accumulo (ACCUMULO)
>>
>> * Continuous Integration
>>   * Jenkins builds on https://builds.apache.org/
>>
>> * Web
>>   * http://incubator.apache.org/accumulo/
>>   * wiki at http://wiki.apache.org or http://cwiki.apache.org
>>
>>== Initial Committers ==
>> * Aaron Cordova (aaron at cordovas dot org)
>> * Adam Fuchs (adam.p.fuchs at ugov dot gov)
>> * Eric Newton (ecn at swcomplete dot com)
>> * Billie Rinaldi (billie.j.rinaldi at ugov dot gov)
>> * Keith Turner (keith.turner at ptech-llc dot com)
>> * John Vines (john.w.vines at ugov dot gov)
>> * Chris Waring (christopher.a.waring at ugov dot gov)
>>
>>== Affiliations ==
>> * Aaron Cordova, The Interllective
>> * Adam Fuchs, National Security Agency
>> * Eric Newton, SW Complete Incorporated
>> * Billie Rinaldi, National Security Agency
>> * Keith Turner, Peterson Technology LLC
>> * John Vines, National Security Agency
>> * Chris Waring, National Security Agency
>>
>>== Sponsors ==
>> * Champion: Doug Cutting
>>
>>== Nominated Mentors ==
>> * Benson Margulies
>> * Alan Cabrera
>> * Bernd Fondermann
>> * Owen O'Malley
>>
>>== Sponsoring Entity ==
>> * Apache Incubator
>>
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: general-unsubscribe@incubator.apache.org
>>For additional commands, e-mail: general-help@incubator.apache.org
>>
>>
>
>
