By Mike McNamara, Sr. Manager, Product Marketing, NetApp

While Hadoop has been used mainly on incoming, external data, there’s also been a need to use it on existing, internal data, typically stored in network-attached storage (NAS). However, using Hadoop on internal data like this has a downside. Typically, it requires setting up another storage silo to host the Hadoop Distributed File System (HDFS) and then running the Hadoop analytics on that storage. This results in additional data management, more inefficiencies, and the added cost of moving data between NAS and HDFS.

NetApp addresses this problem with the NetApp NFS Connector for Hadoop, which allows analytics software to use data stored on NetApp clustered Data ONTAP®. The connector works with Apache Hadoop and Apache Spark and needs only a simple configuration file change to enable analysis of data on NFSv3 storage. Because the data stays on clustered Data ONTAP, the connector decouples analytics from storage while retaining the benefits of NAS. For even higher performance, the NetApp NFS Connector for Hadoop can be combined with Tachyon to build a scale-out caching tier that is backed by clustered Data ONTAP.
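
As a rough illustration of what such a configuration file change can look like, the Python sketch below writes a minimal Hadoop core-site.xml that makes an NFSv3 export the default file system. Only the file layout and the fs.defaultFS key are standard Hadoop; fs.nfs.impl follows Hadoop's general fs.<scheme>.impl convention for mapping a URI scheme to an implementation, and the server address, the "nfs" scheme, and the connector class name are hypothetical placeholders. Check the connector's own documentation for the actual property names and values.

```python
# Minimal sketch: render a Hadoop core-site.xml that points the default
# file system at an NFSv3 export. The endpoint, scheme, and class name
# below are HYPOTHETICAL placeholders, not values from the connector docs.
import xml.etree.ElementTree as ET
from typing import Mapping

NFS_SERVER = "nfs://svm1.example.com:2049"        # hypothetical NFSv3 endpoint (2049 is the usual NFS port)
CONNECTOR_CLASS = "com.example.NFSv3FileSystem"   # hypothetical connector class name


def build_core_site(props: Mapping[str, str]) -> str:
    """Render Hadoop properties in the <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


props = {
    # fs.defaultFS is the standard Hadoop key for the default file system URI.
    "fs.defaultFS": NFS_SERVER,
    # Hadoop maps a URI scheme to its implementation via fs.<scheme>.impl;
    # the "nfs" scheme here is an assumption.
    "fs.nfs.impl": CONNECTOR_CLASS,
}

if __name__ == "__main__":
    print(build_core_site(props))
```

Setting fs.defaultFS replaces HDFS as the default; leaving it pointed at HDFS and adding only the scheme mapping is the "alongside" variant.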

Figure: NetApp Solutions for Hadoop and NFS Connector for Hadoop

You can employ NetApp NFS Connector for Hadoop to run big data analytics on NFSv3 data without moving the data, creating a separate analytics silo, or setting up a Hadoop cluster. You can start analyzing existing data with Hadoop right away. You can also use NFS Connector to run a proof of concept and then set up a Hadoop cluster with NetApp Solutions for Hadoop to analyze data from external sources.

NFS Connector lets you swap out HDFS for NFS or run NFS alongside HDFS. NFS Connector works with MapReduce for compute or processing and supports other Apache projects, including HBase (columnar database) and Spark (processing engine compatible with Hadoop). These capabilities let NFS Connector support diverse workloads, including batch, in-memory, streaming, and more.
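
To make running NFS alongside HDFS concrete, the toy Python model below mimics how a Hadoop-style client chooses a file system for each path: a fully qualified URI is routed by its scheme, and a path without a scheme goes to the default file system. It is a simplified sketch, not Hadoop or connector code; the configuration values, host names, and implementation labels are hypothetical.

```python
# Toy model (NOT Hadoop or connector code) of scheme-based file system
# selection. All hosts and implementation labels are hypothetical.
from urllib.parse import urlparse

# Hypothetical configuration: HDFS stays the default, NFS is added alongside it.
CONF = {
    "fs.defaultFS": "hdfs://namenode.example.com:8020",
    "fs.hdfs.impl": "HDFS implementation",
    "fs.nfs.impl": "NFS connector implementation (hypothetical)",
}


def resolve(path: str, conf: dict) -> tuple:
    """Return (scheme, implementation) for a path under this simplified model."""
    scheme = urlparse(path).scheme
    if not scheme:
        # Unqualified paths such as /logs/2015 go to the default file system.
        scheme = urlparse(conf["fs.defaultFS"]).scheme
    key = f"fs.{scheme}.impl"
    if key not in conf:
        raise ValueError(f"no file system configured for scheme {scheme!r}")
    return scheme, conf[key]


if __name__ == "__main__":
    for p in ["/logs/2015", "nfs://filer.example.com:2049/projects/data"]:
        print(p, "->", resolve(p, CONF))
```

With HDFS left as the default, existing jobs keep reading unqualified paths from HDFS, while paths written with the nfs scheme are handed to the NFS implementation.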

Mike McNamara

Mike McNamara is a senior manager of product and solution marketing at NetApp with over 25 years of storage and data management marketing experience. Before joining NetApp over 10 years ago, Mike worked at Adaptec, EMC, and Digital Equipment Corporation. Mike was a key leader driving the launch of the industry’s first unified scale-out storage system (NetApp), iSCSI and SAS storage system (Adaptec), and Fibre Channel storage system (EMC CLARiiON). In addition to his past role as marketing chairperson for the Fibre Channel Industry Association, he is a member of the Ethernet Technology Summit Conference Advisory Board, a member of the Ethernet Alliance, a regular contributor to industry journals, and a frequent speaker at events.