We have touched on Hadoop before on our blog, first when announcing our partnership with Hortonworks back in February of 2014, and more recently on our Big Data Solutions Page. Hadoop is a powerful open source framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models, making it an excellent Big Data tool. Hadoop can benefit from the operational efficiencies of Cloud-A's elastic, API-driven infrastructure-as-a-service, and it is a growing use case among our users.
There are many aspects of our compute offering that lend themselves to successful Hadoop deployments on Cloud-A, like our Big Data High Memory® and Big Data High Compute® instance flavours that provide the scale and power for the biggest jobs. But there are other elements of Cloud-A infrastructure-as-a-service that make it an ideal host for Hadoop Big Data solutions.
Let’s take a look at how you can leverage Cloud-A Bulk Storage, our object storage product, as a reliable, scalable and cost-effective Big Data repository for your Hadoop platform.
Bulk Storage Compatibility
We have outlined the many benefits of object storage in our whitepaper, Object Storage 101, including its Big Data use case. Cloud-A Bulk Storage is powered by OpenStack Swift, an open source object storage technology with Hadoop compatibility.
The Hadoop project provides a Swift filesystem driver (the hadoop-openstack module), which allows Hadoop applications like MapReduce and Pig to read and write data directly in OpenStack Swift containers, or in the case of Cloud-A, Bulk Storage containers.
The cool thing about the Swift filesystem is that it separates the compute and storage resources of the Hadoop cluster, allowing each to have its own lifespan. This makes Bulk Storage an ideal, long-term repository for data that only needs to be processed by compute periodically. Users can keep their data in Bulk Storage, spin up Hadoop workers to crunch the data, and spin them back down after the job is complete to reduce the cost of the project. Additionally, data that already exists in Bulk Storage containers can be processed without moving it over to Hadoop's own filesystem (HDFS).
How do I use the Swift filesystem with Bulk Storage?
Hadoop filesystem URLs for Swift take the following form, where the service name is the one you define in core-site.xml:
swift://acontainer.aservice/path/to/files
so for Cloud-A, where the service is named clouda, it would look like this:
swift://{container name}.clouda/
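To make the mapping concrete, here is a small Python sketch. The helper function and the container name are hypothetical examples of ours, not part of Hadoop or Cloud-A tooling; it simply builds a Bulk Storage URL of the form shown above:

```python
def bulk_storage_url(container, path=""):
    """Build a swift:// URL for a Cloud-A Bulk Storage container.

    'clouda' is the service name defined in core-site.xml below;
    the container and path arguments are hypothetical examples.
    """
    return "swift://{0}.clouda/{1}".format(container, path.lstrip("/"))

# A hypothetical container named "weblogs":
print(bulk_storage_url("weblogs", "2015/01/access.log"))
# swift://weblogs.clouda/2015/01/access.log
```

Any path of this form can then be passed as an input or output location to a MapReduce or Pig job, exactly as you would use an hdfs:// path.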
Hadoop configurations
Example: Cloud-A, in-cluster access using API key
This service definition is for use in a Hadoop cluster deployed within Cloud-A’s infrastructure.
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.swift.service.clouda.auth.url</name>
    <value>https://ca-ns-1.bulkstorage.ca:8444/keys_auth/{container_name}/v2.0/tokens</value>
    <description>Cloud-A Bulk Storage</description>
  </property>
  <property>
    <name>fs.swift.service.clouda.username</name>
    <value>Full-Key</value>
  </property>
  <property>
    <name>fs.swift.service.clouda.tenant</name>
    <value>{tenant id}</value>
  </property>
  <property>
    <name>fs.swift.service.clouda.public</name>
    <value>true</value>
  </property>
  <property>
    <name>fs.swift.service.clouda.password</name>
    <value>{container key}</value>
  </property>
</configuration>
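Depending on your Hadoop version, you may also need to tell Hadoop which class implements the swift:// scheme (and make sure the hadoop-openstack JAR is on the classpath). A typical, version-dependent addition to core-site.xml looks like this:

```xml
<!-- May be required on some Hadoop versions: maps the swift://
     scheme to the filesystem implementation in hadoop-openstack. -->
<property>
  <name>fs.swift.impl</name>
  <value>org.apache.hadoop.fs.swift.snative.SwiftNativeFileSystem</value>
</property>
```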
What's next?
Stay tuned for more Big Data content as it continues to grow into one of the leading use cases on Cloud-A. Have a request for a technical blog post on another Big Data tool, or would you like to appear as a guest blogger? Let us know.