Hadoop for Windows Succinctly

Hadoop is a collection of utilities that work together to enable distributed storage and processing of very large datasets. Since its inception, it has been associated almost exclusively with Linux operating systems, as the large number of textbooks and publications focusing on Hadoop for Linux shows. By contrast, textbooks focusing on Hadoop for Windows are almost nonexistent.

It’s important at this stage to be clear about what I mean when I say “Hadoop for Windows.” Hadoop for Windows refers to Hadoop running directly on the Microsoft Windows operating system, with native Windows support.

Hadoop for Windows does not include:

• Hadoop running on Windows via an emulator or virtual machine

• Hadoop accessed via the cloud from a Windows machine

• Hadoop running on Windows via any kind of third-party container system

There are many online examples of people recommending the preceding options as ways to run Hadoop on Windows. They seem unaware that Hadoop can be installed directly on Windows, so they never consider it. The first aim of this book is to make people aware that Hadoop runs perfectly on Windows. The second is to guide the reader through installing and using Hadoop on the Windows platform.

Although this book is about Hadoop for Windows, it would not be credible to avoid referring to Linux where it’s pertinent to do so. By comparing the two environments as operating systems for Hadoop, we may discover the reasons behind the popularity of Linux. That said, there are some fairly obvious reasons why Hadoop has been so heavily associated with Linux. Hadoop is open source, so an open source operating system such as Linux was always a natural pairing; both are also available free of charge. The role of Microsoft, however, has been just as important as that of Linux. Microsoft deployed Hadoop in the cloud via HDInsight on Microsoft Azure.

HDInsight was the result of Hortonworks collaborating with Microsoft, and was based on the Hortonworks Hadoop distribution. A desktop emulator version was available, and Hortonworks released its own Hadoop distribution for Windows, which is now archived. Since then, Hortonworks has promoted its Hadoop Sandbox for Windows, which runs only on a virtual machine. The end of HDInsight for Windows finally came in July 2018, leaving only HDInsight for Linux. Naturally, Microsoft’s decision to end HDInsight for Windows raised eyebrows, and it prompted various questions, including: Why was Hadoop for Windows named HDInsight? If you asked IT professionals whether they knew Microsoft had released Hadoop for Windows, how many would know? The reality is that Microsoft has never released a multi-node version of Hadoop for on-premises usage.

The optimum solution may have been to offer the same HDInsight solution on premises that was offered in the cloud. Numerous products haven’t done as well as they could have because they were primarily cloud-based. IBM Watson Analytics springs to mind: it’s an intelligent piece of software, but unavailable on premises, so it lost on-premises sales.

An on-premises setup puts you in control, but in the Azure cloud you can use Ranger, Kafka, Interactive Query, and Spark only on HDInsight for Linux. You can’t use them in the retired HDInsight for Windows, nor can you create or resize Windows clusters. Despite this, Microsoft feels that its Hadoop deployment has an edge over competitors because it stores data in Azure Storage instead of in on-premises storage or on nodes.

Attribution

Dave Vickers. Hadoop for Windows Succinctly. https://www.syncfusion.com/ebooks/hadoop-for-windows-succinctly
