Click on the virtual machine so we can get its URL, which will be needed to connect to it. Then select DASHBOARD. In my case, the DNS name for my server is centos-hadoop.cloudapp.net. You can then remote in with PuTTY.
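If you prefer a plain SSH client to PuTTY, the connection looks like this. This is a sketch: "azureuser" is a placeholder for whatever administrator account you created when provisioning the VM, and the DNS name is the one shown on your own dashboard.

```shell
# Connect to the Azure VM over SSH.
# "azureuser" is a placeholder account name -- substitute your own.
ssh azureuser@centos-hadoop.cloudapp.net
```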
The NameNode is the centerpiece of an HDFS file system. It keeps the directory tree of all files in the file system, and tracks where across the cluster the file data is kept. It does not store the data of these files itself.
Client applications talk to the NameNode whenever they wish to locate a file, or when they want to add/copy/move/delete a file. The NameNode responds to successful requests by returning a list of relevant DataNode servers where the data lives.
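You can watch this interaction from the command line. The sketch below assumes a running single-node cluster and uses a hypothetical file name; the `hdfs fsck` report shows exactly which DataNodes the NameNode says hold each block.

```shell
# Write a file into HDFS: the client asks the NameNode where to put it,
# then streams the data to the DataNode(s) it names.
hdfs dfs -put sample.txt /user/demo/sample.txt

# Ask the NameNode for the block locations of that file.
hdfs fsck /user/demo/sample.txt -files -blocks -locations
```

On a single-node cluster the report will list only one DataNode, but the lookup path is the same one a large cluster uses.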
A DataNode stores data in the Hadoop file system (HDFS). A functional filesystem has more than one DataNode, with data replicated across them. On startup, a DataNode connects to the NameNode, retrying until that service comes up, and then responds to requests from the NameNode for filesystem operations.
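Once the DataNode has registered, the NameNode can report on it. A quick way to confirm your node came up (run on the cluster, typically as the HDFS superuser):

```shell
# Print cluster capacity and the list of live/dead DataNodes
# as the NameNode currently sees them.
hdfs dfsadmin -report
```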
You can now start and shut down your Azure virtual machine as needed. There is no need to accrue expensive charges for keeping your single-node Hadoop cluster up and running.
The portal offers shutdown and start commands, so you only pay for what you use. I’ve even figured out how to install Hive and a few other Hadoop tools. Happy big data coding!
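The same start/stop operations can be scripted with the Azure CLI instead of clicking through the portal. This is a sketch using the current `az` tool; the resource group and VM names are placeholders, and note that it is deallocation (not just a guest OS shutdown) that stops compute charges.

```shell
# Deallocate the VM so compute charges stop accruing.
az vm deallocate --resource-group hadoop-rg --name centos-hadoop

# Start it again when you want to use the cluster.
az vm start --resource-group hadoop-rg --name centos-hadoop
```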
Good article and good explanation of the commands required. For those who got scared of those commands or their syntax, another simple way of getting a Hadoop single-node cluster (without having to install manually) is to use pre-built VMs such as the Cloudera QuickStart VM or the Hortonworks Sandbox.
GK (Gopalakrishna Palem)