In a Hadoop cluster, the namenode communicates with all the other nodes. Apache Hadoop on Windows Azure keeps its primary Hadoop settings in the following XML files:
C:\Apps\Dist\conf\HDFS-SITE.XML
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- Put site-specific property overrides in this file. -->
<configuration>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
<property>
<name>dfs.datanode.max.xcievers</name>
<value>4096</value>
</property>
<property>
<name>dfs.name.dir</name> <!-- This is the NAME node data directory -->
<value>c:\hdfs\nn</value>
</property>
<property>
<name>dfs.data.dir</name> <!-- This is the DATA node data directory -->
<value>c:\hdfs\dn</value>
</property>
</configuration>
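To check what a site file actually sets, the name/value pairs can be read programmatically. The following is a minimal Python sketch, not part of the Hadoop distribution: the function name `read_hadoop_conf` is hypothetical, and the inline XML only repeats a few of the values listed above.

```python
import xml.etree.ElementTree as ET

# Inline copy of a few properties from the listing above (illustrative only).
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property><name>dfs.permissions</name><value>false</value></property>
  <property><name>dfs.replication</name><value>3</value></property>
  <property><name>dfs.name.dir</name><value>c:\\hdfs\\nn</value></property>
</configuration>"""


def read_hadoop_conf(xml_text):
    """Return {name: value} for every <property> in a Hadoop-style config.

    Hypothetical helper: it reads only <name> and <value> and ignores
    <description> and any other child elements.
    """
    root = ET.fromstring(xml_text)
    conf = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None:
            conf[name.strip()] = (prop.findtext("value") or "").strip()
    return conf


conf = read_hadoop_conf(SAMPLE)
for key in sorted(conf):
    print(key, "=", conf[key])
```

To inspect a real file, the same function could be given the text of C:\Apps\Dist\conf\hdfs-site.xml instead of the inline sample.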
C:\Apps\Dist\conf\Core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/hdfs/tmp</value>
<description>A base for other temporary directories.</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://10.26.104.45:9000</value> <!-- After the role starts, the VM gets an IP address, which is then written here -->
</property>
<property>
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
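The fs.default.name value is a URI that carries both the namenode address and its port. As a small illustration, the Python sketch below splits that value into host and port and converts the io.file.buffer.size value from bytes to KiB. The helper name `namenode_endpoint` is an assumption made for this example, not a Hadoop API.

```python
from urllib.parse import urlparse


def namenode_endpoint(fs_default_name):
    """Split an hdfs:// URI into (host, port). Hypothetical helper."""
    parsed = urlparse(fs_default_name)
    if parsed.scheme != "hdfs" or parsed.hostname is None or parsed.port is None:
        raise ValueError("expected hdfs://host:port, got %r" % fs_default_name)
    return parsed.hostname, parsed.port


# Values copied from the core-site.xml listing above.
host, port = namenode_endpoint("hdfs://10.26.104.45:9000")
buffer_kib = 131072 // 1024  # io.file.buffer.size is given in bytes

print("namenode host:", host)
print("namenode port:", port)
print("io.file.buffer.size in KiB:", buffer_kib)
```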
C:\Apps\Dist\conf\Mapred-site.xml:
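<configuration>
<property>
<name>mapred.job.tracker</name>
<value>10.26.104.45:9010</value>
</property>
<property>
<name>mapred.local.dir</name>
<value>/hdfs/mapred/local</value>
</property>
<property>
<name>mapred.tasktracker.map.tasks.maximum</name>
<value>2</value>
</property>
<property>
<name>mapred.tasktracker.reduce.tasks.maximum</name>
<value>1</value>
</property>
<property>
<name>mapred.child.java.opts</name>
<value>-Xmx1024m</value>
</property>
<property>
<name>mapreduce.client.tasklog.timeout</name>
<value>6000000</value>
</property>
<property>
<name>mapred.task.timeout</name>
<!-- value not shown -->
</property>
<property>
<name>mapreduce.reduce.shuffle.connect.timeout</name>
<value>600000</value>
</property>
<property>
<name>mapreduce.reduce.shuffle.read.timeout</name>
<!-- value not shown -->
</property>
</configuration>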
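The slot and heap settings above together bound how much Java heap the task JVMs on one node can request, and the timeout values are given in milliseconds. The Python sketch below works through that arithmetic using only values from the listing. It is a rough estimate under stated assumptions: it counts maximum heap (-Xmx) only and ignores JVM overhead and the memory used by the tasktracker and datanode daemons themselves. The function names are made up for this example.

```python
# Values copied from the mapred-site.xml listing above.
MAP_SLOTS = 2          # mapred.tasktracker.map.tasks.maximum
REDUCE_SLOTS = 1       # mapred.tasktracker.reduce.tasks.maximum
CHILD_HEAP_MB = 1024   # from mapred.child.java.opts = -Xmx1024m


def worst_case_task_heap_mb(map_slots, reduce_slots, heap_mb):
    """Heap requested if every map and reduce slot on a node is busy."""
    return (map_slots + reduce_slots) * heap_mb


def ms_to_minutes(milliseconds):
    """Convert a timeout expressed in milliseconds to minutes."""
    return milliseconds / 60000


task_heap_mb = worst_case_task_heap_mb(MAP_SLOTS, REDUCE_SLOTS, CHILD_HEAP_MB)
shuffle_connect_min = ms_to_minutes(600000)   # mapreduce.reduce.shuffle.connect.timeout
tasklog_min = ms_to_minutes(6000000)          # mapreduce.client.tasklog.timeout

print("worst-case task heap per node (MB):", task_heap_mb)
print("shuffle connect timeout (min):", shuffle_connect_min)
print("client tasklog timeout (min):", tasklog_min)
```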
You can change any of the settings above, but afterwards you need to restart the namenode for the changes to take effect.
For more commands, open the Hadoop command line shortcut and run hadoop without arguments:
c:\apps\dist>hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  mradmin              run a Map-Reduce admin client
  fsck                 run a DFS filesystem checking utility
  fs                   run a generic filesystem user client
  balancer             run a cluster balancing utility
  jobtracker           run the MapReduce job Tracker node
  pipes                run a Pipes job
  tasktracker          run a MapReduce task Tracker node
  job                  manipulate MapReduce jobs
  queue                get information regarding JobQueues
  version              print the version
  jar <jar>            run a jar file
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME <src>* <dest> create a hadoop archive
  daemonlog            get/set the log level for each daemon
 or
  CLASSNAME            run the class named CLASSNAME
Most commands print help when invoked w/o parameters.
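When these commands are scripted, it helps to build the argument list in one place. The Python sketch below composes a hadoop command line from the usage text above, including the optional --config confdir prefix. It only builds and prints the argument list and does not run anything. The helper `hadoop_argv` and the choice of example commands are assumptions made for illustration.

```python
# Command names taken from the usage output above.
KNOWN_COMMANDS = {
    "namenode", "secondarynamenode", "datanode", "dfsadmin", "mradmin",
    "fsck", "fs", "balancer", "jobtracker", "pipes", "tasktracker",
    "job", "queue", "version", "jar", "distcp", "archive", "daemonlog",
}


def hadoop_argv(command, *args, confdir=None):
    """Build an argument list for the hadoop launcher. Hypothetical helper.

    Only the listed commands are accepted; running an arbitrary CLASSNAME
    is deliberately left out of this sketch.
    """
    if command not in KNOWN_COMMANDS:
        raise ValueError("unknown hadoop command: %r" % command)
    argv = ["hadoop"]
    if confdir is not None:
        argv += ["--config", confdir]
    argv.append(command)
    argv.extend(args)
    return argv


report = hadoop_argv("dfsadmin", "-report")
check = hadoop_argv("fsck", "/", confdir=r"C:\Apps\Dist\conf")

print(report)
print(check)
```

The resulting list could then be passed to `subprocess.run` on a machine where the hadoop launcher is on the PATH.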
You can also modify the following Java logging configuration; the changes take effect only after you relaunch the Java process:
C:\Apps\Dist\conf\Log4j.properties:
hadoop.log.file=hadoop.log
log4j.rootLogger=${hadoop.root.logger}, EventCounter
log4j.threshhold=ALL
#
# TaskLog Appender
#Default values
hadoop.tasklog.taskid=null
hadoop.tasklog.noKeepSplits=4
hadoop.tasklog.totalLogFileSize=100
hadoop.tasklog.purgeLogSplits=true
hadoop.tasklog.logsRetainHours=12
log4j.appender.TLA=org.apache.hadoop.mapred.TaskLogAppender
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
log4j.appender.TLA.layout=org.apache.log4j.PatternLayout
log4j.appender.TLA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
# FSNamesystem Audit logging
log4j.logger.org.apache.hadoop.fs.FSNamesystem.audit=WARN
# Custom Logging levels
#log4j.logger.org.apache.hadoop.mapred.JobTracker=DEBUG
#log4j.logger.org.apache.hadoop.mapred.TaskTracker=DEBUG
#log4j.logger.org.apache.hadoop.fs.FSNamesystem=DEBUG
# Jets3t library
log4j.logger.org.jets3t.service.impl.rest.httpclient.RestS3Service=ERROR
# Event Counter Appender
# Sends counts of logging messages at different severity levels to Hadoop Metrics.
log4j.appender.EventCounter=org.apache.hadoop.log.metrics.EventCounter
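Several entries in the listing above refer to other entries through ${...} placeholders, for example log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}. The Python sketch below shows the idea of that substitution on a few lines copied from the listing. It is a simplification and not log4j's own implementation: it does a single pass, looks names up only in the file itself, and leaves unknown names such as ${hadoop.root.logger} unchanged. The function names are invented for this example.

```python
import re

# A few lines copied from the log4j.properties listing above.
SAMPLE = """\
hadoop.log.file=hadoop.log
log4j.rootLogger=${hadoop.root.logger}, EventCounter
# TaskLog Appender
hadoop.tasklog.taskid=null
hadoop.tasklog.totalLogFileSize=100
log4j.appender.TLA.taskId=${hadoop.tasklog.taskid}
log4j.appender.TLA.totalLogFileSize=${hadoop.tasklog.totalLogFileSize}
"""

_PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def parse_properties(text):
    """Parse key=value lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def resolve(value, props):
    """Replace ${name} with props[name]; unknown names are left as written."""
    return _PLACEHOLDER.sub(lambda m: props.get(m.group(1), m.group(0)), value)


props = parse_properties(SAMPLE)
resolved = {key: resolve(value, props) for key, value in props.items()}

for key in sorted(resolved):
    print(key, "=", resolved[key])
```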
Resources:
http://hadoop.apache.org/common/docs/current/cluster_setup.html
http://allthingshadoop.com/2010/04/28/map-reduce-tips-tricks-your-first-real-cluster/