Avkash Chauhan's Blog

Windows Azure, Windows 8, Cloud Computing, Big Data and Hadoop: All together at one place.. One problem, One solution at One time...

Hadoop Configuration Files in HDInsight


I am keeping these settings here for reference and for quick checks.

mapred-site.xml

| name | value | description |
| --- | --- | --- |
| mapred.system.dir | /mapred/system | No description |
| mapred.job.tracker | jobtrackerhost:9010 | No description |
| mapred.jobtracker.taskScheduler | org.apache.hadoop.mapred.CapacityTaskScheduler | |
| mapred.job.tracker.http.address | jobtrackerhost:50030 | No description |
| mapred.local.dir | c:\hdfs\mapred\local | No description |
| mapred.job.tracker.history.completed.location | | No description |
| mapreduce.history.server.embedded | true | Should the job history server be embedded within the JobTracker process |
| mapreduce.history.server.http.address | 0.0.0.0:51111 | HTTP address of the history server |
| mapred.queue.names | default,joblauncher | Comma-separated list of queues configured for this JobTracker. |
| mapred.map.child.java.opts | -Xms512m -Xmx1024m | |
| mapred.reduce.child.java.opts | -Xms512m -Xmx2048m | |
| hadoop.job.history.user.location | hdfs://namenodehost:9000/mapred/userhistory | |
| mapred.child.java.opts | -Xms512m -Xmx1024m | Used by the TaskTracker when creating the mapper/reducer child VM |
| mapreduce.reduce.shuffle.read.timeout | 600000 | |
| mapreduce.reduce.shuffle.connect.timeout | 600000 | |
| mapred.map.max.attempts | 8 | |
| mapred.reduce.max.attempts | 8 | |
| mapred.tasktracker.reduce.tasks.maximum | 2 | |
| mapred.task.timeout | 6000000 | |
| mapreduce.jobtracker.staging.root.dir | hdfs://namenodehost:9000/mapred/staging | |
| mapred.tasktracker.map.tasks.maximum | 4 | |
| mapreduce.client.tasklog.timeout | 6000000 | |
| mapred.max.tracker.failures | 8 | |
| mapred.child.ulimit | | Maximum virtual memory for each child task. The value must be specified in kilobytes (KB) and must be greater than or equal to the -Xmx value passed to the Java VM. |
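For orientation, here is how a few of the rows above look when written out as property elements. This is a minimal hand-written sketch assembled from the table values, not a dump of the actual HDInsight file:

```xml
<?xml version="1.0"?>
<!-- sketch of mapred-site.xml entries, reconstructed from the table above -->
<configuration>
  <!-- JobTracker endpoint that TaskTrackers and job clients connect to -->
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtrackerhost:9010</value>
  </property>
  <!-- per-map-task child JVM heap: 512 MB initial, 1 GB maximum -->
  <property>
    <name>mapred.map.child.java.opts</name>
    <value>-Xms512m -Xmx1024m</value>
  </property>
  <!-- queues configured on this JobTracker -->
  <property>
    <name>mapred.queue.names</name>
    <value>default,joblauncher</value>
  </property>
</configuration>
```

Note the interplay between the heap options and mapred.child.ulimit: since the ulimit is expressed in KB and must be at least the -Xmx value, a 1024 MB heap would need a ulimit of at least 1048576 (1024 × 1024 KB).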

 

core-site.xml

| name | value | description |
| --- | --- | --- |
| fs.default.name | asv://avkashhdinsight@storage_name.blob.core.windows.net | The name of the default file system. Either the literal string "local" or a host:port for NDFS. |
| hadoop.tmp.dir | /hdfs/tmp | A base for other temporary directories. |
| fs.trash.interval | 60 | Number of minutes between trash checkpoints. If zero, the trash feature is disabled. |
| fs.checkpoint.dir | c:\hdfs\2nn | Determines where on the local filesystem the DFS secondary name node should store the temporary images to merge. If this is a comma-delimited list of directories then the image is replicated in all of the directories for redundancy. |
| fs.checkpoint.edits.dir | c:\hdfs\2nn | Determines where on the local filesystem the DFS secondary name node should store the temporary edits to merge. If this is a comma-delimited list of directories then the edits are replicated in all of the directories for redundancy. The default value is the same as fs.checkpoint.dir. |
| fs.checkpoint.period | 1800 | The number of seconds between two periodic checkpoints. |
| fs.checkpoint.size | 67108864 | The size of the current edit log (in bytes) that triggers a periodic checkpoint even if the fs.checkpoint.period hasn't expired. |
| fs.azure.buffer.dir | /tmp | |
| topology.script.file.name | E:\approot\bin\AzureTopology.exe | |
| io.file.buffer.size | 131072 | |
| fs.azure.account.key.storagename.blob.core.windows.net | ***********storagekey****** | |
| dfs.namenode.rpc-address | hdfs://namenodehost:9000 | |
| slave.host.name | | |
| hadoop.proxyuser.hdp.hosts | headnodehost | |
| hadoop.proxyuser.hdp.groups | oozieusers | |
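The fs.default.name and fs.azure.account.key.* entries work as a pair: the first points the default file system at a container in an Azure storage account via the ASV (Azure Storage Vault) scheme, and the second supplies the access key for that account. A minimal sketch using the placeholder names from the table (storage_name/storagename and the masked key are not real values):

```xml
<?xml version="1.0"?>
<!-- sketch: default file system on Azure Blob Storage (ASV) -->
<configuration>
  <!-- container "avkashhdinsight" in the storage account "storage_name" -->
  <property>
    <name>fs.default.name</name>
    <value>asv://avkashhdinsight@storage_name.blob.core.windows.net</value>
  </property>
  <!-- access key for that storage account (masked here, as in the table) -->
  <property>
    <name>fs.azure.account.key.storagename.blob.core.windows.net</name>
    <value>***********storagekey******</value>
  </property>
</configuration>
```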

 

 

hdfs-site.xml

| name | value | description |
| --- | --- | --- |
| dfs.name.dir | c:\hdfs\nn | Determines where on the local filesystem the DFS name node should store the name table. If this is a comma-delimited list of directories then the name table is replicated in all of the directories, for redundancy. |
| dfs.webhdfs.enabled | false | Whether to enable WebHDFS |
| heartbeat.recheck.interval | 300000 | None |
| dfs.data.dir | c:\hdfs\dn | |
| dfs.replication | 3 | Default block replication. |
| dfs.datanode.address | 0.0.0.0:50010 | |
| dfs.datanode.http.address | 0.0.0.0:50075 | |
| dfs.http.address | namenodehost:50070 | The address and base port on which the DFS namenode web UI listens. |
| dfs.datanode.ipc.address | 0.0.0.0:50020 | The datanode IPC server address and port. If the port is 0 then the server will start on a free port. |
| dfs.permissions | false | If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories. |
| dfs.secondary.http.address | 0.0.0.0:50090 | Address of the secondary namenode web server |
| dfs.secondary.https.port | 0 | The HTTPS port where the secondary namenode binds |
| dfs.https.address | localhost:50470 | The HTTPS address where the namenode binds |
| dfs.block.replicator.classname | org.apache.hadoop.hdfs.server.namenode.AzureBlockPlacementPolicy | |
| dfs.block.size | 268435456 | |
| dfs.datanode.max.xcievers | 4096 | |
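As with the other files, each row maps to one property element. A sketch of the two entries that most affect how data is laid out, with values copied from the table (268435456 bytes is a 256 MB block size):

```xml
<?xml version="1.0"?>
<!-- sketch of hdfs-site.xml entries, values from the table above -->
<configuration>
  <!-- every block is kept on 3 datanodes -->
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
  <!-- HDFS block size: 268435456 bytes = 256 MB -->
  <property>
    <name>dfs.block.size</name>
    <value>268435456</value>
  </property>
</configuration>
```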

 

 

Hadoop Configuration on Data Node

```
HADOOP_HOME=c:\apps\dist\hadoop-1.1.0-SNAPSHOT
HADOOP_OPTS= -Dfile.encoding=UTF-8 -Dhadoop.home.dir=c:\apps\dist\hadoop-1.1.0-SNAPSHOT -Dhadoop.root.logger=INFO,console,DRFA -Dhadoop.policy.file=hadoop-policy.xml -Dhadoop.log.dir=c:\apps\dist\hadoop-1.1.0-SNAPSHOT\logs -Dhadoop.log.file=hadoop-hive-RD00155D43B7BE.log
HIVE_CONF_DIR=c:\apps\dist\hive-0.9.0\conf
HIVE_HOME=c:\apps\dist\hive-0.9.0
HIVE_LIB_DIR=c:\apps\dist\hive-0.9.0\lib
HIVE_OPTS= -hiveconf hive.querylog.location=c:\apps\dist\hive-0.9.0\logs\history -hiveconf hive.log.dir=c:\apps\dist\hive-0.9.0\logs
```

 

Hadoop user logs: `C:\apps\dist\hadoop-1.1.0-SNAPSHOT\logs\userlogs`

MapReduce TaskTracker user logs: `C:\hdfs\mapred\local\userlogs`

Comments

  • I'm unable to see my HDInsight cluster in the management portal.

  • Hi Avkash,

    I'm probably being dumb, but I can't actually find these files (e.g. hdfs-site.xml). I'm looking in the blob storage account; what's the location?

    Thanks in advance

    JT

  • Never mind, found it on the headnode at C:\apps\dist\hadoop-2.2.0.2.0.9.0-1686\etc\hadoop\hdfs-site.xml. Sorry!
