We’ve made a nice fix to the Templeton service that runs on HDInsight clusters to handle remote job submission. We’ve talked with a number of customers who want to be able to access their job logs remotely as well, which until now has typically required direct access to the cluster. We’ve updated Templeton to support dropping the job logs directly into ASV as part of the status directory.

To do this, pass “enablelogs” as a query string parameter set to true. Here’s what the request looks like:

[Image: the job submission request with enablelogs=true]
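
If you’re calling Templeton directly over HTTP, the submission is just a POST to the Hive endpoint with enablelogs added alongside the usual parameters. Here’s a minimal sketch using Python’s requests library; the cluster name, credentials, query, and status directory below are placeholders, and the parameter spelling should be checked against what your cluster’s Templeton build expects.

import requests

# Hypothetical cluster and credentials -- substitute your own.
CLUSTER = "https://mycluster.azurehdinsight.net"
USER, PASSWORD = "admin", "mypassword"

# Submit a Hive query through Templeton. The key addition is enablelogs=true,
# which tells Templeton to copy the job logs into the status directory in ASV
# once the job completes.
resp = requests.post(
    CLUSTER + "/templeton/v1/hive",
    auth=(USER, PASSWORD),
    data={
        "user.name": USER,
        "execute": "select count(*) from hivesampletable;",
        "statusdir": "asv://mycontainer@myaccount.blob.core.windows.net/example/statusdir",
        "enablelogs": "true",
    },
)
print(resp.json())  # e.g. {"id": "job_201307110233_0003"}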

Upon job completion, the logs will be moved into the status directory, under a logs folder with the following structure:

$log_root/list.xml (summary of jobs)
$log_root/stderr (frontend stderr)
$log_root/stdout (frontend stdout)
$log_root/$job_id (directory home for a job)
$log_root/$job_id/job.xml.html
$log_root/$job_id/$attempt_id (directory home for an attempt)
$log_root/$job_id/$attempt_id/stderr
$log_root/$job_id/$attempt_id/stdout
$log_root/$job_id/$attempt_id/syslog
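
Because the layout is predictable, it’s easy to script retrieval of the pieces you care about with whatever blob client you point at the ASV account. Here’s a small sketch of building the expected paths under a status directory; the status directory below is a placeholder, and the job and attempt ids are taken from the run shown later in this post.

# Build the expected log locations under a Templeton status directory.
def log_paths(statusdir, job_id, attempt_id):
    log_root = statusdir.rstrip("/") + "/logs"
    return {
        "summary":        log_root + "/list.xml",
        "stderr":         log_root + "/stderr",
        "stdout":         log_root + "/stdout",
        "job_conf":       "{0}/{1}/job.xml.html".format(log_root, job_id),
        "attempt_stderr": "{0}/{1}/{2}/stderr".format(log_root, job_id, attempt_id),
        "attempt_stdout": "{0}/{1}/{2}/stdout".format(log_root, job_id, attempt_id),
        "attempt_syslog": "{0}/{1}/{2}/syslog".format(log_root, job_id, attempt_id),
    }

paths = log_paths("asv://mycontainer@myaccount.blob.core.windows.net/example/statusdir",
                  "job_201307110233_0003",
                  "attempt_201307110233_0003_m_000000_0")
print(paths["attempt_syslog"])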

Here’s a screen shot from Storage Studio that shows the folder structure:

[Image: Storage Studio view of the logs folder structure in the status directory]

If you look in the syslog file here, you’ll see a bunch of goodness about your job execution. For more complex jobs that spin off multiple map/reduce jobs (e.g., Pig, Hive, Cascading), you will see the set of jobs recorded there. The root directory will also contain a list.xml with the details of all the jobs, and each job’s directory will contain a job.xml.html with all of the details, environment variables, and configuration of the job. All of this information is useful in debugging and tuning your jobs.

We will be updating the SDK to support this parameter in the next release, but for now, you can submit jobs directly to the cluster and add this parameter to get job logs.
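
One practical note on timing: because Templeton copies the logs only when the job completes, you’ll generally want to poll the job’s status before going after the logs folder. Here’s a rough sketch of that polling loop against the Templeton jobs endpoint, reusing the hypothetical cluster and credentials from the earlier snippet; the response fields are as I recall them from Templeton’s status output, so verify them against your cluster.

import time
import requests

# Hypothetical cluster and credentials, matching the submission sketch above.
CLUSTER = "https://mycluster.azurehdinsight.net"
USER, PASSWORD = "admin", "mypassword"

def wait_for_completion(job_id, poll_seconds=15):
    """Poll the Templeton job status until the job reports completion."""
    url = "{0}/templeton/v1/jobs/{1}".format(CLUSTER, job_id)
    while True:
        status = requests.get(url, auth=(USER, PASSWORD),
                              params={"user.name": USER}).json()
        if status.get("completed") == "done":
            return status
        time.sleep(poll_seconds)

# Once this returns, the logs folder should be present in the status directory.
final_status = wait_for_completion("job_201307110233_0003")
print(final_status.get("exitValue"))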

Here’s the syslog output from my execution of a simple Hive query:

2013-07-11 02:48:56,632 INFO org.apache.hadoop.util.NativeCodeLoader: Loaded the native-hadoop library
2013-07-11 02:48:56,726 INFO org.apache.hadoop.mapred.TaskRunner: Creating symlink: c:/hdfs/mapred/local/taskTracker/distcache/1562199005048822745_1328179698_1268869633/namenodehost/hive/scratch/hive_2013-07-11_02-48-38_757_5240085401495207012/-mr-10003/34a2bf31-e18b-440d-8d91-be8d0e445d2e <- c:\hdfs\mapred\local\taskTracker\admin\jobcache\job_201307110233_0003\attempt_201307110233_0003_m_000000_0\work\HIVE_PLAN34a2bf31-e18b-440d-8d91-be8d0e445d2e
2013-07-11 02:48:56,992 WARN org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Source name ugi already exists!
2013-07-11 02:48:57,289 INFO org.apache.hadoop.mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.WindowsResourceCalculatorPlugin@a4f5b6d
2013-07-11 02:48:57,726 WARN org.apache.hadoop.hive.conf.HiveConf: hive-site.xml not found on CLASSPATH
2013-07-11 02:48:57,961 WARN org.apache.hadoop.io.compress.snappy.LoadSnappy: Snappy native library not loaded
2013-07-11 02:48:58,101 INFO org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader: Processing file asv://mwinkletemp37@mwinkle.blob.core.windows.net/hive/warehouse/hivesampletable/HiveSampleData.txt
2013-07-11 02:48:58,101 INFO org.apache.hadoop.mapred.MapTask: numReduceTasks: 0
2013-07-11 02:48:58,117 INFO ExecMapper: maximum memory = 954466304
2013-07-11 02:48:58,117 INFO ExecMapper: conf classpath = [file:/C:/hdfs/mapred/local/taskTracker/admin/jobcache/job_201307110233_0003/attempt_201307110233_0003_m_000000_0/classpath-5670276484193870096.jar]
2013-07-11 02:48:58,117 INFO ExecMapper: thread classpath = [file:/C:/hdfs/mapred/local/taskTracker/admin/jobcache/job_201307110233_0003/attempt_201307110233_0003_m_000000_0/classpath-5670276484193870096.jar]
2013-07-11 02:48:58,164 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Adding alias hivesampletable to work list for file asv://mwinkletemp37@mwinkle.blob.core.windows.net/hive/warehouse/hivesampletable
2013-07-11 02:48:58,164 INFO org.apache.hadoop.hive.ql.exec.MapOperator: dump TS struct<clientid:string,querytime:string,market:string,deviceplatform:string,devicemake:string,devicemodel:string,state:string,country:string,querydwelltime:double,sessionid:bigint,sessionpagevieworder:bigint>
2013-07-11 02:48:58,164 INFO ExecMapper:
<MAP>Id =4
  <Children>
    <TS>Id =3
       <Children>
        <FIL>Id =2
          <Children>
             <SEL>Id =1
              <Children>
                <FS>Id =0
                  <Parent>Id = 1 null<\Parent>
                <\FS>
              <\Children>
              <Parent>Id = 2 null<\Parent>
            <\SEL>
          <\Children>
          <Parent>Id = 3 null<\Parent>
        <\FIL>
      <\Children>
      <Parent>Id = 4 null<\Parent>
    <\TS>
  <\Children>
<\MAP>
2013-07-11 02:48:58,164 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initializing Self 4 MAP
2013-07-11 02:48:58,164 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing Self 3 TS
2013-07-11 02:48:58,164 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Operator 3 TS initialized
2013-07-11 02:48:58,164 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initializing children of 3 TS
2013-07-11 02:48:58,164 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: Initializing child 2 FIL
2013-07-11 02:48:58,164 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: Initializing Self 2 FIL
2013-07-11 02:48:58,179 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: Operator 2 FIL initialized
2013-07-11 02:48:58,179 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: Initializing children of 2 FIL
2013-07-11 02:48:58,179 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing child 1 SEL
2013-07-11 02:48:58,179 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing Self 1 SEL
2013-07-11 02:48:58,179 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: SELECT struct<clientid:string,querytime:string,market:string,deviceplatform:string,devicemake:string,devicemodel:string,state:string,country:string,querydwelltime:double,sessionid:bigint,sessionpagevieworder:bigint>
2013-07-11 02:48:58,179 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Operator 1 SEL initialized
2013-07-11 02:48:58,179 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initializing children of 1 SEL
2013-07-11 02:48:58,179 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing child 0 FS
2013-07-11 02:48:58,179 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initializing Self 0 FS
2013-07-11 02:48:58,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Operator 0 FS initialized
2013-07-11 02:48:58,195 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Initialization Done 0 FS
2013-07-11 02:48:58,195 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: Initialization Done 1 SEL
2013-07-11 02:48:58,195 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: Initialization Done 2 FIL
2013-07-11 02:48:58,195 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: Initialization Done 3 TS
2013-07-11 02:48:58,195 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Initialization Done 4 MAP
2013-07-11 02:48:58,601 INFO org.apache.hadoop.hive.ql.exec.MapOperator: Processing alias hivesampletable for file asv://mwinkletemp37@mwinkle.blob.core.windows.net/hive/warehouse/hivesampletable
2013-07-11 02:48:58,601 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarding 1 rows
2013-07-11 02:48:58,601 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 forwarding 1 rows
2013-07-11 02:48:58,617 INFO ExecMapper: ExecMapper: processing 1 rows: used memory = 87647896
2013-07-11 02:48:58,617 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarding 10 rows
2013-07-11 02:48:58,617 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 forwarding 10 rows
2013-07-11 02:48:58,617 INFO ExecMapper: ExecMapper: processing 10 rows: used memory = 87647896
2013-07-11 02:48:58,617 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarding 100 rows
2013-07-11 02:48:58,617 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 forwarding 100 rows
2013-07-11 02:48:58,617 INFO ExecMapper: ExecMapper: processing 100 rows: used memory = 87647896
2013-07-11 02:48:58,632 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarding 1000 rows
2013-07-11 02:48:58,632 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 forwarding 1000 rows
2013-07-11 02:48:58,632 INFO ExecMapper: ExecMapper: processing 1000 rows: used memory = 87647896
2013-07-11 02:48:58,804 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarding 10000 rows
2013-07-11 02:48:58,804 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 forwarding 10000 rows
2013-07-11 02:48:58,804 INFO ExecMapper: ExecMapper: processing 10000 rows: used memory = 87647896
2013-07-11 02:48:59,211 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 finished. closing...
2013-07-11 02:48:59,211 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 forwarded 59793 rows
2013-07-11 02:48:59,211 INFO org.apache.hadoop.hive.ql.exec.MapOperator: DESERIALIZE_ERRORS:0
2013-07-11 02:48:59,211 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 finished. closing...
2013-07-11 02:48:59,211 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 forwarded 59793 rows
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 finished. closing...
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 forwarded 0 rows
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: PASSED:0
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: FILTERED:59793
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 finished. closing...
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 forwarded 0 rows
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 0 finished. closing...
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: 0 forwarded 0 rows
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Final Path: FS hdfs://namenodehost:9000/hive/scratch/hive_2013-07-11_02-48-38_757_5240085401495207012/_tmp.-ext-10001/000000_0
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: Writing to temp file: FS hdfs://namenodehost:9000/hive/scratch/hive_2013-07-11_02-48-38_757_5240085401495207012/_task_tmp.-ext-10001/_tmp.000000_0
2013-07-11 02:48:59,226 INFO org.apache.hadoop.hive.ql.exec.FileSinkOperator: New Final Path: FS hdfs://namenodehost:9000/hive/scratch/hive_2013-07-11_02-48-38_757_5240085401495207012/_tmp.-ext-10001/000000_0
2013-07-11 02:48:59,336 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 1 Close done
2013-07-11 02:48:59,336 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 2 Close done
2013-07-11 02:48:59,336 INFO org.apache.hadoop.hive.ql.exec.TableScanOperator: 3 Close done
2013-07-11 02:48:59,336 INFO org.apache.hadoop.hive.ql.exec.MapOperator: 4 Close done
2013-07-11 02:48:59,336 INFO ExecMapper: ExecMapper: processed 59793 rows: used memory = 96526440
2013-07-11 02:48:59,336 INFO org.apache.hadoop.mapred.Task: Task:attempt_201307110233_0003_m_000000_0 is done. And is in the process of commiting
2013-07-11 02:48:59,382 INFO org.apache.hadoop.mapred.Task: Task 'attempt_201307110233_0003_m_000000_0' done.
2013-07-11 02:48:59,414 INFO org.apache.hadoop.mapred.TaskLogsTruncater: Initializing logs' truncater with mapRetainSize=-1 and reduceRetainSize=-1
2013-07-11 02:48:59,476 INFO org.apache.hadoop.io.nativeio.NativeIO: Initialized cache for UID to User mapping with a cache timeout of 14400 seconds.