Carl Nolan’s ramblings on development
As always, here is a link to the “Generics based Framework for .Net Hadoop MapReduce Job Submission” code.
In all the samples I have shown so far, I have used the command-line console applications. However, this does not need to be the case: PowerShell can be used instead. The console application that submits the MapReduce jobs calls a .Net Submission API, so one can call the same .Net API directly from within PowerShell, as I will now demonstrate.
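The key types one needs to be concerned with are:
- MSDN.Hadoop.Submission.Api.SubmissionContext: holds the job submission properties (input and output paths, mapper and reducer types, files, and so on)
- MSDN.Hadoop.Submission.Api.SubmissionApi: submits the job described by a SubmissionContext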
To use the .Net API, one first has to load the assembly and create the two required objects:
# Load the submission API assembly ($BasePath is the framework's base folder)
$SubmitterApi = $BasePath + "\Release\MSDN.Hadoop.Submission.Api.dll"
Add-Type -Path $SubmitterApi

# Create the job context and the submitter
$context = New-Object -TypeName MSDN.Hadoop.Submission.Api.SubmissionContext
$submitter = New-Object -TypeName MSDN.Hadoop.Submission.Api.SubmissionApi
After this, one just has to set the necessary job submission properties on the context:
# Input paths and files are both string arrays
[string[]]$inputs = @("mobile/data")
[string[]]$files = @($BasePath + "\Sample\MSDN.Hadoop.MapReduceCSharp.dll")

# User-defined key-value pair for the application configuration file
$config = New-Object -TypeName 'Tuple[string,string]' -ArgumentList "DictionaryCapacity", "1000"
$configs = @($config)

$context.InputPaths = $inputs
$context.OutputPath = "mobile/querytimes"
$context.MapperType = "MSDN.Hadoop.MapReduceCSharp.MobilePhoneRangeMapper, MSDN.Hadoop.MapReduceCSharp"
$context.ReducerType = "MSDN.Hadoop.MapReduceCSharp.MobilePhoneRangeReducer, MSDN.Hadoop.MapReduceCSharp"
$context.Files = $files
$context.ExeConfigurations = $configs
One just has to remember that the input paths and files are defined as string arrays, hence the [string[]] declarations.
In a recent build I added support for adding user-defined key-value pairs to the application configuration file. The ExeConfigurations property expects an array of Tuple<string, string> values, hence the New-Object definition for $config.
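When more than one setting is needed, the tuples can be built from a hashtable and collected into a strongly typed array. This is a minimal sketch, assuming all keys and values are plain strings: only DictionaryCapacity comes from the sample above, while "MyCustomSetting" is a hypothetical key used purely for illustration. The [ordered] hashtable needs PowerShell 3.0 or later.

```powershell
# Settings to write into the application configuration file.
# "DictionaryCapacity" comes from the sample; "MyCustomSetting" is hypothetical.
$settings = [ordered]@{
    "DictionaryCapacity" = "1000"
    "MyCustomSetting"    = "true"
}

# Build one Tuple[string,string] per setting; the typed variable makes the
# result a Tuple[string,string][] even when only one setting is present
[Tuple[string,string][]]$configs = foreach ($key in $settings.Keys) {
    New-Object -TypeName 'Tuple[string,string]' -ArgumentList @($key, $settings[$key])
}

# Show the pairs; with a context in scope one would then assign:
# $context.ExeConfigurations = $configs
$configs | ForEach-Object { "{0}={1}" -f $_.Item1, $_.Item2 }
```

Declaring the variable's type up front, rather than relying on @(...), keeps the array element type explicit, which matches what the ExeConfigurations property expects.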
Optionally, one can also set the data and output format types:
$context.DataFormat = [MSDN.Hadoop.Submission.Api.DataFormat]::Text
$context.OutputFormat = [MSDN.Hadoop.Submission.Api.OutputFormat]::Text
However, this is not necessary if one is using the default Text values.
Once the context has been defined, one just has to run the job:
$submitter.RunContext($context)
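If the script is driven from a batch file or the Hadoop command line, it can help to turn a failed submission into a non-zero process exit code, since a script started with powershell -File reports the value passed to exit. This is a sketch under an assumption the post does not confirm: that RunContext throws an exception when submission fails. Invoke-HadoopSubmission is a hypothetical helper, not part of the framework.

```powershell
# Hypothetical helper (not part of the framework): runs the job and maps the
# outcome to an exit code. ASSUMES RunContext throws when submission fails.
function Invoke-HadoopSubmission {
    param(
        [Parameter(Mandatory = $true)] [object]$Submitter,
        [Parameter(Mandatory = $true)] [object]$Context
    )

    try {
        # Send any return value to the console so it is not mixed into the exit code
        $Submitter.RunContext($Context) | Out-Host
        return 0
    }
    catch {
        Write-Warning "Job submission failed: $($_.Exception.Message)"
        return 1
    }
}

# Usage at the end of the script, in place of the direct call:
# exit (Invoke-HadoopSubmission -Submitter $submitter -Context $context)
```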
To call the PowerShell script from the Hadoop command line, one can use:
powershell -ExecutionPolicy unrestricted /File %BASEPATH%\SampleScripts\hadoopcstextrangesubmit.ps1
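The command line expands %BASEPATH% before PowerShell starts, but the script's own $BasePath variable still has to be assigned somewhere. A minimal sketch, assuming the script reads the same BASEPATH environment variable (the post does not show how $BasePath is set); Resolve-BasePath is a hypothetical helper name.

```powershell
# Hypothetical prologue for a script such as hadoopcstextrangesubmit.ps1.
# ASSUMES BASEPATH is set in the environment, as the command line implies.
function Resolve-BasePath {
    param([string]$Candidate = $env:BASEPATH)

    if ([string]::IsNullOrWhiteSpace($Candidate)) {
        throw "BASEPATH is not set; point it at the framework's base folder."
    }
    # Drop any trailing backslash so $BasePath + "\Release\..." has one separator
    return $Candidate.TrimEnd('\')
}

# Usage at the top of the script, before Add-Type:
# $BasePath = Resolve-BasePath
```

Failing early with a clear message is easier to diagnose than an Add-Type error about a missing DLL under a malformed path.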
All in all, a simple process.