Benjamin Guinebertière

This blog is about Microsoft Azure. Older posts cover architecture, SOA, BizTalk, ...

Analyzing 1 TB of IIS logs with Hadoop Map/Reduce on Azure with JavaScript



 

As described in a previous post, Microsoft has ported Apache Hadoop to Windows Azure (it will also be available on Windows Server). For now, it is available as a private community technology preview.
This port does not rely on Cygwin. One of the contributions Microsoft plans to give back to the open source community is the ability to use JavaScript.
One of the goals of Hadoop is to work on large amounts of unstructured data. In this sample, we’ll use JavaScript code to parse IIS logs and extract information about authenticated sessions.

 

The Internet Information Services (IIS) logs come from a web farm. It may be an on-premises web farm or a Web Role on Windows Azure. The logs are copied and consolidated into Windows Azure blob storage. We get a little more than 1 TB of them. Here is how this looks in Windows Azure Storage Explorer:

image

and from the interactive JavaScript console:

image

image

1191124656300 bytes = 1.083321564 TB (with 1 TB = 1024^4 bytes)
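
The figure above follows from a plain division, assuming the binary convention where 1 TB is 1024^4 bytes (strictly speaking, a tebibyte). The variable names below are only illustrative:

```javascript
// Size reported by the interactive console, in bytes.
var sizeInBytes = 1191124656300;

// Binary convention (assumption): 1 TB = 1024^4 bytes.
var BYTES_PER_TB = Math.pow(1024, 4);

var sizeInTB = sizeInBytes / BYTES_PER_TB;
console.log(sizeInTB.toFixed(9) + " TB");
```

With the decimal convention (1 TB = 10^12 bytes), the same data set would be reported as roughly 1.19 TB.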

Here is what the log files look like:

#Software: Microsoft Internet Information Services 7.5
#Version: 1.0
#Date: 2012-01-06 09:09:05
#Fields: date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken
2012-01-06 09:09:05 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /cuisine-francaise - 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) - http://site.supersimple.fr/ site.supersimple.fr 200 0 0 5734 321 3343
2012-01-06 09:09:12 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /cuisine-francaise/huitres - 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) - http://site.supersimple.fr/cuisine-francaise site.supersimple.fr 200 0 0 4922 346 890
2012-01-06 09:09:19 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /cuisine-japonaise - 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= http://site.supersimple.fr/cuisine-francaise/huitres site.supersimple.fr 200 0 0 3491 544 906
2012-01-06 09:09:22 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /cuisine-japonaise/assortiment-de-makis - 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= http://site.supersimple.fr/cuisine-japonaise site.supersimple.fr 200 0 0 3198 557 671
2012-01-06 09:09:27 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /blog - 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= http://site.supersimple.fr/cuisine-japonaise/assortiment-de-makis site.supersimple.fr 200 0 0 3972 544 2406
2012-01-06 09:09:30 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /blog/marmiton - 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= http://site.supersimple.fr/blog site.supersimple.fr 200 0 0 5214 519 718
2012-01-06 09:09:49 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /ustensiles - 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= http://site.supersimple.fr/blog/marmiton site.supersimple.fr 200 0 0 6897 525 2859
2012-01-06 09:22:13 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /Users/Account/LogOn ReturnUrl=%2Fustensiles 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= http://site.supersimple.fr/ustensiles site.supersimple.fr 200 0 0 3818 555 1203
2012-01-06 09:22:26 W3SVC1273337584 RD00155D360166 10.211.146.27 POST /Users/Account/LogOn ReturnUrl=%2Fustensiles 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8= http://site.supersimple.fr/Users/Account/LogOn?ReturnUrl=%2Fustensiles site.supersimple.fr 302 0 0 729 961 703
2012-01-06 09:22:27 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /ustensiles - 80 Test0001 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8=;+.ASPXAUTH=D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54 http://site.supersimple.fr/Users/Account/LogOn?ReturnUrl=%2Fustensiles site.supersimple.fr 200 0 0 7136 849 1249
2012-01-06 09:22:30 W3SVC1273337584 RD00155D360166 10.211.146.27 GET / - 80 Test0001 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8=;+.ASPXAUTH=D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54 http://site.supersimple.fr/ustensiles site.supersimple.fr 200 0 0 3926 788 1031
2012-01-06 09:22:57 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /cuisine-francaise - 80 Test0001 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8=;+.ASPXAUTH=D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54 http://site.supersimple.fr/ site.supersimple.fr 200 0 0 5973 795 1093
2012-01-06 09:23:00 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /cuisine-francaise/gateau-au-chocolat-et-aux-framboises - 80 Test0001 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) __RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8=;+.ASPXAUTH=D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54 http://site.supersimple.fr/cuisine-francaise site.supersimple.fr 200 0 0 8869 849 749
2012-01-06 09:30:50 W3SVC1273337584 RD00155D360166 10.211.146.27 GET / - 80 - 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - - site.supersimple.fr 200 0 0 3687 364 1281
2012-01-06 09:30:50 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /Modules/Orchard.Localization/Styles/orchard-localization-base.css - 80 - 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - http://site.supersimple.fr/ site.supersimple.fr 200 0 0 1148 422 749
2012-01-06 09:30:50 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /Themes/Classic/Styles/Site.css - 80 - 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - http://site.supersimple.fr/ site.supersimple.fr 200 0 0 15298 387 843
2012-01-06 09:30:51 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /Themes/Classic/Styles/moduleOverrides.css - 80 - 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - http://site.supersimple.fr/ site.supersimple.fr 200 0 0 557 398 1468
2012-01-06 09:30:51 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /Core/Shapes/scripts/html5.js - 80 - 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - http://site.supersimple.fr/ site.supersimple.fr 200 0 0 1804 370 1015
2012-01-06 09:30:53 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /Themes/Classic/Content/current.png - 80 - 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - http://site.supersimple.fr/ site.supersimple.fr 200 0 0 387 376 656
2012-01-06 09:30:57 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /modules/orchard.themes/Content/orchard.ico - 80 - 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - - site.supersimple.fr 200 0 0 1399 346 468
2012-01-06 09:31:54 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /Users/Account/LogOn ReturnUrl=%2F 80 - 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 - http://site.supersimple.fr/ site.supersimple.fr 200 0 0 4018 435 718
2012-01-06 09:32:14 W3SVC1273337584 RD00155D360166 10.211.146.27 POST /Users/Account/LogOn ReturnUrl=%2F 80 - 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 __RequestVerificationToken_Lw__=BpgGSfFnDr9KB5oclPotYchfIFzjWXjJ5qHrtRcXoZmLRjG8pL9fw5CtMAN3Arckjm0ZfLtUsuBUGDNRztQPPWmlGLb6tfzSmELzdYbEg5RktsGNkxBr9+eyU342Lf8wSw2YFxqiUX7X8WlXwt0DQITMg2o= http://site.supersimple.fr/Users/Account/LogOn?ReturnUrl=%2F site.supersimple.fr 302 0 0 709 1083 812
2012-01-06 09:32:14 W3SVC1273337584 RD00155D360166 10.211.146.27 GET / - 80 Test0001 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 __RequestVerificationToken_Lw__=BpgGSfFnDr9KB5oclPotYchfIFzjWXjJ5qHrtRcXoZmLRjG8pL9fw5CtMAN3Arckjm0ZfLtUsuBUGDNRztQPPWmlGLb6tfzSmELzdYbEg5RktsGNkxBr9+eyU342Lf8wSw2YFxqiUX7X8WlXwt0DQITMg2o=;+.ASPXAUTH=94C70A59F9DA0E7294DCAAAEF9A0C52FA585B56A7FC4E01AF24437C84327D3E862548C2C0A5B71DD073443F000CE5767AF9009FFDCDE5F3EE184C3D73CF4BA4C7B8650461A448467FBAB87E311209F4DFB83B19335C9002E5EC5423E145165F64F226AC7F47C19B6035025ABDEDB4A7CAB4FF63A8C22FEED3C6002E6A99920FA8249D3B9 http://site.supersimple.fr/Users/Account/LogOn?ReturnUrl=%2F site.supersimple.fr 200 0 0 3926 935 906
2012-01-06 09:33:22 W3SVC1273337584 RD00155D360166 10.211.146.27 GET / - 80 Test0001 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 __RequestVerificationToken_Lw__=BpgGSfFnDr9KB5oclPotYchfIFzjWXjJ5qHrtRcXoZmLRjG8pL9fw5CtMAN3Arckjm0ZfLtUsuBUGDNRztQPPWmlGLb6tfzSmELzdYbEg5RktsGNkxBr9+eyU342Lf8wSw2YFxqiUX7X8WlXwt0DQITMg2o=;+.ASPXAUTH=94C70A59F9DA0E7294DCAAAEF9A0C52FA585B56A7FC4E01AF24437C84327D3E862548C2C0A5B71DD073443F000CE5767AF9009FFDCDE5F3EE184C3D73CF4BA4C7B8650461A448467FBAB87E311209F4DFB83B19335C9002E5EC5423E145165F64F226AC7F47C19B6035025ABDEDB4A7CAB4FF63A8C22FEED3C6002E6A99920FA8249D3B9 http://site.supersimple.fr/Users/Account/LogOn?ReturnUrl=%2F site.supersimple.fr 200 0 0 3926 935 1156

By loading this sample into Excel, one can see that a session ID can be found in the .ASPXAUTH cookie, which is one of the cookies available in the cs(Cookie) field of the IIS logs.

image
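
The same inspection can be sketched in a few lines of JavaScript instead of Excel. This is only an illustration: `parseLogLine` and `getAuthCookie` are hypothetical helper names, the field list is copied from the `#Fields:` header of the sample, and the cookie value at the end is a shortened, made-up string in the shape of the real one.

```javascript
// Field names copied from the "#Fields:" header of the sample log.
var FIELDS = ("date time s-sitename s-computername s-ip cs-method cs-uri-stem " +
    "cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) " +
    "cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes " +
    "cs-bytes time-taken").split(" ");

// Hypothetical helper: turns one log line into an object keyed by field name.
// Returns null for comment lines ("#...") and for lines with an unexpected field count.
function parseLogLine(line) {
    if (!line || line.charAt(0) === "#") { return null; }
    var values = line.split(" ");   // IIS writes spaces inside values as "+"
    if (values.length !== FIELDS.length) { return null; }
    var record = {};
    for (var i = 0; i < FIELDS.length; i++) {
        record[FIELDS[i]] = values[i];
    }
    return record;
}

// Hypothetical helper: returns the .ASPXAUTH value of a cs(Cookie) field, or null when absent.
function getAuthCookie(cookieField) {
    var parts = cookieField.split(";");
    for (var i = 0; i < parts.length; i++) {
        var j = parts[i].indexOf(".ASPXAUTH=");
        if (j >= 0) { return parts[i].substring(j + ".ASPXAUTH=".length); }
    }
    return null;
}

// First data line of the sample above (anonymous request, so the cookie field is "-").
var sample = "2012-01-06 09:09:05 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /cuisine-francaise - 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) - http://site.supersimple.fr/ site.supersimple.fr 200 0 0 5734 321 3343";
var record = parseLogLine(sample);
console.log(record["cs-uri-stem"], record["cs-username"], getAuthCookie(record["cs(Cookie)"]));

// Shortened, made-up cookie value with the same layout as the authenticated lines.
var shortCookie = "__RequestVerificationToken_Lw__=KLZ1dz1A;+.ASPXAUTH=D5796612E924B604";
console.log(getAuthCookie(shortCookie));
```

Splitting on a single space works here because IIS encodes spaces inside values (user agent, cookies) as `+`.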

At the end of the processing, we want to get the following result as two flat file structures.
Session headers give a summary of what happened in the session. The fields are a dummy row ID, sessionid, username, start date/time, end date/time, and number of visited URLs.

 


134211969	19251ab2b91cb3158e21c0c74f597a9872ed257d	test2272g5x467	2012-01-28 20:06:08	2012-01-28 20:32:33	11
134213036	19251cd8a444c6642bbedc1ba5d848f26ad3c789	test1268gAx168	2012-02-02 20:01:47	2012-02-02 20:25:22	13
134213561	19252827f25750af10aaf89a9de3fc35ad15d97e	test1987g4x214	2012-01-27 01:00:46	2012-01-27 01:06:26	5
134214566	19252bb73667cc04e5de2a6eebe5e8ba7cc77c4a	test3333g4x681	2012-01-27 20:00:03	2012-01-27 20:03:23	12
134214866	19252bf03e7d962a41fde46127810339c587b0ae	test1480hFx690	2012-01-27 18:18:51	2012-01-27 18:32:51	3
134215841	19253a4d1496dfea6e264ba7839d07ebd0a9662e	test2467g6x109	2012-01-29 18:02:19	2012-01-29 18:13:10	11
134216451	19253b3c19f8a0f46fd44e6f979f3e8bedda7881	test3119hLx29	2012-02-02 18:04:17	2012-02-02 18:21:31	7
134216974	19253ff8924893dd72f6453568084e53985a8817	test2382g9x8	2012-02-01 01:07:55	2012-02-01 01:26:17	5
134217496	1925418002459ad897ed41b156f0e3eab78caa13	test3854g4x823	2012-01-27 02:06:38	2012-01-27 02:27:54	5

 

Session details give the list of URLs that were visited in a session. The fields are a dummy row ID, sessionid, hit time, and URL.

 


134216699	19253ff8924893dd72f6453568084e53985a8817	01:07:55	/Core/Shapes/scripts/html5.js
134216781	19253ff8924893dd72f6453568084e53985a8817	01:41:01	/Modules/Orchard.Localization/Styles/orchard-localization-base.css
134216900	19253ff8924893dd72f6453568084e53985a8817	01:25:02	/Users/Account/LogOff
134217072	1925418002459ad897ed41b156f0e3eab78caa13	02:08:01	/Modules/Orchard.Localization/Styles/orchard-localization-base.css
134217191	1925418002459ad897ed41b156f0e3eab78caa13	02:27:54	/Users/Account/LogOff
134217265	1925418002459ad897ed41b156f0e3eab78caa13	02:06:38	/
134217319	1925418002459ad897ed41b156f0e3eab78caa13	02:26:14	/Themes/Classic/Styles/moduleOverrides.css
134217414	1925418002459ad897ed41b156f0e3eab78caa13	02:17:08	/Core/Shapes/scripts/html5.js
134217596	1925420f22e51f948314b2a6fa0c53fe4d002455	19:11:29	/blog
134217654	1925420f22e51f948314b2a6fa0c53fe4d002455	19:00:21	/cuisine-francaise/barbecue

 

Note that the two structures could be joined on the sessionid later on, with HIVE for instance, but this is beyond the scope of this post. Also note that the sessionid is not the exact value of the .ASPXAUTH cookie but a SHA1 hash of it, which is shorter; this reduces network traffic and produces a smaller result.
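
To make the idea of joining on the sessionid concrete, here is a minimal in-memory sketch in JavaScript. It is not part of the job and is no substitute for HIVE on the full data set; `joinSessions` is a hypothetical helper, the rows are copied from the two listings above, and the tab separator is an assumption about the flat files.

```javascript
// Sample rows copied from the session header and session detail listings (tab-separated).
var headerLines = [
    "134216974\t19253ff8924893dd72f6453568084e53985a8817\ttest2382g9x8\t2012-02-01 01:07:55\t2012-02-01 01:26:17\t5",
    "134217496\t1925418002459ad897ed41b156f0e3eab78caa13\ttest3854g4x823\t2012-01-27 02:06:38\t2012-01-27 02:27:54\t5"
];
var detailLines = [
    "134216699\t19253ff8924893dd72f6453568084e53985a8817\t01:07:55\t/Core/Shapes/scripts/html5.js",
    "134216900\t19253ff8924893dd72f6453568084e53985a8817\t01:25:02\t/Users/Account/LogOff",
    "134217191\t1925418002459ad897ed41b156f0e3eab78caa13\t02:27:54\t/Users/Account/LogOff",
    "134217265\t1925418002459ad897ed41b156f0e3eab78caa13\t02:06:38\t/"
];

// Hypothetical helper: joins both structures on sessionid (second column of each row).
function joinSessions(headers, details) {
    var bySession = {};
    var i, f;
    for (i = 0; i < headers.length; i++) {
        f = headers[i].split("\t");
        bySession[f[1]] = {
            username: f[2],
            start: f[3],
            end: f[4],
            nbUrls: parseInt(f[5], 10),
            hits: []
        };
    }
    for (i = 0; i < details.length; i++) {
        f = details[i].split("\t");
        // Inner join: detail rows without a matching header are dropped.
        if (bySession.hasOwnProperty(f[1])) {
            bySession[f[1]].hits.push({ time: f[2], url: f[3] });
        }
    }
    return bySession;
}

var joined = joinSessions(headerLines, detailLines);
console.log(JSON.stringify(joined, null, 2));
```

Indexing the headers first keeps the join to a single pass over the details, which are by far the larger of the two structures.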

 

Here is the code I used to do that. I may write another blog post later on to comment on that code further.
iislogsAnalysis.js:

/*
IIS logs fields
0	date			2012-01-06
1	time 			09:09:05
2	s-sitename 		W3SVC1273337584
3	s-computername 	RD00155D360166
4	s-ip 			10.211.146.27
5	cs-method 		GET
6	cs-uri-stem 	/cuisine-francaise
7	cs-uri-query 	-
8	s-port 			80
9	cs-username 	-
10	c-ip 			94.245.127.11
11	cs-version		HTTP/1.1
12	cs(User-Agent)	Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0)
13	cs(Cookie)		- 
14	cs(Referer)		http://site.supersimple.fr/
15	cs-host			site.supersimple.fr
16	sc-status		200
17	sc-substatus	0
18	sc-win32-status	0
19	sc-bytes		5734
20	cs-bytes		321
21	time-taken		3343

sample lines
2012-01-06 09:09:05 W3SVC1273337584 RD00155D360166 10.211.146.27 GET /cuisine-francaise - 80 - 94.245.127.11 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+9.0;+Windows+NT+6.1;+WOW64;+Trident/5.0) - http://site.supersimple.fr/ site.supersimple.fr 200 0 0 5734 321 3343
2012-01-06 09:32:14 W3SVC1273337584 RD00155D360166 10.211.146.27 GET / - 80 Test0001 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 __RequestVerificationToken_Lw__=BpgGSfFnDr9KB5oclPotYchfIFzjWXjJ5qHrtRcXoZmLRjG8pL9fw5CtMAN3Arckjm0ZfLtUsuBUGDNRztQPPWmlGLb6tfzSmELzdYbEg5RktsGNkxBr9+eyU342Lf8wSw2YFxqiUX7X8WlXwt0DQITMg2o=;+.ASPXAUTH=94C70A59F9DA0E7294DCAAAEF9A0C52FA585B56A7FC4E01AF24437C84327D3E862548C2C0A5B71DD073443F000CE5767AF9009FFDCDE5F3EE184C3D73CF4BA4C7B8650461A448467FBAB87E311209F4DFB83B19335C9002E5EC5423E145165F64F226AC7F47C19B6035025ABDEDB4A7CAB4FF63A8C22FEED3C6002E6A99920FA8249D3B9 http://site.supersimple.fr/Users/Account/LogOn?ReturnUrl=%2F site.supersimple.fr 200 0 0 3926 935 906
2012-01-06 09:33:22 W3SVC1273337584 RD00155D360166 10.211.146.27 GET / - 80 Test0001 94.245.127.13 HTTP/1.1 Mozilla/5.0+(Windows+NT+6.1;+WOW64)+AppleWebKit/535.7+(KHTML,+like+Gecko)+Chrome/16.0.912.63+Safari/535.7 __RequestVerificationToken_Lw__=BpgGSfFnDr9KB5oclPotYchfIFzjWXjJ5qHrtRcXoZmLRjG8pL9fw5CtMAN3Arckjm0ZfLtUsuBUGDNRztQPPWmlGLb6tfzSmELzdYbEg5RktsGNkxBr9+eyU342Lf8wSw2YFxqiUX7X8WlXwt0DQITMg2o=;+.ASPXAUTH=94C70A59F9DA0E7294DCAAAEF9A0C52FA585B56A7FC4E01AF24437C84327D3E862548C2C0A5B71DD073443F000CE5767AF9009FFDCDE5F3EE184C3D73CF4BA4C7B8650461A448467FBAB87E311209F4DFB83B19335C9002E5EC5423E145165F64F226AC7F47C19B6035025ABDEDB4A7CAB4FF63A8C22FEED3C6002E6A99920FA8249D3B9 http://site.supersimple.fr/Users/Account/LogOn?ReturnUrl=%2F site.supersimple.fr 200 0 0 3926 935 1156
*/

/*
A cookie with authentication looks like this
__RequestVerificationToken_Lw__=KLZ1dz1Aa4o2UdwJVwr0JhzSwmmSHmID9i/gutMvQkZWX9Q4QDktFHHiBhF8mSd6Cg5oIEeUpy/KNF7VLRFkrqN28raL8PfNuv0IfuKXxgl5s+uZpcvfGE6Olfsu7uNLg2bWwLZkrqXjv9cpRGaiXelmaM8=;+.ASPXAUTH=D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54
The interesting part is 
ASPXAUTH=D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54
the session ID is 
D5796612E924B60496C115914CC8F93239E99EEF4B3D6ED74BDD5C8C38D8C115D3021AB7F3B06E563EDE612BFBCBBE756803C85DECFACCA080E890C5DA6B4CA00A51792D812C93101F648505133C9E2C10779FA3E5AC19EE5E2B7E130C72C18F6309AEB736ABD06C87A7D636976A20534833E20160EC04B6B6617B378845AE627979EE54

 */

 /* the goal is to have this kind of file at the end:

fffffff0a929d9fbbbbb0b4ffa744842f9188e01	D 20:07:53 /blog
fffffff0a929d9fbbbbb0b4ffa744842f9188e01	H test2573g2x403 2012-01-25 20:07:53 2012-01-25 20:33:43 7
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:09:41 /Users/Account/LogO
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:26:12 /blog/marmiton
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:16:58 /cuisine-francaise/barbecue
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:10:00 /blog/marmiton
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:11:24 /
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:27:50 /cuisine-japonaise/assortiment-de-makis
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:29:31 /cuisine-francaise/fondue-au-fromage
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:05:19 /cuisine-japonaise
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:31:32 /cuisine-francaise/dinde
fffffff7e3dbde467fb4a004c31b41e5fdb49116	D 18:04:41 /cuisine-francaise/fondue-au-fromage
fffffff7e3dbde467fb4a004c31b41e5fdb49116	H test3698g4x509 2012-01-27 18:04:41 2012-01-27 18:31:32 10

*/


var map = function (key, value, context) {
    var f; // fields
    var s, sessionID, generated;


    if (value === null || value === "") {
        return;
    }

    if (value.charAt(0) === "#") {
        return;
    }

    f = value.split(" ");
    if (f.length < 14 || f[9] === "" || f[9] === "-") {
        // truncated line (no cookie field) or anonymous user: skip the log line
        return;
    }

    s = extractSessionFromCookies(f[13]);
    if (!s) {
        return;
    }

    sessionID = Sha1.hash(s); // hash will create a shorter key, here
    generated = "M " + f[9] + " " + f[0] + " " + f[1] + " " + f[6];
    context.write(sessionID, generated);

    function extractSessionFromCookies(cookies) {
        var i, j, sessionID;

        var cookieParts = cookies.split(";");
        for (i = 0; i < cookieParts.length; i++) {
            j = cookieParts[i].indexOf("ASPXAUTH=");
            if (j >= 0) {
                sessionID = cookieParts[i].substring(j + "ASPXAUTH=".length);
                break;
            }
        }
        return sessionID;
    }
};

var reduce = function (key, values, context) {
    var generated;
    var minDate = null;
    var maxDate = null;
    var username = null;
    var currentDate, currentMinDate, currentMaxDate;
    var nbUrls = 0;
    var f;
    var currentValue;
    var firstChar;

    while (values.hasNext()) {
        currentValue = values.next();
        firstChar = currentValue.substring(0,1);

        if (firstChar == "M") {
            f = currentValue.split(" ");

            if (username === null) {
                username = f[1];
            }

            currentDate = f[2] + " " + f[3];

            if (minDate === null) {
                minDate = currentDate;
                maxDate = currentDate;
            }
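            else {
                // compare against both bounds: a date that is not a new minimum
                // is not necessarily a new maximum
                if (currentDate < minDate) {
                    minDate = currentDate;
                }
                if (currentDate > maxDate) {
                    maxDate = currentDate;
                }
            }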
            context.write(key, "D " + f[3] + " " + f[4]); // D stands for details
            nbUrls++;
        }
        else if (firstChar == "H") {
            f = currentValue.split(" ");

            if (username === null) {
                username = f[1];
            }

            currentMinDate = f[2] + " " + f[3];
            currentMaxDate = f[4] + " " + f[5];

            if (minDate === null) {
                minDate = currentMinDate;
                maxDate = currentMaxDate;
            }
            else {
                if (currentMinDate < minDate) {
                    minDate = currentMinDate;
                }
                if (currentMaxDate > maxDate) {
                    maxDate = currentMaxDate;
                }
            }
            nbUrls += parseInt(f[6], 10);
        }
        else if (firstChar == "D") {
            context.write(key, currentValue);
        }
        else {
            context.write(key, "X" + firstChar + " " + currentValue);
        }
    }

    generated = "H " + username + " " + minDate + " " + maxDate + " " + nbUrls.toString(); // H stands for Header
    context.write(key, generated);
}

var main = function (factory) {
    var job = factory.createJob("iisLogAnalysis", "map", "reduce");
    job.setCombiner("reduce");
    job.setNumReduceTasks(64);
    job.waitForCompletion(true);
};

//V120120c



/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  */
/*  SHA-1 implementation in JavaScript | (c) Chris Veness 2002-2010 | www.movable-type.co.uk      */
/*   - see http://csrc.nist.gov/groups/ST/toolkit/secure_hashing.html                             */
/*         http://csrc.nist.gov/groups/ST/toolkit/examples.html                                   */
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  */

var Sha1 = {};  // Sha1 namespace

/**
* Generates SHA-1 hash of string
*
* @param {String} msg                String to be hashed
* @param {Boolean} [utf8encode=true] Encode msg as UTF-8 before generating hash
* @returns {String}                  Hash of msg as hex character string
*/
Sha1.hash = function (msg, utf8encode) {
    utf8encode = (typeof utf8encode == 'undefined') ? true : utf8encode;

    // convert string to UTF-8, as SHA only deals with byte-streams
    if (utf8encode) msg = Utf8.encode(msg);

    // constants [§4.2.1]
    var K = [0x5a827999, 0x6ed9eba1, 0x8f1bbcdc, 0xca62c1d6];

    // PREPROCESSING 

    msg += String.fromCharCode(0x80);  // add trailing '1' bit (+ 0's padding) to string [§5.1.1]

    // convert string msg into 512-bit/16-integer blocks arrays of ints [§5.2.1]
    var l = msg.length / 4 + 2;  // length (in 32-bit integers) of msg + ‘1’ + appended length
    var N = Math.ceil(l / 16);   // number of 16-integer-blocks required to hold 'l' ints
    var M = new Array(N);

    for (var i = 0; i < N; i++) {
        M[i] = new Array(16);
        for (var j = 0; j < 16; j++) {  // encode 4 chars per integer, big-endian encoding
            M[i][j] = (msg.charCodeAt(i * 64 + j * 4) << 24) | (msg.charCodeAt(i * 64 + j * 4 + 1) << 16) |
        (msg.charCodeAt(i * 64 + j * 4 + 2) << 8) | (msg.charCodeAt(i * 64 + j * 4 + 3));
        } // note running off the end of msg is ok 'cos bitwise ops on NaN return 0
    }
    // add length (in bits) into final pair of 32-bit integers (big-endian) [§5.1.1]
    // note: most significant word would be (len-1)*8 >>> 32, but since JS converts
    // bitwise-op args to 32 bits, we need to simulate this by arithmetic operators
    M[N - 1][14] = ((msg.length - 1) * 8) / Math.pow(2, 32); M[N - 1][14] = Math.floor(M[N - 1][14])
    M[N - 1][15] = ((msg.length - 1) * 8) & 0xffffffff;

    // set initial hash value [§5.3.1]
    var H0 = 0x67452301;
    var H1 = 0xefcdab89;
    var H2 = 0x98badcfe;
    var H3 = 0x10325476;
    var H4 = 0xc3d2e1f0;

    // HASH COMPUTATION [§6.1.2]

    var W = new Array(80); var a, b, c, d, e;
    for (var i = 0; i < N; i++) {

        // 1 - prepare message schedule 'W'
        for (var t = 0; t < 16; t++) W[t] = M[i][t];
        for (var t = 16; t < 80; t++) W[t] = Sha1.ROTL(W[t - 3] ^ W[t - 8] ^ W[t - 14] ^ W[t - 16], 1);

        // 2 - initialise five working variables a, b, c, d, e with previous hash value
        a = H0; b = H1; c = H2; d = H3; e = H4;

        // 3 - main loop
        for (var t = 0; t < 80; t++) {
            var s = Math.floor(t / 20); // seq for blocks of 'f' functions and 'K' constants
            var T = (Sha1.ROTL(a, 5) + Sha1.f(s, b, c, d) + e + K[s] + W[t]) & 0xffffffff;
            e = d;
            d = c;
            c = Sha1.ROTL(b, 30);
            b = a;
            a = T;
        }

        // 4 - compute the new intermediate hash value
        H0 = (H0 + a) & 0xffffffff;  // note 'addition modulo 2^32'
        H1 = (H1 + b) & 0xffffffff;
        H2 = (H2 + c) & 0xffffffff;
        H3 = (H3 + d) & 0xffffffff;
        H4 = (H4 + e) & 0xffffffff;
    }

    return Sha1.toHexStr(H0) + Sha1.toHexStr(H1) +
    Sha1.toHexStr(H2) + Sha1.toHexStr(H3) + Sha1.toHexStr(H4);
}

//
// function 'f' [§4.1.1]
//
Sha1.f = function (s, x, y, z) {
    switch (s) {
        case 0: return (x & y) ^ (~x & z);           // Ch()
        case 1: return x ^ y ^ z;                    // Parity()
        case 2: return (x & y) ^ (x & z) ^ (y & z);  // Maj()
        case 3: return x ^ y ^ z;                    // Parity()
    }
}

//
// rotate left (circular left shift) value x by n positions [§3.2.5]
//
Sha1.ROTL = function (x, n) {
    return (x << n) | (x >>> (32 - n));
}

//
// hexadecimal representation of a number 
//   (note toString(16) is implementation-dependant, and  
//   in IE returns signed numbers when used on full words)
//
Sha1.toHexStr = function (n) {
    var s = "", v;
    for (var i = 7; i >= 0; i--) { v = (n >>> (i * 4)) & 0xf; s += v.toString(16); }
    return s;
}


/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  */
/*  Utf8 class: encode / decode between multi-byte Unicode characters and UTF-8 multiple          */
/*              single-byte character encoding (c) Chris Veness 2002-2010                         */
/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  */

var Utf8 = {};  // Utf8 namespace

/**
* Encode multi-byte Unicode string into utf-8 multiple single-byte characters 
* (BMP / basic multilingual plane only)
*
* Chars in range U+0080 - U+07FF are encoded in 2 chars, U+0800 - U+FFFF in 3 chars
*
* @param {String} strUni Unicode string to be encoded as UTF-8
* @returns {String} encoded string
*/
Utf8.encode = function (strUni) {
    // use regular expressions & String.replace callback function for better efficiency 
    // than procedural approaches
    var strUtf = strUni.replace(
      /[\u0080-\u07ff]/g,  // U+0080 - U+07FF => 2 bytes 110yyyyy, 10zzzzzz
      function (c) {
          var cc = c.charCodeAt(0);
          return String.fromCharCode(0xc0 | cc >> 6, 0x80 | cc & 0x3f);
      }
    );
    strUtf = strUtf.replace(
      /[\u0800-\uffff]/g,  // U+0800 - U+FFFF => 3 bytes 1110xxxx, 10yyyyyy, 10zzzzzz
      function (c) {
          var cc = c.charCodeAt(0);
          return String.fromCharCode(0xe0 | cc >> 12, 0x80 | cc >> 6 & 0x3F, 0x80 | cc & 0x3f);
      }
    );
    return strUtf;
}

/**
* Decode utf-8 encoded string back into multi-byte Unicode characters
*
* @param {String} strUtf UTF-8 string to be decoded back to Unicode
* @returns {String} decoded string
*/
Utf8.decode = function (strUtf) {
    // note: decode 3-byte chars first as decoded 2-byte strings could appear to be 3-byte char!
    var strUni = strUtf.replace(
      /[\u00e0-\u00ef][\u0080-\u00bf][\u0080-\u00bf]/g,  // 3-byte chars
      function (c) {  // (note parentheses for precedence)
          var cc = ((c.charCodeAt(0) & 0x0f) << 12) | ((c.charCodeAt(1) & 0x3f) << 6) | (c.charCodeAt(2) & 0x3f);
          return String.fromCharCode(cc);
      }
    );
    strUni = strUni.replace(
      /[\u00c0-\u00df][\u0080-\u00bf]/g,                 // 2-byte chars
      function (c) {  // (note: operator precedence makes extra parentheses unnecessary here)
          var cc = (c.charCodeAt(0) & 0x1f) << 6 | c.charCodeAt(1) & 0x3f;
          return String.fromCharCode(cc);
      }
    );
    return strUni;
}

/* - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -  */

This code produces an intermediate flat-file structure that looks like this (for each session, the header line comes after the detail lines):

 


00000e399c3e94f8f919314762998b784d178bd4        D 02:14:32 /Core/Shapes/scripts/html5.js
00000e399c3e94f8f919314762998b784d178bd4        D 02:00:54 /Users/Account/LogOff
00000e399c3e94f8f919314762998b784d178bd4        D 02:09:39 /Modules/Orchard.Localization/Styles/orchard-localization-base.css
00000e399c3e94f8f919314762998b784d178bd4        D 02:13:24 /Themes/Classic/Styles/moduleOverrides.css
00000e399c3e94f8f919314762998b784d178bd4        D 02:12:37 /
00000e399c3e94f8f919314762998b784d178bd4        H test3059g2x50 2012-01-25 02:00:54 2012-01-25 02:12:37 5
00000e7fd498e90cf3f10b5158e1ccf6ff3b8153        D 00:26:22 /Users/Account/LogOff
00000e7fd498e90cf3f10b5158e1ccf6ff3b8153        D 00:24:12 /
00000e7fd498e90cf3f10b5158e1ccf6ff3b8153        H test0118g5x29 2012-01-28 00:24:12 2012-01-28 00:26:22 2
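To make the layout of these lines explicit, here is a minimal sketch of a parser for one intermediate line. It assumes, as the two scripts further down do, that the session ID is separated from the rest by a tab and that the remaining fields are separated by single spaces. The function name parseIntermediateLine is hypothetical and is not part of the original scripts.

```javascript
// Hypothetical helper (not in the original scripts): turns one intermediate
// line into a small object, or returns null when the line is not recognized.
function parseIntermediateLine(line) {
    var parts = line.split("\t");          // assumption: session ID <TAB> record
    if (parts.length < 2) {
        return null;
    }

    var sessionID = parts[0];
    var fields = parts[1].split(" ");      // record fields are space-separated

    if (fields[0] === "H") {
        // Header: H username startDate startTime endDate endTime nbUrls
        return {
            type: "H",
            sessionID: sessionID,
            username: fields[1],
            start: fields[2] + " " + fields[3],
            end: fields[4] + " " + fields[5],
            nbUrls: parseInt(fields[6], 10)
        };
    }

    if (fields[0] === "D") {
        // Detail: D hitTime url
        return {
            type: "D",
            sessionID: sessionID,
            hitTime: fields[1],
            url: fields[2]
        };
    }

    return null;
}

// Usage with two lines taken from the sample above
var sid = "00000e7fd498e90cf3f10b5158e1ccf6ff3b8153";
console.log(JSON.stringify(parseIntermediateLine(
    sid + "\tH test0118g5x29 2012-01-28 00:24:12 2012-01-28 00:26:22 2")));
console.log(JSON.stringify(parseIntermediateLine(
    sid + "\tD 00:26:22 /Users/Account/LogOff")));
```

The header and detail records share the same session ID, which is what later allows the two resulting tables to be joined.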

 

Then, two further jobs filter this output: one keeps only the headers, the other only the details. Here they are, starting with iisLogsAnalysisToH.js:

 


// V120120a

var map = function (key, value, context) {
    var generated;
    var minDate;
    var maxDate;
    var username;
    var nbUrls;
    var l, f;
    var firstChar;
    var sessionID;

    if (!value) {
        return;
    }

    l = value.split("\t");
    if (l.length < 2) {
        return;
    }

    sessionID = l[0];

    // first character of the record: "H" (header) or "D" (detail)
    firstChar = l[1].substring(0, 1);
    if (firstChar != "H") {
        return;
    }

    f = l[1].split(" ");

    username = f[1];

    minDate = f[2] + " " + f[3];
    maxDate = f[4] + " " + f[5];

    nbUrls = f[6];

    generated = sessionID + "\t" + username + "\t" + minDate + "\t" + maxDate + "\t" + nbUrls;
    context.write(key, generated);
};

var main = function (factory) {
    var job = factory.createJob("iisLogAnalysisToH", "map", "");
    job.setNumReduceTasks(0);
    job.waitForCompletion(true);
};

 

and iisLogsAnalysisToD.js:

 


// V120120a

var map = function (key, value, context) {
    var generated;
    var hitTime;
    var Url;
    var l, f;
    var firstChar;
    var sessionID;

    if (!value) {
        return;
    }

    l = value.split("\t");
    if (l.length < 2) {
        return;
    }

    sessionID = l[0];

    // first character of the record: "H" (header) or "D" (detail)
    firstChar = l[1].substring(0, 1);
    if (firstChar != "D") {
        return;
    }

    f = l[1].split(" ");

    hitTime = f[1];

    Url = f[2];

    generated = sessionID + "\t" + hitTime + "\t" + Url;
    context.write(key, generated);
};

var main = function (factory) {
    var job = factory.createJob("iisLogAnalysisToD", "map", "");
    job.setNumReduceTasks(0);
    job.waitForCompletion(true);
};
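Before spending cluster time, a map function like the ones above can be exercised locally. The following is a minimal sketch, assuming a plain JavaScript engine such as Node.js. The helpers createMockContext and runMap are hypothetical; the mock only mimics the context.write(key, value) call used in the scripts, and passing the line index as the key is an assumption (on the cluster, Hadoop supplies the key). mapDetails is a condensed copy of the detail filter, not the original script.

```javascript
// Hypothetical stand-in for the context object passed to map():
// it only records what would have been written.
function createMockContext() {
    var written = [];
    return {
        written: written,
        write: function (key, value) {
            written.push({ key: key, value: value });
        }
    };
}

// Hypothetical driver: feeds each line to the map function,
// using the line index as the key (assumption).
function runMap(mapFn, lines) {
    var context = createMockContext();
    for (var i = 0; i < lines.length; i++) {
        mapFn(i, lines[i], context);
    }
    return context.written;
}

// Condensed copy of the detail filter, for illustration only
var mapDetails = function (key, value, context) {
    if (!value) { return; }
    var l = value.split("\t");
    if (l.length < 2) { return; }
    if (l[1].charAt(0) !== "D") { return; }   // keep detail records only
    var f = l[1].split(" ");
    context.write(key, l[0] + "\t" + f[1] + "\t" + f[2]);
};

// Three lines taken from the intermediate sample (tab after the session ID)
var sessionId = "00000e7fd498e90cf3f10b5158e1ccf6ff3b8153";
var sampleLines = [
    sessionId + "\tD 00:26:22 /Users/Account/LogOff",
    sessionId + "\tD 00:24:12 /",
    sessionId + "\tH test0118g5x29 2012-01-28 00:24:12 2012-01-28 00:26:22 2"
];

var results = runMap(mapDetails, sampleLines);
for (var j = 0; j < results.length; j++) {
    console.log(results[j].key + " => " + results[j].value);
}
```

Only the two detail lines reach context.write; the header line is skipped. Since the key is written together with the generated value, each output row starts with an extra column in front of the session ID.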

 

Before executing the code, one needs to provision a cluster in order to have processing power. With Windows Azure, here is how this can be done:

image

image

image

image

To copy the data from blob storage to the Hadoop distributed file system (HDFS), one way is to connect through Remote Desktop to the head node and issue a distcp command. Before that, one needs to configure Windows Azure Storage (ASV) in the console.

image

 

image

 

image

 

image

image

distcp automatically generates a map-only job that copies data from one location to another in a distributed way. This job can be tracked from the standard Hadoop console:

image

The JavaScript code must be uploaded to HDFS before being executed:

image

image

Then the JavaScript code can be executed:

image

This code runs within a few hours on a 1x8CPU+32x2CPU cluster.
Once it has finished, the two remaining scripts can be run, either in parallel or one after the other:

image

image

The results then sit in HDFS folders. They can be copied back to Windows Azure blobs through distcp, or exposed as HIVE tables and retrieved through SSIS into SQL Server or SQL Azure thanks to the ODBC driver for HIVE. This may be explained in a future blog post.
Here are just the HIVE commands to view the files as tables:

 


CREATE EXTERNAL TABLE iisLogsHeader (rowID STRING, sessionID STRING, username STRING, startDateTime STRING, endDateTime STRING, nbUrls INT)
ROW FORMAT DELIMITED
	FIELDS TERMINATED BY '\t'
	LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/cornac/iislogsH'

 



CREATE EXTERNAL TABLE iisLogsDetail (rowID STRING, sessionID STRING, HitTime STRING, Url STRING)
ROW FORMAT DELIMITED
	FIELDS TERMINATED BY '\t'
	LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/user/cornac/iislogsD'

 


Benjamin
