Understanding your Data

Let's say you scanned your environment and came up with a report like this for your top 100 file types.  There's a lot you can glean from it, but you really need to bucket it into groups.  Is it collaborative?  Is it designed for the file system?  Or is it archived data that you really wouldn't want to put in your highly optimized collaboration system?  The "buckets" will help you understand your data: what should stay on file servers and what is a potential fit for SharePoint environments.

The data in this table comes from a scan of 50 file servers at various stages of their lives.  The data was as old as 1987 and as new as 2004.  What you see in this table of 7 TB will help you conceptualize what is in 29 million files on file servers.

Someone looking at this data from a search perspective could line up the extensions with the out-of-the-box filtering and see what they would get.  I would love to see someone do that work to find out which types line up with which IFilters and which simply don't make sense to index.  I think you'd find that more than 50% of the size here isn't indexable or worth indexing.

Someone looking at this from an Excel Services perspective would think... wow, that's a lot of Excel files!  1 million+ files.  What's going on?  They'd find there's a lot of automation going on.  How much of that automation could be turned into automated real-time reports?

The document management person would look at this data and say... hmm, how should I divide this up?  Currently it's 3,000 shares with nearly a million folders.  Would I try to consolidate this into a single doc library?  Craziness, right?  3,000 doc libraries?  No.  3,000 site collections?  Probably not, but we're getting closer.  There is no straight-across mapping here.  The shares will need to be taken one by one, with a similar assessment of what each one is doing and how much of that maps across and will find value on the platform.

If I were to guess, I'd say 5% of the current shares will need to keep using the file server.  90% can be set to read-only and archived.  The remaining 5% need to be moved over with a lot of hand holding, and a lot of people need training.  The 3,000 share admins need site collection admin training, the 15,000 folder-creator, design-type people need advanced user training, and those who only consume need mass user training: one-day quick-start sessions with brown bags, plus lots of FAQs and CBTs.

Extension      Count       Total KB    Avg KB  Type
.xls       1,348,039    889,815,861       660  collab
.ppt         634,476    695,605,118     1,096  collab
.mdb          40,055    517,919,309    12,930  database
.pst           5,402    452,226,788    83,715  database
.exe       1,003,154    443,317,354       442  code
.a         6,448,513    395,120,258        61  code
.Doc       1,597,277    353,493,932       221  collab
.zip          95,400    350,289,986     3,672  archive
.dat         389,277    285,793,707       734  code
.bak         935,739    279,483,027       299  archive
.wmv          10,212    192,115,586    18,813  media
.mpg           4,466    146,872,916    32,887  media
.avi          23,233    137,761,598     5,930  media
.dll         609,241    131,857,838       216  code
.txt         796,876    131,488,253       165  archive
.tif         199,706    102,649,503       514  media
.bmp         666,019     99,430,540       149  media
.cab         154,217     99,184,009       643  archive
.jpg       1,310,690     90,419,583        69  media
.pdb         119,795     90,310,440       754  database
.asf          12,243     84,531,461     6,904  media
.rtf         512,300     83,360,156       163  collab
.eps         105,302     81,575,934       775  media
.psd          68,747     74,008,030     1,077  media
.bkf             120     69,849,449   582,079  archive
.img           6,302     66,741,144    10,590  archive
.log         326,832     63,391,414       194  archive
.pch          14,392     59,582,768     4,140  code
.pdf          88,828     53,645,798       604  database
.b         1,165,018     53,643,747        46  code
.lib         167,543     49,961,933       298  code
.wav         204,197     37,541,945       184  media
.Msi           5,753     36,796,121     6,396  code
.dbg          90,874     21,881,280       241  archive
.obd          65,715     21,462,285       327  collab
.sys          28,624     21,434,357       749  code
.htm       3,090,583     21,205,880         7  code
.gho              33     20,643,567   625,563  archive
.h         1,559,539     20,619,691        13  code
.msg         151,634     20,613,532       136  archive
.tmp          78,542     19,654,767       250  archive
.qic             116     19,047,170   164,200  archive
.mdf           1,149     17,563,350    15,286  database
.wma           7,480     17,425,905     2,330  media
.pqi              60     17,351,949   289,199  archive
.cpp         790,570     16,886,762        21  code
.csv          39,737     15,347,065       386  archive
.gif       2,694,004     14,898,135         6  media
.hlp          76,309     13,883,602       182  archive
.chm          27,153     13,615,917       501  archive
.c           549,619     13,043,244        24  code
.bcp           8,142     11,675,017     1,434  code
.mix          19,644     11,535,236       587  media
.png         117,184     11,462,540        98  media
.ocx          30,898     10,132,518       328  code
.wmf         415,077     10,057,656        24  media
.mov           2,560      9,340,224     3,649  media
.z             4,185      9,101,792     2,175  code
.ost             254      8,782,743    34,578  pst
.mmf             642      8,352,272    13,010  archive
.pub           9,174      8,217,713       896  collab
.rpt          44,362      7,903,887       178  database
.mpeg            477      7,675,777    16,092  media
.DL_          91,698      7,632,741        83  code
.iso             297      7,324,475    24,662  collab
.blg             280      7,258,503    25,923  code
.lsg             625      6,902,644    11,044  code
.dcl           5,316      6,846,094     1,288  code
.map          37,304      6,571,169       176  media
.dir          18,136      6,278,136       346  archive
.obj         141,572      6,191,208        44  code
.bin          23,215      5,936,244       256  code
.cap           4,997      5,859,952     1,173  media
.bsc           2,302      5,581,085     2,424  code
.ttf          33,653      5,178,857       154  media
.sbr          25,935      5,140,563       198  code
.al           11,968      4,772,571       399  code
.trc           1,007      4,680,717     4,648  archive
.idf          14,416      4,572,195       317  media
.ldf             782      4,205,407     5,378  database
.mvb             495      3,803,284     7,683  media
.xlk           3,488      3,612,669     1,036  archive
.Ex_          31,551      3,412,071       108  code
.dot          34,600      3,194,595        92  archive
.res          72,394      2,419,928        33  code
.ilk           4,913      2,324,942       473  code
.msm           5,474      2,005,124       366  archive
.trn           2,034      1,716,905       844  archive
.opt          13,721      1,625,095       118  code
.out          30,087      1,539,555        51  archive
.evt           1,222      1,493,896     1,223  archive
.inst          1,396      1,468,698     1,052  code
.ivt             266      1,446,890     5,439  database
.pps             945      1,443,456     1,527  collab
.oab             153      1,298,709     8,488  code
.fpx           1,267        864,848       683  media
.arc           1,423        771,163       542  archive
.pkg           3,668        739,020       201  code
.odb             956        722,256       755  database
.warn              2        661,256   330,628  code
Total     29,597,262  7,278,098,260       246
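If you want to build a report like the one above for your own environment, here is a minimal Python sketch of one way to do it.  This is not the tool that produced the numbers in this post; the function names `scan_extensions` and `top_extensions` are made up for illustration, and the choice to lowercase extensions (so .Doc and .doc land in the same row) is an assumption you may not want.

```python
import os
from collections import defaultdict


def scan_extensions(root):
    """Walk `root` and return {extension: [file_count, total_bytes]}."""
    stats = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            # Assumption: fold case so ".XLS" and ".xls" count together.
            ext = os.path.splitext(name)[1].lower() or "(none)"
            stats[ext][0] += 1
            stats[ext][1] += size
    return dict(stats)


def top_extensions(stats, limit=100):
    """Return (ext, count, total_kb, avg_kb) rows, largest total size first."""
    rows = []
    for ext, (count, total_bytes) in stats.items():
        total_kb = total_bytes // 1024
        # Avg KB is simply Total KB divided by Count, as in the table.
        rows.append((ext, count, total_kb, total_kb // count))
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows[:limit]
```

Pointing `scan_extensions` at the root of a share and passing the result to `top_extensions` gives you the Extension, Count, Total KB and Avg KB columns.  The Type column is a judgment call you still have to make yourself.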

 

Breaking this data down into buckets, you get groupings like this...

 

[Image: Group Types Chart (see the attachment if you don't see the image)]
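To make the bucketing concrete, here is a small Python sketch that rolls rows up by the Type column.  The seven sample rows are copied from the table in this post; the function names `bucket_totals` and `bucket_shares` are hypothetical, and with only a handful of rows the percentages will not match the chart for the full 100 rows.

```python
from collections import defaultdict

# Sample rows copied from the table: (extension, count, total_kb, type)
SAMPLE_ROWS = [
    (".xls", 1_348_039, 889_815_861, "collab"),
    (".ppt", 634_476, 695_605_118, "collab"),
    (".mdb", 40_055, 517_919_309, "database"),
    (".pst", 5_402, 452_226_788, "database"),
    (".exe", 1_003_154, 443_317_354, "code"),
    (".zip", 95_400, 350_289_986, "archive"),
    (".wmv", 10_212, 192_115_586, "media"),
]


def bucket_totals(rows):
    """Sum the file count and total KB for each bucket (the Type column)."""
    totals = defaultdict(lambda: {"count": 0, "total_kb": 0})
    for _ext, count, total_kb, bucket in rows:
        totals[bucket]["count"] += count
        totals[bucket]["total_kb"] += total_kb
    return dict(totals)


def bucket_shares(rows):
    """Return {bucket: fraction of the total KB across all rows}."""
    totals = bucket_totals(rows)
    grand_total = sum(t["total_kb"] for t in totals.values())
    return {b: t["total_kb"] / grand_total for b, t in totals.items()}


if __name__ == "__main__":
    for bucket, share in sorted(bucket_shares(SAMPLE_ROWS).items()):
        print(f"{bucket:10s} {share:6.1%}")
```

Feed it all 100 rows and you have the numbers behind a chart like the one above.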

 

Just because it's in the collab bucket doesn't mean that it's a good target.  Looking across a file server, or whatever it is you plan to migrate from, you'll notice old files and simple junk that will never be useful.  If you have a way of aging the content, changing the culture around where new content goes, and slowly weaning people off the old system... you'll be further ahead.
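One simple way to age the content is to profile files by last-modified date.  Here is a rough Python sketch; the 1, 3 and 5 year cutoffs and the names `age_bucket` and `age_profile` are assumptions for illustration, not anything from the scan in this post, and last-modified time is only a proxy for whether anyone still uses a file.

```python
import os
import time
from collections import Counter

SECONDS_PER_YEAR = 365 * 24 * 3600


def age_bucket(mtime, now, cutoffs=(1, 3, 5)):
    """Label a modification time by age in years, e.g. '< 1 yr' or '5+ yr'."""
    age_years = (now - mtime) / SECONDS_PER_YEAR
    lower = 0
    for upper in cutoffs:
        if age_years < upper:
            return f"< {upper} yr" if lower == 0 else f"{lower}-{upper} yr"
        lower = upper
    return f"{lower}+ yr"


def age_profile(root, now=None):
    """Count the files under `root` in each age bucket."""
    now = time.time() if now is None else now
    profile = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                mtime = os.path.getmtime(os.path.join(dirpath, name))
            except OSError:
                continue  # skip files that vanish or cannot be read
            profile[age_bucket(mtime, now)] += 1
    return profile
```

A share where nearly everything lands in the oldest bucket is a candidate for read-only and archive rather than migration.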

 

This post is more for you to see an example of classifying data so you can visualize it and understand "what's out there."  It also gives you an idea of sizes and file types that you'll come across.  You'll have to make some decisions around what makes sense.

 

More on File Servers:

Is the File Server Dead?

Published Tuesday, April 24, 2007 9:04 PM by joelo
Attachment(s): fileserverfiletypes.jpg


Comments


Sunday, May 13, 2007 9:11 PM by Sharepoint Experiences :: Brazilian MOSS MVP

# File Shares vs. Sharepoint (RELOADED)

Hi all, Many times I follow and participate of discussions on " File Servers versus Sharepoint ", and
