Welcome to MSDN Blogs Sign in | Join | Help

(VS2010 Beta1) Native Parallel Programming: ConcRT example -- Debugging TwentyFour

In a previous posting, we presented a simple C++ parallel program TwentyFour, which is based on the new native parallel programming API (ConcRT) introduced in Visual Studio 2010. In this posting, we're going to discuss on how understand the running of native parallel programs using Visual Studio 2010.

With Concrt programming, you divide your code logic into little units of work called chores and pass them to the Concrt runtime through scheduling calls. Once the magic Concrt runtime decides it's time to run a certain chore on a certain thread, it will calls its callback function you give through Concurrency::Chore::m_pFunction pointer. So a good place to start understand how ConcRT works is set a breakpoint inside the callback function after the input paramter is cast into a derived class object:

Setting breakpoint

Once the breakpoint hits, you can use Visual Studio normal debugging features to get a picture of what the ConcRT runtime has been doing behind our back.

First of all, the improved thread window shows that four threads have been created (on a 4-core machine):

Thread window

The four threads beside the main thread are all created by the Concrt runtime running code in the Concurrency::details name space. They're in various stages of scheduling, synchronizing, and running chores.

If you open the stack window, you will find who is really calling our CCards::Payload method:

Stack window

CCards::Payload is called by UnrealizedChore::_StructuredChoreWrapper, which finally leads to Concurrency::details::ThreadProxy::ThreadProxyMain routine in msvcr100d.dll.

If you open the variable window, and click on a few places to expand relevant objects, you will get a clearer picture of how chores are represented in ConcRT runtime:

Variables

The current CCards object we're looking at represents one branch of the first level of breaking down the problem (m_count=3). The m_expr string remembers the first operation performed(5+6=11). The CCards class derives from Concurrency::UnrealizedChore class, which in turn derives from the topmost Concurrency::Chore class.

We've set Chore::m_pFunction to point to our callback function CChards::Payload, but Concrt has another wrapper function _StructuredChoreWrapper around it to manage all structured chores. The m_fRuntimeOwnsLifetime is set to true which means the runtime will delete this object once it's done with it. The _M_pTaskCollection pointer apparently points back to the task collection this chore has been scheduled on.

With the standard debugging features of Visual Studio, we can definitely debug ConcRT-based parallel programs. We can look at different tasks running on multiple threads, or even multiple tasks running on the same thread one by one if we can remember everything in our limited mind whose short term memory is found to be only capable of holding 5 or 7 items are any more moment. But certainly it will be hard to find out anything about scheduled tasks, for they're complely hidden in the internal data structure of Concrt. With tradiaonal debugging features, literally we're only seeing individual trees without seeing the whole forest when debugging parallel programs.

This calls for new debugging features specially designed to support parallel programming, features whose implementation has unique insight into the working of parallel runtime and whose UI presentation can help us understand the totality of the parallel world and yet let us dig deeper if we choose to do so.

The first such support is the 'Parallel Tasks' window introduced in Visual Studio 2010. Here is what the parallel tasks window will show when our breakpoint is hit the first time:

Task Window when hitting the first breakpoint

The Parallel Tasks window shows, at least attempts to show, all tasks currently managed by the parallel runtime (Concrt in the native case). For each task, it shows its task identifier, status, location (top most method), a name or description of the task, thread assignment, and task group information. The Parallel Tasks window only shows task flagging status, and current/breaking task icons.

When the breakpoint is first hit, there are only two tasks, one running and another scheduled to be run. When you let the program run and check back the tasks window after a few more breakpoints, we have lot more tasks:

Tasks window after a few more breakpoint hits

The picture above shows 2 running tasks, 3 waiting tasks, and 46 scheduled tasks. The running tasks are towards the leave nodes of our search tree (m_count=1), while the waiting tasks are towards the top of the search tree (m_cont=2). The waiting tasks are actually waiting in StructureTaskCollection::Wait on the tasks scheduled on the task collection created within the task code. In this case, the task collection is created on the stack within the Solve method, which is the main body of task.

For Concrt native parallel programming, task identifiers are actually their chore object addresses. The number showing in the Task Groups column are actually addresses of task collections. Both those objects have virtual function tables, so it's possible to add them to watch window to let Visual Studio decode their real structures. The expression to use is "(IUnknown *)<addr>". Here is an example:

Watch window decoding Chore and TaskCollection 

We've seen the decoding of UnrealizedChore before, so only StructuredChoreCollection is new here. The decoded data shows this task collection still have 12 chores who still have not been running or not finished running.

If you short the Tasks window by the thread assignment column, you will find multiple tasks would be assigned to the same thread. This happens when a task finishes its own execution and waits for its child tasks in StructuredTaskCollection::Wait or other similar methods. The parallel runtime would pick another scheduled chore for execution, and this could be nested multiple times. Here is a sample stack:

Multiple tasks on the same thread
 

This one shows three tasks are on the same thread. The top-most task is really executing its own code in the Payload method, while the other two are waiting in StructuredTaskCollection::Wait method.

The second new parallel debugging window introduced in Visual Studio 2010 is the parallel stacks window, a window which helps you to understand all threads/tasks running in the parallel runtime in a single view.

The parallel stacks window has two panels which can be toggled from its toolbar. The thread panel tries to show all threads in the process in a meaningful way. It achieves this by merging common sections of stack frames into frame nodes with one or multiple frames. When thread stacks differ along the way, child frame nodes are created and linked back to parent frame nodes, thus forming a forest (multiple trees in general). Here is an example:

Parallel Stacks Window - thread panel

The thread panel shows that there are 7 threads in the system, one main thread, two threads apparently running our tasks, one ConcRT resource manager thread, one ConcRT scheduling thread, and other threads in unknown places. The last two threads are actually normal ConcRT working threads. They're just in strange places that the debugger can't decode their stacks properly.

The highlighted edges and lines show the current line of execution, from which you should be able to identify a few running and waiting tasks (looking for UnstructuredChoreWrapper here). Actually, the tasks panel of the parallel tasks window just shows that exactly, stack frames of running/waiting tasks:

Stacks window task panel

The diagram shows 5 tasks: one in CChads::Payload, one in CCards::Oper, and three in StructuredTaskCollection::Wait. You can hoover around the frame nodes to see their identifiers, stack frames. You can click on them to expand them to individual threads and frames.

There are lots of ways of interacting with both the tasks window and the stack window to help you navigate the complicated web of parallel tasks. For more details, please refer to our PM Daniel Moth's posting on his blog and videos on Channel 9.

Posted by fyuan | 2 Comments
Filed under:

(VS2010 Beta1) Native Parallel Programming: ConcRT example -- TwentyFour

WIth Microsoft Visual Studio 2010, Microsoft is releasing a new set of API for native parallel programming named ConRT (concurrent runtime). The purpose of this blog article is to illustrate the basics of ConcRT programming in VS 2010. We will be using Visual Studio Beta1 release for the discussion.

The problem we're trying to parallel here is a simple game we used to play in elementary school named 24. It goes like this, two players evenly divide up a deck of cards, and randomly show two cards at the same time. Then the players are required to find a way to compute the number 24 using the face value of each card once and only once, with four basic operations (addition, subtraction, multiplication, and division). The player who gets the right answer first gives the two cards to the other player. The game continues until they run out of time or one player runs out of cards.

So our basic programming problem here is writing a class to compute 24 using four numbers and four basic operations. The problem itself is not really computation intensive enough to warrant the usage of parallel programming unless we extend it to many more cards. But it's good enough for illustrating parallel programming in a short blog article: simple enough to understand, not too complicated, and yet not too boring.

Here goes the code:

// TwentyFour.cpp : use four numbers once and only once to compute 24 with four basic operations
//

#include <windows.h>
#include <tchar.h>
#include <stdio.h>
#include <math.h>

#include <concrt.h>

bool g_concrt = true; // true for using Concrt

bool Close(double v1, double v2)
{
    
return (abs(v1 - v2) < 0.0001);
}

                      
class CCards : public Concurrency::UnrealizedChore
{
    
char   m_expr[256];  // expression
    
int    m_count;      // card count
    
double m_cards[4];   // card value

    
void Oper(Concurrency::StructuredTaskCollection & col, int i, int j, char op, double value)
    {
        CCards * pNext 
= new CCards();

        
pNext->m_cards[0= value; // result of operation
        
pNext->m_count    1;

        for 
(int 0t < m_countt ++) // copy remaining cards
        
{
            
if ((t !i) && (t !j))
            {
                pNext->m_cards[pNext->m_count] 
m_cards[t];
                
pNext->m_count ++;
            
}
        }

        
// Append expression
        
sprintf_s(pNext->m_expr, "%s <%3.1f %c %3.1f = %3.1f> ", m_expr, m_cards[i], op, m_cards[j], value);

        if 
(g_concrt)
        {
            col.Schedule(pNext)
// schedule a task
        
}
        
else
        
{
            pNext->Solve()
;      // direct recursive solve
            
delete pNext;
        
}
    }

    CCards()
    {
        m_pFunction 
CCards::Payload// Conrt call back function
    
}

    
static void Payload(void * context) // wrapper
    
{
        CCards * pChore 
= static_cast<CCards *>((void *) ((BYTE *) context));

        
pChore->Solve();
    
}

public:

    CCards(
int a, int b, int c, int d)
    {
        m_expr[
00;

        
m_count 4;
        
        
m_cards[0a;
        
m_cards[1b;
        
m_cards[2c;
        
m_cards[3d;
    
}

    
void Solve(void)
    {
        
if (m_count == 1// leave node of searching
        
{
            
if (Close(m_cards[0], 24))
            {
                printf(
"    thread %4d %s\r\n", GetThreadId(GetCurrentThread()), m_expr);
            
}

            
return;
        
}

        Concurrency::StructuredTaskCollection col
;

        for 
(int 0i < m_counti ++)
        {
            
for (int i + 1j < m_countj ++)
            {
                Oper(col, i, j, 
'+', m_cards[i] + m_cards[j])
                
                if 
(m_cards[i] > m_cards[j])
                {
                    Oper(col, i, j, 
'-', m_cards[i] - m_cards[j]);
                
}
                
else
                
{
                    Oper(col, j, i, 
'-', m_cards[j] - m_cards[i]);
                
}

                Oper(col, i, j, 
'*', m_cards[i] * m_cards[j]);

                if 
(m_cards[j] !0)
                {
                    Oper(col, i, j, 
'/', m_cards[i] / m_cards[j]);
                
}

                
if (m_cards[i] !0)
                {
                    Oper(col, j, i, 
'/', m_cards[j] / m_cards[i]);
                
}
            }
        }

        
if (g_concrt)
        {
            col.Wait()
// wait for all child tasks
        
}
    }
}
;


int 
_tmain(int argc, _TCHAR* argv[])
{
    CCards cards(
5678);

    
g_concrt = false;

    
printf("Sequential solution:\r\n");
    
cards.Solve();

    
g_concrt = true;

    
printf("\r\n\r\n");
    
printf("Parallel solution:\r\n");
    
cards.Solve();

    return 
0;
}

The class CCards contains a list of card face values (m_count, m_cards) and a string showing the steps of computation (m_expr). The main problem solving routine is the Solve method. The loops through all combinations of card selection and operations and then call the Oper help function. The Oper function creates a new CCards object to represent the state of problem solving after an operation is performed (two less cards, one computed card, and updated expression) and then calls the Solve method on it recursively until we hace a single (computed) card left, which is then compared with the target value 24. The accumulated expression will be found is there is a match.

The code can be used in either traditional sequence programming or ConcRT based parallel programming supported by Visual Studio 2010. This is controlled by a global variable g_concrt.

When g_concrt is false, you can ignore the everything from the Concurrency name space. The Oper method will call Solve method directly on the object pointed to be pNext.

When g_concrt is true, we're really using the new native parallel programming API:

  1. The basic ConcRT API is defined in a new header file concrt.h which resides in VC include directory. The runtime support is part of the C++ runtime, so you do not need to link to any new library to use it.
  2. The ConcRT API is in the Concurrency namespace. We're using Concurrency::UnRealizedChore and Concurrency::StructuredTaskCollection here.
  3. A UnrealizedChore represents a unit of work (task/chore) that can be scheduled in a work stealing queue. If you use new operator to allocate a UnrealizedChore and then schedule it, the ConcRT runtime will be responsible to delete it.
  4. A StructuredTaskCollection represents a group of work (task/chore). You can schedule a UnrealizedChore on StructuredTaskCollection. You can wait for all scheduled tasks in a structured task collection.

When g_concrt is true, each invokation of Solve declares a new instance of StructuredTaskCollection, on which roughly N * (N -1) / 2 * 5 UnrealizedChore will be scheduled. The solve methods waits for all chores it scheduled on it before exiting. The Oper method schedules the task on the task collection, instead of calling the Solve method inline.

In the constructor of CCards class, the m_pFunction member of base class UnRealizedChore is set to point to a static function CCards::Payload. When a task/chore is ready to run as determined by the ConcRT runtime, the Payload function will be called. The parameter to the Payload function should point to the scheduled chore, as in this case CCards class object. So it casts the input parameter to CCards class and calls its Solve method for the real execution.

Here is the results you will get when run the program (using 5,6,7,8 as input):

Sequential solution:
    thread 1444  <5.0 + 7.0 = 12.0>  <12.0 - 8.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 1444  <5.0 + 7.0 = 12.0>  <8.0 - 6.0 = 2.0>  <2.0 * 12.0 = 24.0>
    thread 1444  <7.0 - 5.0 = 2.0>  <2.0 / 6.0 = 0.3>  <8.0 / 0.3 = 24.0>
    thread 1444  <7.0 - 5.0 = 2.0>  <6.0 / 2.0 = 3.0>  <3.0 * 8.0 = 24.0>
    thread 1444  <7.0 - 5.0 = 2.0>  <2.0 / 8.0 = 0.3>  <6.0 / 0.3 = 24.0>
    thread 1444  <7.0 - 5.0 = 2.0>  <8.0 / 2.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 1444  <7.0 - 5.0 = 2.0>  <6.0 * 8.0 = 48.0>  <48.0 / 2.0 = 24.0>
    thread 1444  <8.0 - 5.0 = 3.0>  <7.0 - 3.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 1444  <8.0 - 6.0 = 2.0>  <5.0 + 7.0 = 12.0>  <12.0 * 2.0 = 24.0>
    thread 1444  <6.0 * 8.0 = 48.0>  <7.0 - 5.0 = 2.0>  <48.0 / 2.0 = 24.0>
    thread 1444  <8.0 - 7.0 = 1.0>  <5.0 - 1.0 = 4.0>  <4.0 * 6.0 = 24.0>


Parallel solution:
    thread 5616  <5.0 + 7.0 = 12.0>  <8.0 - 6.0 = 2.0>  <2.0 * 12.0 = 24.0>
    thread 5616  <5.0 + 7.0 = 12.0>  <12.0 - 8.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 1444  <8.0 - 7.0 = 1.0>  <5.0 - 1.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 4692  <8.0 - 5.0 = 3.0>  <7.0 - 3.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 4248  <7.0 - 5.0 = 2.0>  <6.0 * 8.0 = 48.0>  <48.0 / 2.0 = 24.0>
    thread 4248  <7.0 - 5.0 = 2.0>  <8.0 / 2.0 = 4.0>  <4.0 * 6.0 = 24.0>
    thread 4248  <7.0 - 5.0 = 2.0>  <2.0 / 8.0 = 0.3>  <6.0 / 0.3 = 24.0>
    thread 4248  <7.0 - 5.0 = 2.0>  <6.0 / 2.0 = 3.0>  <3.0 * 8.0 = 24.0>
    thread 4248  <7.0 - 5.0 = 2.0>  <2.0 / 6.0 = 0.3>  <8.0 / 0.3 = 24.0>
    thread 1444  <6.0 * 8.0 = 48.0>  <7.0 - 5.0 = 2.0>  <48.0 / 2.0 = 24.0>
    thread 5616  <8.0 - 6.0 = 2.0>  <5.0 + 7.0 = 12.0>  <12.0 * 2.0 = 24.0> 

The sequential running of the program solves the problem in a determistic way using a single thread. The Concrt parallelized running of the code solves the problem in a non-determistic way using multiple threads as controlled by the ConcRT runtime.

In the next article, we will discussion how to debug ConcRT programs using new features provided by Visual Studio 2010.

Posted by fyuan | 0 Comments
Filed under:

STS: A palindromic word I will remember

Continuing searching palindromic sqaure numbers finds a new record number, a 55 digit parlindromic square:

 1373512530649258635292477609^ 2 = 1886536671850530641991373196913731991460350581766356881

Quite a few records for other palindromic sequences have also been broken my multi-core searching algorithm. Check http://www.worldofnumbers.com/palrecs.htm for details.

But the focus of this posting is really about a simple palindromic acronym: STS, which stands for Science Talent Search. It is a science/math competition for high-school seniors sponsored by Intel.

STS has just announced their 40 finalists for the year of 2007 (http://www.societyforscience.org/sts/), and our son is the lucky one from Washington state. His research paper is titled: Explicit Equations in the Plane for Elliptic Curves Given as Space Quartics.

Congratulations to all the winners and thanks Intel for promoting science/math in schools.

Posted by fyuan | 1 Comments

A 53-digit Palindromic Square Number

Six years ago I wrote a program to search for palindromic square numbers and other palindromic numbers and found some new palindromic numbers. But with increasing length, the search became longer and longer so I finally gave it up.

Recently I got a new powerful machine with dual quad-core CPUs, so I brought it home during the Christmas and New Year holiday season. I modified my old search program to take advantage of multi-cores in the machine. The search program has tons of parallelism inside. Although the original code was written as incremental loops for performance reason, it is not hard to change it to run on multi-core system. The result is almost 8 fold improvement in speed, on top of CPU speed gain for the last six years.

With the multi-core version, I tried continuing the old quest again and find quite a few record breaking palindromic numbers during the holiday season. But I did not try to break the 52-digit palindromic square number record held by Pete Leadbetter since May 20, 2001 until last night. Before leaving work, I started searching for 53 digit paradromic sqaures and this morning we have a new record, a 53-digit palindromic square number:

122063831551139898460740721 ^ 2 = 14899578972945056149893218681239894165054927987599841

Other palindromic number records can be found at:

http://www.worldofnumbers.com/palrecs.htm

Go MultiCore!

Posted by fyuan | 1 Comments

XpsStat: A program for gathering statistics information of XPS documents

In the Winhec presentation on XPS document performance optimization, a simple program XpsStat is introduced to analyze XPS documents. 

Given an XPS document, XpsStat generates four or five tables in HTML format:

  • Container Summary Table: information about each type of parts in the document.
  • FixedPage / Remote Resource Dictionary Table: information about each fixed page / remote resource dictionary in the document.
  • Font Resource Table: information about each font resource in the document
  • Image Resource Table: information about each image resource in the document
  • Container Part Table (optional): information about each part in the document

The program has three command line options:

  • /x Generate report in XML format instead of the default HTML format
  • /a Generate the optional Container Part Table
  • /d Extract fonts to external files, unobfuscated.

The program is written using .Net 3.0 XPS related API and some direct parsing of ZIP container. It supports both interleaved and non-interleaved XPS files.

Here is the binary: http://blog.fengyuan.com/XpsStat.exe

Here is what the Container Summary Table the program will generate for XPS Specification 1.0 in XPS format:

Container Summary
Type Count Piece Uncompressed Compressed Compression
Size Percent Average Size Percent Average
fixedpage+xml 453 453 116,521,006 96.74 % 257,220 5,912,474 75.13 % 13,051 94.93 %
package.relationships+xml 434 434 573,558 0.48 % 1,321 164,871 2.09 % 379 71.25 %
properties+xml 1 1 673 0.00 % 673 342 0.00 % 342 49.18 %
fixeddocument+xml 1 1 289,702 0.24 % 289,702 13,756 0.17 % 13,756 95.25 %
opentype 20 20 1,647,572 1.37 % 82,378 576,144 7.32 % 28,807 65.03 %
image 492 492 1,042,993 0.87 % 2,119 992,681 12.61 % 2,017 4.82 %
fixeddocumentsequence+xml 1 1 320 0.00 % 320 160 0.00 % 160 50.00 %
documentstructure+xml 1 1 179,194 0.15 % 179,194 13,154 0.17 % 13,154 92.66 %
/Documents/1/ 1 1 0 0.00 % 0 0 0.00 % 0 NaN %
/[Content_Types].xml 1 1 1,023 0.00 % 1,023 331 0.00 % 331 67.64 %
/Documents/1/_rels/ 1 1 0 0.00 % 0 0 0.00 % 0 NaN %
/Documents/1/Pages/ 1 1 0 0.00 % 0 0 0.00 % 0 NaN %
/Resources/Images/ 1 1 0 0.00 % 0 0 0.00 % 0 NaN %
/Documents/1/Pages/_rels/ 1 1 0 0.00 % 0 0 0.00 % 0 NaN %
/Documents/1/Structure/ 1 1 0 0.00 % 0 0 0.00 % 0 NaN %
ZIP Structure 195,978 0.16 % 195,978 195,978 2.49 % 195,978 0.00 %
Summary 1,410 1,410 120,452,019 100.00 % 85,426 7,869,891 100.00 % 5,581 93.47 %

Posted by fyuan | 5 Comments
Filed under: ,

XPS Performance

There will be two XPS related sessions at Winhec 2007 tomorrow, focusing on Performance Optimization for XPS Document.

The techincal presentation (CLN-T371 ) will be at 8:30am to 9:30am in Room 502AB. Shortly after that (9:45 to 10:45), there will be a chalk talk (CLN-C372) in Room 409A on the same topic.

The focus of the talk will be on how to generate the better XPS so that XPS documents can be consumed efficiently. So we will go into great deails about how to measure XPS document for performance and ideas on how to generate better XPS. But we will cover XPS performance issues in general. So even if you're just consuming XPS documents, this presentation should still be very much relevant to you. The reason is that this presentation is written from the viewpoint of a XPS consumer.

I'm pretty sure that you will either learn something useful from the presentation and chalk talk, or you can teach the community something about XPS.

Optimization XPS document is just part of the story in the whole world of XPS performance. The rest of the story includes performance of XPS filter pipeline, the design and configuration of XPS filters, the performance of XPS rendering (either host based or in printer firmware).

Talking about performance and learning, Zoran (www.zoran.com) has an encouraging demo at Winhec showing that their host-based rendering engine is able to achive print-engine-limited printing speed (HP Color LaserJet 4700, 31 ppm), for normal office type text documents with some vector graphics and images.

One final note, I was told that people tend to need lots of coffee for morning session after evening party.

See you tomorrow.

 

Posted by fyuan | 1 Comments
Filed under: ,

XPS at Winhec 2007

Today (May 15, 2007) is the first day of Winhec 2007.

If you're application developers, printer/scanner/multi-functional device vendors, you may be interested in the following XPS related sessions.

  • CLN-T370 XPSDrv: Best Practice using Print Verifier, May 15, 3:15pm-4:15pm 515B, Manski & Ashwin
  • CLN-C369 (Chalk Talk) Print Verifier and XPSDrv Driver Development, May 15, 4:30pm-5:30pm 409A, Erhan
  • CLN-T375 HP Photo: Implementation Guidelines, May 16 11:00am-noon, 408AB, Bill Crow
  • CLN-T371 Performance Optimization for XPS Documents, May 17, 8:30am-9:30am 502AB, Feng
  • CLN-C372 (Chalk Talk) XPS Document Optimization and Best Practice, May 17 9:45am-10:45am, 409A, Rod

In the self-paced hands-lab, there is a XPSDrv related lab. 

The following exhibitors are showing their XPS related technologies:

We're here at Winhec, talk to us.

Posted by fyuan | 1 Comments
Filed under: ,

A Simple XPS Decoder in C++

If you write programs in C#, Windows Presentation Foundation in .Net 3.0 provides quite nice API to write, generate and manipulate XPS documents. You can get the same feature if you work with managed C++ in .Net 3.0. Even if you work with .Net 2.0, you can get the basic ZIP stream decoding/encoding support.

But if you want to write code in unmamanged world, especially outside of the XPS filter pipeline, there is not enough sample code there. I'm going to write a simple XPS decoder in C++, based on the zlib compression library (http://www.zlib.net/zlib123-dll.zip).

#include <windows.h>

#include <stdio.h>

#include <tchar.h>

 

#include "MemFile.h"

#include "zip.h"

#include "zlib.h"

 

#pragma comment(lib, "zdll.lib")

 

HANDLE  CreateDirFile(char * pName, DWORD len, HRESULT & hr);

 

class CXps : CMemoryMappedFile

{

    const CEndCentralDir     * m_pEndCentralDir;

    const CCentralFileHeader * m_pFileHeader;

 

public:

    HRESULT LoadXps(const wchar_t * pFileName);

    HRESULT DecodePiece(const BYTE * pCur, DWORD compressedSize);

    HRESULT ExtractAllStreams();

};

The code uses zlib.h and zdll.lib for decompression. It opens up an XPS container as a memory-mapped file, wrapped in the CXps class. The CXps class has a pointer to end central directory, and a pointer to the first central file header.

HRESULT CXps::LoadXps(const wchar_t * pFileName)

{

    m_pEndCentralDir = NULL;

    m_pFileHeader;

 

    if (! Open(pFileName))

    {

        return E_INVALIDARG;

    }

 

    const BYTE * pCur = m_View + m_nFileSize - (sizeof(CEndCentralDir) - 1);

 

    while (pCur > m_View)

    {

        DWORD signature = * (DWORD *) pCur;

 

        if (signature == sig_EndCentralDir)

        {

            m_pEndCentralDir = (const CEndCentralDir *) pCur;

            m_pFileHeader    = (const CCentralFileHeader *) (m_View + m_pEndCentralDir->m_offsetCentralDic);

 

            return S_OK;

        }

       

        pCur --;

    }

 

    return E_FAIL;

}

The LoadXps method opens up an XPS file and locates its end central directory by reading backward from the end of the ZIP file.

HRESULT CXps::ExtractAllStreams()

{

    HRESULT hr = S_OK;

 

    const CCentralFileHeader * pFile = m_pFileHeader;

 

    for (int i = 0; SUCCEEDED(hr) && (i < m_pEndCentralDir->m_entryCentralDic); i ++)

    {

        hr = DecodePiece(m_View + pFile->m_offsetLocalHeader, pFile->m_compressedSize);

 

        pFile = (const CCentralFileHeader *) (pFile->m_Data + pFile->m_nFileName + pFile->m_nExtraField + pFile->m_nFileComment);

    }

  

    return hr;

}

 

 

int XpsExtract(const wchar_t * pFileName)

{

    CXps xpsFile;

 

    HRESULT hr = xpsFile.LoadXps(pFileName);

 

    if (SUCCEEDED(hr))

    {

        hr = xpsFile.ExtractAllStreams();

    }

 

    printf("\nhr = %x\n", hr);

 

    return 0;

}

The ExtractAllStreams method tries to extract every pieces in the XPS container to an file on the disk by calling DecodePiece method. The main routine XpsExtract is responsible for calling the loading routine and the main extracting routine.

The most complicated routine here is DecodePiece:

HRESULT CXps::DecodePiece(const BYTE * pCur, DWORD compressedSize)

{

    HRESULT hr = S_OK;

 

    const CLocalFileHeader * pHeader = (const CLocalFileHeader *) pCur;

 

    char * pName = new char[pHeader->m_nFileName + 1];

 

    if (pName == NULL)

    {

        return E_OUTOFMEMORY;

    }

 

    memcpy(pName, pHeader->m_Data, pHeader->m_nFileName);

    pName[pHeader->m_nFileName] = 0;

 

    printf("\n%8d %8d %-60s", pHeader->m_uncompressedSize, compressedSize, pName);

 

    HANDLE hFile = CreateDirFile(pName, pHeader->m_nFileName, hr);

 

    delete pName;

 

    if (FAILED(hr))

    {

        printf("\nUnable to open file\n");

 

        return hr;

    }

 

    pCur = pHeader->m_Data + pHeader->m_nFileName + pHeader->m_nExtraField;

   

    if (pHeader->m_compression != 0)

    {

        BYTE * pOutput = new BYTE[pHeader->m_uncompressedSize];

 

        if (pOutput == NULL)

        {

            return E_OUTOFMEMORY;

        }

 

        z_stream stream;

 

        memset(&stream, 0, sizeof(stream));

 

        stream.avail_in  = compressedSize;

        stream.avail_out = pHeader->m_uncompressedSize;

        stream.next_in   = (Bytef *) pCur;

        stream.next_out  = pOutput;

 

        int result = inflateInit2(& stream, -MAX_WBITS);

 

        if (result == Z_OK)

        {

            result = inflate(& stream, Z_SYNC_FLUSH);

        }

 

        DWORD dwWritten;

 

        WriteFile(hFile, pOutput, pHeader->m_uncompressedSize, & dwWritten, NULL);

 

        delete [] pOutput;

    }

    else

    {

        DWORD dwWritten;

 

        WriteFile(hFile, pCur, pHeader->m_uncompressedSize, & dwWritten, NULL);

    }

 

    CloseHandle(hFile);

 

    return hr;

}

 

DecocePiece has two parameters: a pointer to local file header structure, and compressed size. Note the compressed size may not be stored in the local file header, so it's better to pass it in from the end of the container. From the header structure, we can get its name and call CreateDirFile to create a corresponding file on the disk, making sure the proper directory path is created. The ZLIB decompression routine inflate, together with its preparation routine inflateInit2 are called when the piece is actually compressed. The result uncompressed streams are then written to disk.

The structures not shown here can be easily created by following ZIP specification at http://www.pkware.com/documents/casestudies/APPNOTE.TXT. The CreateDirFile routine is a simple wrapper around CreateFile and CreateDirectory.

The program is implicitly linked with ZLIB1 ZDLL.

Posted by fyuan | 9 Comments
Filed under: ,

Another Ph.D for Bill Gates

On April 19th, Bill Gates has been awarded a honorary Ph.D degree by Tsinghua University (http://en.wikipedia.org/wiki/Tsinghua_University), the best engineering school in China.

Posted by fyuan | 2 Comments

Something useful from 'junk' mails

With a high school junior in the house, we're receiving tons of 'junk' mails from colleges these days. They usually pile up on the desk until we have time to read and discard them.

Today, my wife found something useful, or rather something we have been anxtiously waiting for, an acceptance letter from RSI (Research Science Institute) summer camp at MIT (http://www.cee.org/rsi/) for our son. He is going back to the hot Boston for the summer after attending the PROMYS math summer camp (http://www.promys.org/) at Boston University last year.

Posted by fyuan | 2 Comments

XPS document with 100,000 pages?

Once a simple document format like XPS is created, it takes on a life of its own. Seeing the beauty of XPS, people are converting documents from differenent sources to XPS. People are trying to create XPS document with 10,000 pages, and even pushing for 100,000 pages.

But if you're using Microsoft XPS Document Writer to create XPS document with more than 32,700 odd pages, you may run into a bug in the current implementation. The problem lies in the limitation of 32-bit ZIP file directory structure. The solution is to switch to 64-bit ZIP64 file directory structure.

If you're blocked by this issue, here is a simple-minded piece of code which tries to repair such XPS file.

 int ZipRepair(const wchar_t * pFileName)

{

HANDLE hFile = CreateFile(pFileName, GENERIC_READ | GENERIC_WRITE,

    FILE_SHARE_READ, NULL, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);

 

    if (hFile == INVALID_HANDLE_VALUE)

    {

        return -1;

    }

 

    DWORD sizeHigh;

    DWORD sizeLow = GetFileSize(hFile, & sizeHigh);

 

    DWORD buffer[25];

 

    DWORD read = 0;

 

    if (SetFilePointer(hFile, sizeLow - 22, NULL, SEEK_SET))

        if (ReadFile(hFile, buffer, 22, & read, NULL))

        {

            DWORD entry  = (buffer[2] & 0xFFFF) + 0x10000;

            DWORD size   = buffer[3];

            DWORD offset = buffer[4];

 

            memset(buffer, 0, sizeof(buffer));

 

            buffer[0]  = 0x06064b50;    // Zip64EndCentralDir

            buffer[1]  = 44;

            buffer[3]  = 0x002d002d;

            buffer[6]  = entry;

            buffer[8]  = entry;

            buffer[10] = size;

            buffer[12] = offset;

           

            buffer[14] = 0x07064b50;    // Zip64EndCentralDirLocator

            buffer[16] = sizeLow - 22;

            buffer[18] = 1;

 

            buffer[19] = 0x06054b50;    // EndCentralDir

            buffer[21] = 0xFFFFFFFF;

            buffer[22] = 0xFFFFFFFF;

            buffer[23] = 0xFFFFFFFF;

           

            if (SetFilePointer(hFile, sizeLow - 22, NULL, SEEK_SET))

            {

                if (WriteFile(hFile, buffer, 98, & read, NULL))

                {

                    printf("%S repaired\n", pFileName);

                }

            }

        }

 

    CloseHandle(hFile);

 

    return 0;

}

The code assumes the XPS document has between 65,536 .. 131,071 streams within it. It reads the original EndCentralDir structure at the end of the file, inserts two structures for ZIP64, and then adds a dummy EndCentralDir structure. It assumes there is no comment at the end of the document. For more informations, read the ZIP specification at http://www.pkware.com/documents/casestudies/APPNOTE.TXT

BTW, my car of 10 years just received 100,000 miles last Sunday.

Posted by fyuan | 1 Comments
Filed under:

Source code for Windows Graphics Programming: Win32 GDI and DirectDraw

Source code for Windows Graphics Programming: Win32 GDI and DirectDraw used to be on http://safariexamples.informit.com/0130869856/. But link has been broken for quite sometimes.

I've received a few emails from readers who has lost the CD of the book. If you need the source code, here is it: www.fengyuan.com/sample/Samples.zip

Posted by fyuan | 3 Comments
Filed under: ,

Convert XAML Flow Document to XPS with Style (multiple page, page size, header, margin)

XPS is a fixed document format in which pages are pre-formated to a fixed page sixe. On the opposite of the spectrum, WPF provides flow document which can be paginated dynamically in XAML viewer. To bridge the two, WPF provides features to convert a flow document to fixed document in XPS format. Actually, you can convert any WPF Visual content to XPS.

While the default conversion is very easy to use, if you want more features, you need to write some code. This acticle shows how to convert WPF flow document to multiple page XPS with page size, margin, and header.

When a flow document is loaded in WPF, it returns an IDocumentPaginatorSource object. From it, you can get its DocumentPaginator object. When you're using XpsSerializationManager to convert a loaded flow document to XPS, it accepts a DocumentPaginator object. But the little know fact is that WPF allows you to wrap a new DocumentPaginator around a DocumentPaginator to control the conversion process.

Here is our DocumentPaginator wrapper class:

public class DocumentPaginatorWrapper : DocumentPaginator

{

    Size              m_PageSize;

    Size              m_Margin;

    DocumentPaginator m_Paginator;

    Typeface          m_Typeface;

   

    public DocumentPaginatorWrapper(DocumentPaginator paginator, Size pageSize, Size margin)

    {

        m_PageSize  = pageSize;

        m_Margin    = margin;   

        m_Paginator = paginator;

 

        m_Paginator.PageSize = new Size(m_PageSize.Width  - margin.Width  * 2,

                                        m_PageSize.Height - margin.Height * 2);

    }

  

    Rect Move(Rect rect)

    {

        if (rect.IsEmpty)

        {

            return rect;

        }

        else

        {

            return new Rect(rect.Left + m_Margin.Width, rect.Top + m_Margin.Height,

                            rect.Width, rect.Height);               

        }

    }

   

    public override DocumentPage GetPage(int pageNumber)

    {

        DocumentPage page = m_Paginator.GetPage(pageNumber);

 

        // Create a wrapper visual for transformation and add extras

        ContainerVisual newpage = new ContainerVisual();

 

        DrawingVisual title = new DrawingVisual();

 

        using (DrawingContext ctx = title.RenderOpen())

        {

            if (m_Typeface == null)

            {

                m_Typeface = new Typeface("Times New Roman");

            }

 

            FormattedText text = new FormattedText("Page " + (pageNumber + 1),

                System.Globalization.CultureInfo.CurrentCulture, FlowDirection.LeftToRight,

                m_Typeface, 14, Brushes.Black);

 

            ctx.DrawText(text, new Point(0, -96 / 4)); // 1/4 inch above page content

        }

 

        DrawingVisual background = new DrawingVisual();

 

        using (DrawingContext ctx = background.RenderOpen())

        {

            ctx.DrawRectangle(new SolidColorBrush(Color.FromRgb(240, 240, 240)), null, page.ContentBox);

        }

 

        newpage.Children.Add(background); // Scale down page and center

       

            ContainerVisual smallerPage = new ContainerVisual();

            smallerPage.Children.Add(page.Visual);

            smallerPage.Transform = new MatrixTransform(0.95, 0, 0, 0.95,

                0.025 * page.ContentBox.Width, 0.025 * page.ContentBox.Height);

 

        newpage.Children.Add(smallerPage);

        newpage.Children.Add(title);

 

        newpage.Transform = new TranslateTransform(m_Margin.Width, m_Margin.Height);

 

        return new DocumentPage(newpage, m_PageSize, Move(page.BleedBox), Move(page.ContentBox));

    }

 

    public override bool IsPageCountValid

    {

        get

        {

            return m_Paginator.IsPageCountValid;

        }

    }

 

    public override int PageCount

    {

        get

        {

            return m_Paginator.PageCount;

        }

    }

 

    public override Size PageSize

    {

        get

        {

            return m_Paginator.PageSize;

        }

       

        set

        {

            m_Paginator.PageSize = value;

        }

    }

 

    public override IDocumentPaginatorSource Source

    {

        get

        {

            return m_Paginator.Source;

        }

    }

}

The DocumentPaginatorWrapper constructor modifies the page size of the original paginator based on required page size and margin. The new GetPage method calls the original GetPage method to get a page, then adds a header Visual, a background Visual (just to show the original page), and then move the combined Visual by top left margin. To demonstrate that you can easily transform a Visual, the original page Visual is scaled down and centered within its original box.

Here is the code calling the wrapper class to load a flow XAML document and convert to an XPS document:

public static int SaveAsXps(string fileName)

{

    object doc;

 

    FileInfo fileInfo = new FileInfo(fileName);

 

    using (FileStream file = fileInfo.OpenRead())

    {

        System.Windows.Markup.ParserContext context = new System.Windows.Markup.ParserContext();

        context.BaseUri = new Uri(fileInfo.FullName, UriKind.Absolute);

        doc = System.Windows.Markup.XamlReader.Load(file, context);

    }

 

    if (! (doc is IDocumentPaginatorSource))

    {

        Console.WriteLine("DocumentPaginatorSource expected");

        return -1;

    }

 

    using (Package container = Package.Open(fileName + ".xps", FileMode.Create))

    {

        using (XpsDocument xpsDoc = new XpsDocument(container, CompressionOption.Maximum))

        {

            XpsSerializationManager rsm = new XpsSerializationManager(new XpsPackagingPolicy(xpsDoc), false);

           

            DocumentPaginator paginator = ((IDocumentPaginatorSource)doc).DocumentPaginator;

 

            // 8 inch x 6 inch, with half inch margin

            paginator = new DocumentPaginatorWrapper(paginator, new Size(768, 676), new Size(48, 48));

 

            rsm.SaveAsXaml(paginator);

        }

    }

 

    Console.WriteLine("{0} generated.", fileName + ".xps");

 

    return 0;

}

The code above loads a flow XAML document and paginates it to 8 inch by 6 inch, with half inch margin.

Here is a sample output:

The gray background is added to illustrate margin, header, and original page. The original page is scaled down by 5% and centered in the gray box.

The sample XAML file is extracted from WPF Feature Montage sample (http://wpf.netfx3.com/files/folders/code_snippets/entry3752.aspx).

Posted by fyuan | 27 Comments
Filed under: ,

Anatomy of STL Vector: Data Size

In the last post, we discussed the cost of using STL vector to module size. Now let’s take a look at how STL vector manages its data. Dia2Dump (Source code available in Microsoft Visual Studio 8\Dia SDK\Samples\Dia2Dump directory) shows the following internal structure of vector<short:>

 

 

UDT       : std::vector<short,std::allocator<short> >

BaseClass :   std::_Vector_val<short,std::allocator<short> >, offset = 0x0

BaseClass :     std::_Container_base, offset = 0x0

 

Data      :     this+0x0, Member, Type: class std::allocator<short>, _Alval

 

Data      :   this+0x4, Member, Type: short *, _Myfirst

Data      :   this+0x8, Member, Type: short *, _Mylast

Data      :   this+0xc, Member, Type: short *, _Myend

 

 

Class vector<short> derives from _Vector_val<short>, which in turn derives from _Container_base. _Vector_val has one member variable, _Alval, which is of the type allocator<short>. Also allocator<short> does not have any member variable; its still occupies 1-byte space. Due to alignment, the cost is rounded up to 4-bytes. Class vector<short> itself has three member variables, all pointers to short. So the total size of a vector<short> is 16-bytes for 32-bit machines, in release build, and 32-bytes for 64-bit machines. The first member variable _Alval is never been actually accessed in release build binary code, not initialized, not referenced. For 32-bit machine, 4 extra bytes are added for debug build.

 

Class vector<T> manages its dynamic storage using three pointers, _Myfirst points to the first allocated slot, _Mylast points to the next free slot, and _Myend points after the last allocated slot. All three are initialized to null pointers. _Myfirst and _Myend change when the vector’s dynamic buffer is allocated or reallocated, their difference is the capacity of the vector. _Mylast changed when new elements are added to the vector. When it reaches _Myend, the vector is full. The difference between _Mylast and _Myfirst is the number of elements in the vector.

 

When not enough space is available in vector to add new elements, allocator<short> is used to allocate new space. Older elements are moved to the new space, empty space is filled with blank, new elements are injected into place, and then finally the old storage space is freed. To avoid large number of reallocations, STL vector tries to grow by 50% each time.

 

The following piece of code tests how STL vector grows when push_back is called repetitively:

 

 

    std::vector<int> intvector;

 

    printf("sizeof(vector<int>) %d\n\n", sizeof(intvector));

 

    size_t lastcap = 0xFFFFFFFF;

 

    const void * * p = (const void * *) & intvector;

 

    for (int i = 1; i <= 1000; i ++)

    {

        if (lastcap != intvector.capacity())

        {

            lastcap = intvector.capacity();

 

            printf("%3d %4d %4d   %p %p %p %p \n",

                i, intvector.size(), lastcap, p[0], p[1], p[2], p[3]);

        }

 

        intvector.push_back(i);

    }

 

 

The code above measures the size of vector<int> itself and how it manages its dynamic storage. It prints out one line after each time a re-allocation occurs (vector<int>::capacity() changes). Here is the output:

 

 

sizeof(vector<int>) 16

 

Seq  size() capacity()    _Alval   _Myfirst _Mylast  _Myend

 

  1    0        0         C1C1C1C1 00000000 00000000 00000000

  2    1        1         C1C1C1C1 00382FE8 00382FEC 00382FEC

  3    2        2         C1C1C1C1 00382FF8 00383000 00383000

  4    3        3         C1C1C1C1 00383008 00383014 00383014

  5    4        4         C1C1C1C1 00383020 00383030 00383030

  6    5        6         C1C1C1C1 00383038 0038304C 00383050

  8    7        9         C1C1C1C1 00383058 00383074 0038307C

 11   10       13         C1C1C1C1 00383088 003830B0 003830BC

 15   14       19         C1C1C1C1 003830C8 00383100 00383114

 21   20       28         C1C1C1C1 00384F90 00384FE0 00385000

 30   29       42         C1C1C1C1 00385008 0038507C 003850B0

 44   43       63         C1C1C1C1 003850B8 00385164 003851B4

 65   64       94         C1C1C1C1 003851C0 003852C0 00385338

 96   95      141         C1C1C1C1 00385340 003854BC 00385574

143  142      211         C1C1C1C1 00385580 003857B8 003858CC

213  212      316         C1C1C1C1 003858D8 00385C28 00385DC8

318  317      474         C1C1C1C1 00385DD0 003862C4 00386538

476  475      711         C1C1C1C1 00386540 00386CAC 0038705C

713  712     1066         C1C1C1C1 00387068 00387B88 00388110

 

 

From the output, we can see STL vector grows by 50%: 9 + 9/2 = 13, 13 + 13/2 = 19, 19 + 19/2 = 28, …  So on average, 25% more space than what’s needed could be not utilitied; while in worst case is 50%. Each time when STL vector needs to grow, it allocates the new space first before freeing the old space; realloc is not called. So the older storage is not reused even when realloc is possible. This can generate lots of heap fragmentation. To avoid these problems, application should call vector::resize or vector::reserve to change how dynamic storage is managed. If push_back is used to grow a vector one element at a time, there will be log(N)/log(1.5) times of memory re-allocation.

 

Here is a summary of STL vector’s data size cost:

                                                                                                                                             

Using STL vector<T>, if push_back is called repetitively:

 

Data Size Cost (per object) = 4 * sizeof(void *)       +                     // vector<T> object itself

                              sizeof(T) * size()       +                     // real data storage

                              sizeof(T) * size() * 25% +                     // Average reserved storage

                              log(N) / log(1.5) freed memory blocks in heap  // Discarded memory blocks

 

 

PS: The actual command to use Dia2Dump to dump type information is (notice there needs to be a space between two ‘>’s):

 

 

Dia2Dump.exe -type "std::vector<short,std::allocator<short> >" template.exe

 

Posted by fyuan | 1 Comments
Filed under:

Anatomy of STL vector: Module Size

If you need a dynamic array in C++, a widely used class is the vector template class in STL. There are even books recommending replacement of plain C++ array with STL vector.

 

This series is going to look at how STL vector is implemented and what is the cost associated with using STL vector.

 

Let’s start with an empty console program:

 

#include <stdio.h>

#include <tchar.h>

 

int _tmain(int argc, _TCHAR* argv[])

{

    int sum  = 1;

   

    return sum;

}

 

When compiled in release mode (full optimization, favor small code, multi-threaded runtime), it generates an EXE with 49,152 bytes. Here is a break down of various sections in the EXE.

 

Code                .text section          24,920 bytes

Read-only data     .rdata section           6,892 bytes

Read-write data     .data section           6,524 bytes

Resource            .rsrc section             176 bytes

 

If we add a vector of integer, the code becomes:

 

#include <stdio.h>

#include <tchar.h>

 

#include <vector>

 

int _tmain(int argc, _TCHAR* argv[])

{

    int sum = 1;

   

    std::vector<int> intvector;

 

    intvector.push_back(3);

    sum += intvector[0];

 

    intvector.push_back(5);

    sum += intvector[1];

 

    intvector.push_back(7);

    sum += intvector[2];

   

    return sum;

}

 

When the introduction of STL, we have to enable C++ exception handling in our code (/EHsc compile option is recommended). To support vector, STL also brings other classes like STL string, allocator, etc. Now the EXE is 61,400 bytes with the following breakdown:

 

Code                .text section          33,375 bytes

Read-only data     .rdata section          10,450 bytes

Read-write data     .data section           6,880 bytes

Resource            .rsrc section             176 bytes

 

STL vector is defined as a template class, so adding new instances of it may duplicate code unless compiler/linker the new instance has the same characteristics of the old ones so that the compiler/linker can merge multiple instances together. The most important factor is the size of element you put into a vector. Let’s try adding a vector of difference size, 16-bit integer.

 

#include <stdio.h>

#include <tchar.h>

 

#include <vector>

 

int _tmain(int argc, _TCHAR* argv[])

{

    int sum = 1;

   

    std::vector<int> intvector;

 

    intvector.push_back(3);

    sum += intvector[0];

 

    intvector.push_back(5);

    sum += intvector[1];

 

    intvector.push_back(7);

    sum += intvector[2];

   

    std::vector<short> shortvector;

 

    shortvector.push_back(3);

    sum += shortvector[0];

 

    shortvector.push_back(5);

    sum += shortvector[1];

 

    shortvector.push_back(7);

    sum += shortvector[2];

 

    return sum;

}

 

When you compile this third version, the EXE generated is still 61,440 bytes. But two sections within it have actually grown. It’s just that sessions are normally allocated on 4 kilo-bytes blocks, so small growth may not be visible in final EXE size. Here is the new break-down.

 

Code                .text section          34,366 bytes

Read-only data     .rdata section          10,458 bytes

Read-write data     .data section           6,880 bytes

Resource            .rsrc section             176 bytes

 

So adding vector<short> adds 991 bytes to code section and 8 bytes to read-only data section. If you dig deep in PDB file generated, you will see the last version adds the following into the code section (total 769 bytes):

 

Tag        Size   Name

 

func         86   void vector<short>::push_back(short const &)

func        125   class _Vector_iterator<short> std::vector<short>::insert

                      (class std::_Vector_iterator<short>,short const &)

func         36   short * vector<short>::_Umove<short *>(short *,short *,short *)

func        415   void vector<short>::_Insert_n(_Vector_iterator<short>,unsigned int,short const &)

func         33   short * vector<short>::_Ufill(short *,unsigned int,short const &)

func         74   short * allocator<short>::allocate(unsigned int)

 

Constructor for the STL vector is inlined here, destructor is implemented as a function. But destructors for vector<int> and vector<short> are exactly the same in binary code, so they are merged into one. Push_back is implemented with a bunch of supporting functions, while accessing the vector itself is inlined. The rest of the increment in code section is due to added test case for vector<short>.

 

The 8-byte increment in read-only section is due to change in exception handling table for std::bad_alloc, caused by adding allocator<short>::allocate.

 

The complete list of functions for STL vector is quite long. Some of them may be implemented as out-of-line functions, some of them may be inlined. Some of them may be merged with another instantiation, some may not.

 

For the record, here is the complete list of functions for STL vector<short>:

 

 

 void vector<short>(const class vector<short> &)

 void vector<short>(unsigned int, const short &, const class allocator<short> &)

 void vector<short>(unsigned int, const short &)

 void vector<short>(unsigned int)

 void vector<short>(const class allocator<short> &)

 void vector<short>()

 void _Construct_n(unsigned int, const short &)

 void ~vector<short>()

 class vector<short> & operator=(const class vector<short> &)

 void reserve(unsigned int)

 unsigned int capacity()

 class _Vector_const_iterator<short> begin()

 class _Vector_iterator<short> begin()

 class _Vector_const_iterator<short> end()

 class _Vector_iterator<short> end()

 class reverse_iterator<_Vector_const_iterator<short> > rbegin()

 class reverse_iterator<_Vector_iterator<short> > rbegin()

 class reverse_iterator<_Vector_const_iterator<short> > rend()

 class reverse_iterator<_Vector_iterator<short> > rend()

 void resize(unsigned int, short)

 void resize(unsigned int)

 unsigned int size()

 unsigned int max_size()

 bool empty()

 class allocator<short> get_allocator()

 short & at(unsigned int)

 const short & at(unsigned int)

 short & operator[](unsigned int)

 const short & operator[](unsigned int)

 const short & front()

 short & front()

 const short & back()

 short & back()

 void push_back(const short &)

 void pop_back()

 void assign(unsigned int, const short &)

 void insert(class _Vector_iterator<short>, unsigned int, const short &)

 class _Vector_iterator<short> insert(class _Vector_iterator<short>, const short &)

 class _Vector_iterator<short> erase(class _Vector_iterator<short>, class _Vector_iterator<short>)

 class _Vector_iterator<short> erase(class _Vector_iterator<short>)

 void clear()

 void swap(class vector<short> &)

 void _Assign_n(unsigned int, const short &)

 bool _Buy(unsigned int)

 void _Destroy(short *, short *)

 void _Tidy()

 void _Insert_n(class _Vector_iterator<short>, unsigned int, const short &)

 short * _Ufill(short *, unsigned int, const short &)

 static void _Xlen()

 static void _Xran()

 static void _Xinvarg()

 void * __vecDelDtor(unsigned int)

 

 

Now we can have a rough ideal of the cost of using STL vector to module size, if you only use its push_back method and [] operator:

 

Using STL vector, if you just use push_back and [] only (N: number of unique instantiations):

 

Module Size Cost = Cost of dealing with C++ exception handling +

                    769 bytes * N + 7476 bytes in            code section +

                      8 bytes * N + 3350 bytes in  read-only data section +

                                     356 bytes in read-write data section

 

Adding 11 kilobytes to get STL dymanic array support is not a big deal. But if your code is happily running without using C++ exception handling, or you're using lots of unique instantiations of STL vector template (at around 800 bytes each, if you just use push_back and []), then you should consider if the cost is too high for you.

Posted by fyuan | 3 Comments
Filed under:
More Posts Next page »
 
Page view tracker