Friday, August 20, 2004


High performance sockets


In this article, I write about developing a scalable, high performance network server application for Windows. Windows Server 2003 is the target platform for this discussion. I recommend using overlapped I/O, IO completion ports and the WinSock extended functions as a high performance solution.

The other choices

Before we discuss our high performance solution, let us briefly discuss some other options.

○ Use select() to wait for data and process data.
This is probably the most commonly used model. The problem with this model is that select() takes arrays of handles for read, write and exception conditions. These arrays are scanned on every select() call, their corresponding kernel structures are modified, and the arrays are rewritten before select() returns. Because select() overwrites its arguments, you have to pass it a fresh copy of your handle arrays on every call, and you have to rebuild your own arrays whenever a socket is created or removed. You can optimize this behavior, but the overhead is still high, and if you are using more than 100 sockets you would see significant performance degradation.
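As a rough illustration of that per-call rebuild, here is a minimal sketch. The helper name build_read_set and the socket_t typedef are made up for this example; the non-Windows branch is there only so the logic can be tried on other systems.

```c
#ifdef _WIN32
#include <winsock2.h>
typedef SOCKET socket_t;
#else
#include <sys/select.h>
typedef int socket_t;
#endif
#include <stddef.h>

/* Hypothetical helper: rebuild the read set from the application's own
   socket list. select() overwrites the fd_set it is given, so this has
   to run before every call. Returns the highest handle seen, which
   non-Windows select() needs for its first argument (WinSock ignores it). */
socket_t build_read_set(const socket_t *socks, size_t count, fd_set *set)
{
    socket_t highest = 0;
    size_t i;

    FD_ZERO(set);
    for (i = 0; i < count; ++i) {
        FD_SET(socks[i], set);      /* one entry per live socket */
        if (socks[i] > highest)
            highest = socks[i];
    }
    return highest;
}
```

Every pass through the server loop pays for this walk over the whole socket list, plus the scan that select() itself performs, even when only one socket has data.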

○ Use WSAAsyncSelect() and a message loop
In this method, you associate every socket with a window message using the WSAAsyncSelect() call and use a message loop to retrieve messages. Now you face the typical message loop issues: it is a single, thread-specific queue. If you share this message loop with non-socket window messages, you will see latency issues.

○ Use WSAEventSelect() and WaitForMultipleObjects()
Here, you associate each socket with an event kernel object. The problem here is you are limited to 64 sockets per WaitForMultipleObjects() call, so you may need to use multiple threads. You also need to rebuild the array whenever a socket is added or deleted.

Now let us get back to the topic and introduce my solution. To start with, I am going to introduce overlapped I/O and IO completion ports. Let us discuss overlapped I/O first.

Overlapped operations

As with IO completion ports, overlapped I/O is not socket specific. Both are part of the IO subsystem in Windows. You should also remember that sockets are true file handles. Creating a socket is similar to opening a serial port device or, for that matter, any other device. When a socket is created, the user level socket libraries (winsock2 and mswsock.dll) use the AFD.sys kernel driver to ultimately open \Device\Tcp (or Udp). So there is a kernel level file handle and an associated file object for every socket. Why do I talk about this? Because you need a file handle to use overlapped I/O and IO completion ports.

Overlapped IO enables asynchronous execution of socket operations. Basically, you tell WinSock to initiate a socket operation and notify you when the operation completes. The socket operation can be send, receive, connect or accept. There are three ways you can be notified of the result: events, callbacks and completion ports. Let us do a walkthrough with sending data. Here is a partial function signature of WSASend().
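The signature itself is missing from this copy of the post. For reference, the prototype documented for WSASend() in winsock2.h is reproduced below; the comments are a short summary added here, not part of the header, and which parameters the original excerpt showed is not known.

```c
int WSAAPI WSASend(
    SOCKET                             s,                   /* socket to send on */
    LPWSABUF                           lpBuffers,           /* array of WSABUF entries */
    DWORD                              dwBufferCount,       /* number of entries in lpBuffers */
    LPDWORD                            lpNumberOfBytesSent, /* filled in if the send completes immediately */
    DWORD                              dwFlags,
    LPWSAOVERLAPPED                    lpOverlapped,        /* NULL for a non-overlapped send */
    LPWSAOVERLAPPED_COMPLETION_ROUTINE lpCompletionRoutine  /* NULL unless callback notification is wanted */
);
```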


So along with the usual parameters for sending data, we have to specify a WSAOVERLAPPED structure. It is an opaque structure except for the hEvent field. If you want event notification, assign the handle of a manual-reset event kernel object to this field. If you want callback notification, write your own callback function and pass its address as the last argument. This callback function is invoked as a user mode APC, so your thread has to wait in an alertable wait state. If you specify a callback routine, the hEvent field is ignored.

Events and callback routines are not highly scalable methods. When you use events, you are limited to 64 per WaitFor… call. So for more than 64 events, you need multiple threads and you run into the issues we discussed earlier. Callbacks are tied to a single thread, so they do not scale either.

IO Completion Ports

An IO completion port (IOCP) is not a port at all; it is a per process message queue. Windows adds IO completion messages to this queue and removes them from it. It is a special queue: only the Windows IO manager (and its related utilities) knows how to deal with it, and the message format and size are fixed and undocumented. You cannot access this queue directly, because the queue structures are not documented.

The IO manager adds a message to this queue whenever an IO operation (IRP) is complete. You can add your own completion message using the PostQueuedCompletionStatus() function. You can retrieve messages using the GetQueuedCompletionStatus() function.

To create an IOCP, call CreateIoCompletionPort(); this creates the IOCP queue. To start with, no file handle is associated with this queue (this is not completely true). To associate a file handle with this IOCP, call CreateIoCompletionPort() again, passing the file handle as a parameter. This is the confusing part about IOCPs: when you call CreateIoCompletionPort() again, it doesn't really create anything, it just records a reference to the IOCP in the kernel file object (associated with your file handle). Now, whenever an IO operation finishes (successfully or not) on that file handle, Windows knows which IOCP to use. There can be only one IOCP associated with a file handle, and there is no documented way to disassociate an IOCP from a file handle. When you close the file handle, it is no longer associated with that IOCP. Use CloseHandle() to delete the IOCP itself; it will be deleted after all referring file handles are closed.

IOCPs are meant to be used with thread pools. When you create an IOCP, you can specify how many threads are allowed to run concurrently on its behalf (the NumberOfConcurrentThreads parameter). Say you set this to 5; then at most 5 threads that were woken from the GetQueuedCompletionStatus() function can be running at a time. I say running because, if a worker thread goes ahead and enters a wait state (after being released from an IOCP wait), Windows detects it and releases one more thread. Don't use this fact as a design feature; it is insurance against accidental or unexpected blocking.

WinSock extended functions

Extended functions are Windows specific socket functions (not part of standard Unix/POSIX). They are generally high performance alternatives to their corresponding WinSock functions. Among these, I would like to talk about ConnectEx and AcceptEx. ConnectEx lets you associate an overlapped structure with a connect operation, so you don't have to wait for the connect to complete; you will be notified through events or IOCPs. AcceptEx does the same for the accept operation.

When you call the standard accept() function, WinSock creates a socket handle for the new connection before accept() returns, but with AcceptEx you have to create this new socket before you call AcceptEx. This enables you to preallocate socket handles when your program starts and keep reusing them. To reuse a socket handle, you should call DisconnectEx (with the TF_REUSE_SOCKET flag) instead of closing the socket with the closesocket() function. Since creating a socket is a relatively expensive operation, reusing sockets saves time.

Unlike standard WinSock functions, these extended functions are not directly linked against the WinSock library. You have to get the address of these functions before calling them. You can use code similar to the following to achieve this. Here we are using WSAIoctl with the SIO_GET_EXTENSION_FUNCTION_POINTER option to retrieve the address.

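GUID guidAcceptEx = WSAID_ACCEPTEX; /* from mswsock.h */
LPFN_ACCEPTEX AcceptExFunction = NULL;
DWORD bytesReturned = 0;

INT rc = WSAIoctl (listenSocket, SIO_GET_EXTENSION_FUNCTION_POINTER,
                   &guidAcceptEx, sizeof (guidAcceptEx),
                   &AcceptExFunction, sizeof (AcceptExFunction),
                   &bytesReturned, NULL, NULL); /* SOCKET_ERROR on failure */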

More on overlapped IO

Before looking at some sample code, I would like to say a few more things about overlapped IO. When you call an overlapped send or receive, the data buffer is specified as an array of WSABUF structures. Each WSABUF entry specifies the address and length of one buffer, and you can pass several of them in a single call. When you use WSABUF structures, Windows locks those buffers during the operation and transfers data directly to and from them. This eliminates the need for an intermediate copy.

When you initiate an overlapped operation, it can complete immediately, fail immediately or be left pending. If the operation fails immediately, you will not be notified through the overlapped notification mechanism, but you will be notified if the operation completes successfully (either immediately or later) or fails later. This is very convenient, because you can process the result of an overlapped operation in a single location (i.e., at the notification point) instead of doing it both at the invocation point and the notification point.
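The three outcomes, together with the WSABUF array from the previous paragraph, can be sketched like this. The names SEND_OP, start_send and classify_overlapped_result are invented for the example. The non-Windows branch only supplies stand-in constants (with the values used by the Windows headers) so the classification logic can be compiled elsewhere; the WSASend() call itself is Windows only.

```c
#include <string.h>
#ifdef _WIN32
#include <winsock2.h>
#else
/* Stand-ins for non-Windows builds; same values as the Windows headers. */
#define SOCKET_ERROR   (-1)
#define WSA_IO_PENDING 997
#endif

typedef enum { OP_COMPLETED_NOW, OP_PENDING, OP_FAILED_NOW } OP_STATE;

/* Interpret the return value of an overlapped call such as WSASend(). */
OP_STATE classify_overlapped_result(int rc, int last_error)
{
    if (rc != SOCKET_ERROR)
        return OP_COMPLETED_NOW;  /* notification is still delivered */
    if (last_error == WSA_IO_PENDING)
        return OP_PENDING;        /* notification arrives later */
    return OP_FAILED_NOW;         /* no notification: handle it here */
}

#ifdef _WIN32
/* Hypothetical per-operation record. The WSABUF array lives next to the
   WSAOVERLAPPED so that both stay valid until the operation completes. */
typedef struct {
    WSAOVERLAPPED overlapped;
    WSABUF        bufs[2];
} SEND_OP;

/* Send a header and a body with a single gather call, no staging copy. */
OP_STATE start_send(SOCKET s, SEND_OP *op,
                    char *hdr, ULONG hdr_len, char *body, ULONG body_len)
{
    DWORD sent = 0;
    int rc;

    memset(op, 0, sizeof(*op));
    op->bufs[0].buf = hdr;  op->bufs[0].len = hdr_len;
    op->bufs[1].buf = body; op->bufs[1].len = body_len;

    rc = WSASend(s, op->bufs, 2, &sent, 0, &op->overlapped, NULL);
    return classify_overlapped_result(
        rc, rc == SOCKET_ERROR ? WSAGetLastError() : 0);
}
#endif
```

Only OP_FAILED_NOW needs handling at the call site; the other two results are both processed when the completion notification arrives.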

One more point: remember that the Windows I/O subsystem is inherently asynchronous, and Windows does extra work for you to implement synchronous behavior. So there is no extra penalty or cost associated with overlapped I/O.

Sample code

This is a simple echo server to demonstrate the concepts discussed earlier. Here is a brief code walkthrough:

○ SERVER_INFO : Per server structure
○ CONNECTION_INFO: Per TCP connection structure
○ GENERIC_OVERLAP_INFO: Describes an overlapped operation
○ DATA_OVERLAP_INFO: Describes an overlapped operation for send/receive

Function StartServer(): -- Called to start the server.
○ Allocate memory for SERVER_INFO structure
○ Open a socket for listening, using WSASocket
○ Get function pointers for AcceptEx and DisconnectEx, using WSAIoctl
○ Create IO Completion port
○ Bind the listening socket to server TCP port
○ Call listen
○ Call our CreateConnection function to create connections
○ Repeat calling CreateConnection for OVERLAP_CONNECTIONS number of connections.
○ Create a thread with ControlPortThreadEntry as the entry point.
○ Repeat above for MAX_THREADS number of threads.

Function CreateConnection():
○ Allocate memory for CONNECTION_INFO structure
○ Open a socket to be used in AcceptEx
○ Associate this socket handle with the server's IO completion port
○ Call AcceptEx function, with the overlapped structure defined in CONNECTION_INFO structure

Function ControlPortThreadEntry(): -- Entry point for IOCP thread
○ In a forever loop, wait for a new IOCP message by calling GetQueuedCompletionStatus
○ Use the key from GetQueuedCompletionStatus as a pointer to CONNECTION_INFO
○ For each new message received, get the type of overlapped operation. To do this, get the containing record for the overlapped structure.
○ For an ACCEPT message, call HandleAcceptExComplete function
○ For RECEIVE message, call HandleReceiveComplete function
○ For SEND message, call HandleSendComplete function
○ For TIMER message, call HandleTimeoutEvent function
○ For DISCONNECT message, call HandleDisconnectExComplete function
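The containing record step can be sketched as below. The layout of GENERIC_OVERLAP_INFO is an assumption, since the original structure is not shown in this post, and handler_for and the OVL_* names are invented for the example. On Windows the real OVERLAPPED and CONTAINING_RECORD from the SDK headers are used; the stand-ins exist only so the pointer arithmetic can be tried elsewhere.

```c
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
#else
/* Stand-ins for non-Windows builds. */
typedef struct { void *hEvent; } OVERLAPPED;
#define CONTAINING_RECORD(addr, type, field) \
    ((type *)((char *)(addr) - offsetof(type, field)))
#endif

typedef enum {
    OVL_ACCEPT, OVL_RECEIVE, OVL_SEND, OVL_TIMER, OVL_DISCONNECT
} OVL_TYPE;

/* Assumed layout: the operation type stored next to the OVERLAPPED
   that is handed to WinSock. */
typedef struct {
    OVL_TYPE   type;
    OVERLAPPED overlapped;
} GENERIC_OVERLAP_INFO;

/* Map the OVERLAPPED pointer returned by GetQueuedCompletionStatus()
   back to its enclosing record and name the handler that should run. */
const char *handler_for(OVERLAPPED *completed)
{
    GENERIC_OVERLAP_INFO *info =
        CONTAINING_RECORD(completed, GENERIC_OVERLAP_INFO, overlapped);

    switch (info->type) {
    case OVL_ACCEPT:     return "HandleAcceptExComplete";
    case OVL_RECEIVE:    return "HandleReceiveComplete";
    case OVL_SEND:       return "HandleSendComplete";
    case OVL_TIMER:      return "HandleTimeoutEvent";
    case OVL_DISCONNECT: return "HandleDisconnectExComplete";
    }
    return "unknown";
}
```

In the real thread entry point the switch would call the handler directly; returning its name here just keeps the sketch free of the handlers themselves.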

Function HandleAcceptExComplete(): -- Handle the completion of AcceptEx operation, a new connection has arrived
○ Start an overlapped receive operation using WSARecv
○ Start a timeout timer, just to protect against a long idle connection.

Function HandleReceiveComplete(): -- Data has been received, process it.
○ Send the received data back by calling WSASend().
○ If the socket is closed, call EndConnection function.
○ Stop the timeout timer. (May be restarted)

Function HandleSendComplete():
○ Just start another receive operation

Function HandleDisconnectExComplete():
○ DisconnectEx function call has completed, go ahead and start a new AcceptEx overlapped operation.

Function EndConnection(): -- Called when a connection is done
○ Start an overlapped DisconnectEx operation.
○ Stop the receive timeout timer if already running.

Function TimerCallback(): -- Called when receive timeout timer fires
○ Submit a message to the IOCP using PostQueuedCompletionStatus, so that the timeout is processed by the IOCP thread pool like every other event.

Please email for source code.
