Everyone Loves Babies! Webcams and Motion Detection
In the belated seventh installment of the "Some Assembly Required" column, Scott Hanselman extends, with permission, Andrew Kirillov's Motion Detector to .NET 2.0 and ClickOnce. He also works around some of the quirks in the AirLink AIC250 Network Camera by tracing network traffic and using a little intuition.
Scott Hanselman
Difficulty: Intermediate
Time Required: 1-3 hours
Cost: $50-$100
April 7, 2006
The AIC250 Network Camera
Some Assembly Required has been out of circulation for a while, but I'm back. I hope you all didn't forget about me. My wife and I had a baby boy, Zenzo, just 12 weeks ago and he's been a handful. He's mostly sleeping through the night now and I'm back to work, so I'll get back into the rhythm of writing this column on a regular basis. Please do continue to send in suggestions on what kinds of gadgets I can hook up to .NET.

I wanted to be able to see Zenzo while I'm at work, so I evaluated a number of solutions. I could use Video Chat built into MSN Messenger or other chat tools, but it's kind of inconvenient to bring the baby into the computer room and I often have trouble getting the video to work through the firewall. Using video chat would also mean that both sides would have to negotiate the conversation. If I wanted to make things simpler for my wife (and increase the WAF—Wife Acceptance Factor—of the solution), I'd want to just call her at home and start the camera remotely with no effort on her part.
I wanted a camera that would have its own network connection and wouldn't require an attached computer. This means that the camera would have some kind of integrated internal Web Server.
Additionally, my parents live two hours out of town and might want to occasionally peek in on the baby, and I want to make the process as easy as possible on them. Finally, I might want to hook up some kind of motion monitor if the camera detects that the baby is thrashing in his sleep.
I looked around at a number of Network Cameras and, since price was my main priority, I settled on the AIC250 Network Camera. I picked up the Wired Ethernet version for US$99, but I understand that a wireless version of the same camera can be found around the net for $99 as well.

Setting up the camera is trivial: plug it in and it gets an IP address via DHCP. It has an integrated Web Server that lets you manage the camera's settings via your browser. The way the web-based interface displays the video stream is interesting, though. It includes two options, a Java Applet or an ActiveX control. These widgets are built into the firmware of the camera and, while the camera's firmware can be updated via software direct from the manufacturer, the video can officially only be viewed in the web interface in one of these two ways.
Being the hacker-type that I am, I wanted to get access to the underlying video stream. It's great that the camera includes all the software you need built-in, but if I could get access to the stream I could do all sorts of fun stuff.
MotionJPEG
I used Simon Fell's TCPTrace to peek at the network traffic as I logged into the web interface and looked at the video using the built-in Java applet. I noticed that the traffic was a series of JPEGs, one after the other. It was an HTTP GET that never ends. The connection stayed open and spit out JPEGs as fast as I could take them, around 10-15 frames a second. I had never seen traffic like this. I'd assumed that the camera would produce AVIs or MPEG files. Then I realized that spitting out JPEGs would be the cheapest possible way to produce video because there'd be no licensing of Video Codecs (Compressor/Decompressor). A little Googling showed me that this technique was called MotionJPEG or MJPEG.
I think of MotionJPEG as a clever hack, with the emphasis on clever. It's not the MPEG (Moving Picture Experts Group) video that you may be familiar with. Instead, MotionJPEG is a stream of JPEG images, one after the other. There doesn't appear to be a formal specification, just varying implementations.
There is really no such standard as "motion JPEG" or "MJPEG" for video. Various vendors have applied JPEG to individual frames of a video sequence, and have called the result "M-JPEG". JPEG is designed for compressing either full-color or gray-scale images of natural, real-world scenes. [Review of Video Streaming]
MotionJPEG over HTTP uses the Content-Type header "multipart/x-mixed-replace" along with a configurable boundary. This means that the stream is made up of Multiple Parts (hence multipart) and each new frame should replace the previous frame (hence x-mixed-replace). This particular camera sends an HTTP Header like this:
Content-Type: multipart/x-mixed-replace;boundary=--video boundary--
The boundary can be anything the server chooses, in this case "--video boundary--" which is a particularly intuitive choice. After the boundary there are additional headers including the actual Content-Type of the part, in this case, image/jpeg.
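As a minimal sketch of the first parsing step, the boundary can be pulled out of the Content-Type value with plain string handling. The class and method names here (MultipartHeader, GetBoundary) are hypothetical and do not come from the article's download; the only input taken from the text is the header value shown above.

```csharp
using System;

// Hypothetical helper, not part of Andrew's library.
public static class MultipartHeader
{
    // Returns the boundary parameter of a multipart Content-Type value,
    // or an empty string when no boundary parameter is present.
    public static string GetBoundary(string contentType)
    {
        if (string.IsNullOrEmpty(contentType)) return "";

        foreach (string part in contentType.Split(';'))
        {
            string trimmed = part.Trim();
            if (trimmed.StartsWith("boundary=", StringComparison.OrdinalIgnoreCase))
            {
                // Keep everything after "boundary=", minus optional quotes.
                return trimmed.Substring("boundary=".Length).Trim('"');
            }
        }
        return "";
    }
}
```

Passing in the header value this camera sends should yield "--video boundary--", which is the byte sequence to watch for in the stream.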
I started writing a parser/renderer from scratch using .NET 2.0 and the built-in WebClient libraries. It would be fairly easy to read the data into a buffer, watch for the boundary to go by, and then use the System.Drawing libraries to read the JPEGs in and render them as they came in. However, I figured that there might be a library out there that would already do the work for me. That's when I found Andrew Kirillov's CodeProject article on Motion Detection.
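The "watch for the boundary to go by" idea might look roughly like the sketch below. It is not the MJPEGSource implementation; MjpegSplitter and its methods are hypothetical names. It assumes the delimiter on the wire is exactly the boundary string, that each part's headers end with a blank line (CRLF CRLF), and that the whole buffer is already in memory. A real reader would work incrementally on the network stream.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper that cuts a buffered multipart stream into frames.
public static class MjpegSplitter
{
    // Finds the first occurrence of pattern in data at or after start; -1 if none.
    public static int IndexOf(byte[] data, byte[] pattern, int start)
    {
        for (int i = start; i <= data.Length - pattern.Length; i++)
        {
            int j = 0;
            while (j < pattern.Length && data[i + j] == pattern[j]) j++;
            if (j == pattern.Length) return i;
        }
        return -1;
    }

    // Returns the payload of every complete part, meaning every part
    // that sits between two boundary markers.
    public static List<byte[]> ExtractFrames(byte[] data, string boundary)
    {
        byte[] marker = Encoding.ASCII.GetBytes(boundary);
        byte[] headerEnd = Encoding.ASCII.GetBytes("\r\n\r\n");
        List<byte[]> frames = new List<byte[]>();

        int partStart = IndexOf(data, marker, 0);
        while (partStart >= 0)
        {
            int nextPart = IndexOf(data, marker, partStart + marker.Length);
            if (nextPart < 0) break; // the last part is still incomplete

            // Skip the part headers (Content-Type: image/jpeg and friends).
            int bodyStart = IndexOf(data, headerEnd, partStart + marker.Length);
            if (bodyStart >= 0 && bodyStart + headerEnd.Length <= nextPart)
            {
                bodyStart += headerEnd.Length;
                int length = nextPart - bodyStart;

                // Drop the CRLF that precedes the next boundary, if present.
                if (length >= 2 && data[nextPart - 2] == '\r' && data[nextPart - 1] == '\n')
                    length -= 2;

                byte[] frame = new byte[length];
                Array.Copy(data, bodyStart, frame, 0, length);
                frames.Add(frame);
            }
            partStart = nextPart;
        }
        return frames;
    }
}
```

Each returned array would then be handed to a JPEG decoder, for example by wrapping it in a MemoryStream and loading it with System.Drawing.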
The MJPEG handling is actually secondary in Andrew's code. His .NET 1.1 project is a showcase for his motion detection algorithms, which we'll talk about in a moment, but I was interested in his pluggable VideoSource implementation.
Andrew's Implementation
Andrew created an interface called IVideoSource. He includes a number of concrete implementations, one being MJPEGSource, the one I needed.
Visual C#
public interface IVideoSource
{
    // Raised once for every frame the source produces.
    event CameraEventHandler NewFrame;

    // Where the video comes from, plus optional credentials.
    string VideoSource { get; set; }
    string Login { get; set; }
    string Password { get; set; }

    int BytesReceived { get; }
    object UserData { get; set; }
    bool Running { get; }

    void Start();
    void SignalToStop();
    void WaitForStop();
    void Stop();
}
The most inspired aspect of Andrew's implementation, the one I wouldn't have thought of myself, is the CameraEventHandler.
Visual C#
// Signature for handlers of IVideoSource.NewFrame.
public delegate void CameraEventHandler(object sender, CameraEventArgs e);

// Carries one frame to the event's subscribers.
public class CameraEventArgs : EventArgs
{
    private System.Drawing.Bitmap bmp;

    public CameraEventArgs(System.Drawing.Bitmap bmp)
    {
        this.bmp = bmp;
    }

    public System.Drawing.Bitmap Bitmap
    {
        get { return bmp; }
    }
}
The VideoSource throws each frame it finds as bitmaps packaged within an event. This might seem inefficient at first glance, but remember that this event would likely not fire more than 30 times a second, more likely around 5-10 times a second. Additionally, he's throwing a reference to a bitmap, so it's not like copies of bitmaps are flying around the system. Using eventing in this manner not only makes for a clean interface and separation between an IVideoSource implementation and the form that chooses to render it, but it also allows Andrew to insert his post-processing filters and motion detection algorithm.
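To illustrate the separation that eventing buys, here is a small self-contained sketch in which two independent consumers subscribe to the same source. Everything in it is hypothetical: FrameEventArgs stands in for CameraEventArgs and carries a byte array instead of a System.Drawing.Bitmap so the example has no GDI+ dependency, and FakeVideoSource stands in for a real IVideoSource implementation.

```csharp
using System;

// Hypothetical stand-in for CameraEventArgs (raw bytes instead of a Bitmap).
public class FrameEventArgs : EventArgs
{
    private readonly byte[] frame;
    public FrameEventArgs(byte[] frame) { this.frame = frame; }
    public byte[] Frame { get { return frame; } }
}

public delegate void FrameEventHandler(object sender, FrameEventArgs e);

// Hypothetical source that raises NewFrame once per frame it is handed.
public class FakeVideoSource
{
    public event FrameEventHandler NewFrame;

    public void Push(byte[] frame)
    {
        // Copy the delegate first so an unsubscribe on another thread
        // cannot null it out between the check and the call.
        FrameEventHandler handler = NewFrame;
        if (handler != null) handler(this, new FrameEventArgs(frame));
    }
}

public static class EventDemo
{
    // Subscribes two consumers and returns the total number of
    // notifications they received for the given frames.
    public static int CountNotifications(byte[][] frames)
    {
        int seen = 0;
        FakeVideoSource source = new FakeVideoSource();

        // First consumer, for example a form that renders the frame.
        source.NewFrame += delegate(object s, FrameEventArgs e) { seen++; };
        // Second consumer, for example a motion detector.
        source.NewFrame += delegate(object s, FrameEventArgs e) { seen++; };

        foreach (byte[] frame in frames) source.Push(frame);
        return seen;
    }
}
```

Neither consumer knows about the other or about how the source obtains its frames, and both receive a reference to the same frame object rather than a copy.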
Working Around Quirks of the AIC250
I had hoped that I'd be able to plug the Motion application directly into the AIC250 camera's MJPEG source. That turned out to be wishful thinking. There weren't any problems with the MJPEG implementation in Andrew's code, but there were a few quirks in the AIC250's embedded Web Server's handling of authentication.
(Note the misspelled "Auther" (sic) HTTP header that tells us that Steven Wu wrote this Web Server that reports itself as Camera Web Server/1.0.)
The AIC250 Camera supports HTTP Basic Authentication. When you visit the administration home page for the first time, you're prompted for a name and password using HTTP Basic Auth, then the ActiveX control or Java Applet is on the next page. My TCP sniffing session earlier told me that the stream of JPEGs was the result of an HTTP GET to an endpoint called /mjpeg.cgi.

I assumed that I could make a call to /mjpeg.cgi with .NET, passing in the name and password via HTTP Basic Auth and things would work great. It didn't work, though. However, if I "visited" the home page programmatically first, THEN requested /mjpeg.cgi, the MJPEG stream began. Apparently the Camera sets an internal flag after a user hits the home page that allows the next request to hit /mjpeg.cgi. I suspect this is because the /mjpeg.cgi endpoint is coded to simply return JPEGs, nothing else. They (Steven Wu?) likely decided that getting /mjpeg.cgi to support authentication and meet the HTTP spec exactly would have been a hassle and wouldn't have provided value. They never anticipated that some fool (me) would try to get to the video stream directly.
The next thing I ran into was that HTTP Basic Authentication is supposed to wait for an Authentication Challenge first before sending credentials. This web server seems to want the credentials sent right away. Fortunately the WebRequest class supports a PreAuthenticate property that sends the authentication credentials first, before the challenge.
I changed the MJPEGSource to support both "PreAuthentication" and "Authentication with the Home Page" options and I was off and running.
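The two workarounds could be combined along the following lines. This is a sketch under stated assumptions, not the code from the modified MJPEGSource: CameraAuth and its methods are hypothetical names, the base URL, user name, and password are placeholders supplied by the caller, and the explicit Authorization header is an addition of this sketch. The explicit header is there because the .NET documentation describes PreAuthenticate as applying to requests made after an initial authentication has taken place, so setting the header by hand is one way to be sure credentials go out on the very first request.

```csharp
using System;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper, not part of Andrew's library.
public static class CameraAuth
{
    // Builds the value of an HTTP Basic "Authorization" header.
    public static string BasicAuthHeader(string user, string password)
    {
        byte[] raw = Encoding.ASCII.GetBytes(user + ":" + password);
        return "Basic " + Convert.ToBase64String(raw);
    }

    // Creates a request that carries credentials without waiting for a challenge.
    private static HttpWebRequest CreateRequest(string url, string user, string password)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Credentials = new NetworkCredential(user, password);
        request.PreAuthenticate = true; // the option the article relies on
        // Assumption of this sketch: also send the header explicitly.
        request.Headers["Authorization"] = BasicAuthHeader(user, password);
        return request;
    }

    // Visits the home page first, then opens the never-ending MJPEG response.
    public static Stream OpenMjpegStream(string baseUrl, string user, string password)
    {
        using (WebResponse home = CreateRequest(baseUrl + "/", user, password).GetResponse())
        {
            // Nothing to read here; the visit itself is what matters.
        }

        WebResponse video = CreateRequest(baseUrl + "/mjpeg.cgi", user, password).GetResponse();
        return video.GetResponseStream();
    }
}
```

The caller owns the returned stream and should close it when it stops reading frames, since the camera keeps the connection open indefinitely.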
More Quirks - UnsafeHeaderParsing
That wasn't the end of the quirks I'd have to fight. After upgrading the application from .NET 1.1 to .NET 2.0, I started getting exceptions while calling HttpWebRequest.GetResponse(). However, the exceptions weren't very informative. The only thing that had changed was the upgrade from 1.1 to 2.0. A little Googling, MSDNing, and digging led me to Mike Flasko's Blog:
"By default, the .NET Framework strictly enforces RFC 2616 for URI parsing. Some server responses may include control characters in prohibited fields, which will cause the System.Net.HttpWebRequest.GetResponse method to throw a WebException. If useUnsafeHeaderParsing is set to true, System.Net.HttpWebRequest.GetResponse will not throw in this case; however, your application will be vulnerable to several forms of URI parsing attacks. The best solution is to change the server so that the response does not include control characters."
Of course, Steven Wu's built-in Web Server in this Camera isn't going to change, so I had little choice but to include an app.config like this:
<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <system.net>
    <settings>
      <httpWebRequest useUnsafeHeaderParsing="true" />
    </settings>
  </system.net>
</configuration>
Motion Detection
Andrew includes four different motion detection algorithms that he describes in detail in his CodeProject Article. His algorithms are focused on image processing, rather than video processing, which is why he throws each frame as a bitmap, saving the last frame and comparing it to the current frame. Not only can he plug in different video sources, but because he "normalizes" all video sources to a series of bitmaps, he can also apply any motion detection algorithm to any video source.
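The idea of saving the last frame and comparing it to the current one can be sketched with plain frame differencing. This is a deliberately simplified stand-in and is not one of Andrew's four algorithms, which are described in his CodeProject article. FrameDiff is a hypothetical name, and the frames are assumed to be equally sized arrays holding one grayscale brightness byte per pixel.

```csharp
using System;

// Hypothetical, simplified frame-differencing detector.
public static class FrameDiff
{
    // Returns the fraction (0.0 to 1.0) of pixels whose brightness changed
    // by more than threshold between two equally sized grayscale frames.
    public static double ChangedFraction(byte[] previous, byte[] current, int threshold)
    {
        if (previous.Length != current.Length)
            throw new ArgumentException("Frames must have the same size.");
        if (current.Length == 0) return 0.0;

        int changed = 0;
        for (int i = 0; i < current.Length; i++)
        {
            if (Math.Abs(current[i] - previous[i]) > threshold) changed++;
        }
        return (double)changed / current.Length;
    }

    // Reports motion when at least minFraction of the pixels changed.
    public static bool HasMotion(byte[] previous, byte[] current, int threshold, double minFraction)
    {
        return ChangedFraction(previous, current, threshold) >= minFraction;
    }
}
```

The threshold absorbs sensor noise on individual pixels, while minFraction keeps a handful of flickering pixels from counting as movement. Both values would need tuning for a real camera and room.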


In the image above, the teddy bear on the right has just been moved with a string. The red boxes around the bear were added by the motion detection algorithm indicating what portion of the image changed.
ClickOnce
At this point I had a WinForms 2.0 application that let me view the camera in my son's room. I forwarded a port from my cable modem's external IP address to the internal IP address of the camera so I could connect to it from outside or at work. The next step was to make the application a ClickOnce application so my parents or I could run it from anywhere.
Visual Studio makes creating a ClickOnce application very straightforward. Right-click on the project within the Visual Studio Solution Explorer and select Properties. First visit the Signing tab and create a Test Certificate (unless you have your own Signing Certificate). The certificate will be used to sign the ClickOnce manifest that will be published to the web site. Next visit the Security tab and select "This is a Full Trust application." If you create your own ClickOnce application, you can specify the exact permissions that your application needs to run. Finally, from the Publish tab you can indicate the publishing location and ultimate URL. I chose to publish to a "publish" folder, then upload the contents to my website manually because I'm a control freak.

Once the project is configured, you can also build and publish a ClickOnce application by running msbuild /target:publish from a Visual Studio command line.
As an interesting side note, I originally blogged about problems with ClickOnce and Firefox, but a member of the ClickOnce team blogged an explanation. Fortunately this will be fixed in the next version of the Framework.
Now anyone in my family can visit our private website, launch this .NET 2.0 WinForms application via ClickOnce, and connect with a name and password to the AIC250 Network Camera. Andrew's motion detection is an added bonus.
Conclusion
There are a number of fun things that could be extended, added, and improved on with this project. Here are some ideas to get you started:
- Add your own video source that reads Animated GIFs, DivX or other formats.
- Build a video source implementation that connects to WIA (Windows Image Acquisition) devices.
- Hook up to other Network Cameras.
- Create your own Motion Detection algorithm that compares two frames and see how it stacks up against Andrew's.
- Make an event that fires on motion and write plugins that react to that motion. Hook up lights, send emails, or whatever you like!
- Extend the application to support multiple cameras (maybe up to 4) via a "split screen" feature.
Have fun and have no fear when faced with the words: Some Assembly Required!
If you do extend this application, be sure to release the source. Thanks again to Andrew Kirillov for allowing me to extend his wonderful source. My family thanks you.
Scott Hanselman is the Chief Architect at the Corillian Corporation, an eFinance enabler. He has thirteen years of experience developing software in C, C++, VB, COM, and most recently in VB.NET and C#. Scott is proud to be both a Microsoft RD and Architecture MVP. He is co-author of Professional ASP.NET 2.0 with Bill Evjen, available on BookPool.com and Amazon. His thoughts on the Zen of .NET, Programming and Web Services can be found on his blog at http://www.computerzen.com/.
Download: http://download.microsoft.com/download/4/1/e/41e8f2c1-1bf7-419f-b31b-06122d090a49/BabycamAndMotionDetection_CS.msi