Building Vyclone's Interactive Experience with HTML5
The Web is more engaging and productive for consumers as developers unlock the full potential of HTML5. In this guest blog post, Anton Molleda of Plain Concepts talks about his experience and lessons learned while developing Vyclone, a social video editing experience built on HTML5 and many of the new features in next-generation browsers like Internet Explorer 10. Vyclone builds on capabilities like pointer events, multi-touch gestures, and hardware-accelerated canvas and CSS3 to make this Web site feel more like an app.
— Rob Mauceri, Group Program Manager, Internet Explorer
Vyclone is a social video platform that lets you co-create, sync and edit multiple views of a shared moment, effortlessly.
When Vyclone first started, it focused solely on mobile devices. But the team soon realized that while the recording experience is great from a phone, editing that video was limited by the screen size and power of the device. Thanks to the progress made in modern browsers over the last few years, HTML5 was a viable way to build this new tool.
The core of Vyclone’s Web editor is composed of three parts:
The video preview: Where a low quality version of the cut the user is creating can be watched (on the left)
The vidgrid: Where all the sources available at a given point in time are presented to the user (on the right)
And the timeline: Which gives a linear view of which source is playing over the course of the video. A source playing for a certain stretch of time is called a cut (shown above the player controls)
As the user plays the video and starts adding new cuts to the timeline, the video preview switches to reflect the new source, and the vidgrid highlights the source file with triangle corners to identify to the user which video is selected.
In building this out, we ran into a very interesting challenge with the sheer amount of video manipulation, the performance we were getting back, and the user experience. Let's dig into what we did to make this happen on the Web. We're using video, canvas and requestAnimationFrame (RAF). We have a video playing in the background, and on each RAF we draw the active source into a canvas (in the video preview) or calculate its new size and position in the vidgrid.
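To make that loop concrete, here is a minimal sketch. The names (drawActiveSource, startPreviewLoop, and the raf parameter, which stands in for requestAnimationFrame so the sketch works outside a browser) are our own illustration, not Vyclone's actual code; the one real API fact it relies on is that canvas drawImage accepts a video element and paints its current frame.

```javascript
// Sketch of a preview loop: on each animation frame, copy the current
// frame of the active <video> into the preview canvas.
function drawActiveSource(ctx, video, width, height) {
    // drawImage accepts an HTMLVideoElement and draws its current frame.
    ctx.drawImage(video, 0, 0, width, height);
}

function startPreviewLoop(ctx, video, width, height, raf) {
    function tick() {
        drawActiveSource(ctx, video, width, height);
        raf(tick); // queue the next frame
    }
    raf(tick);
}

// In a browser you would call it with the real requestAnimationFrame:
// startPreviewLoop(canvas.getContext('2d'), videoEl,
//                  canvas.width, canvas.height,
//                  window.requestAnimationFrame.bind(window));
```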
So far so good, but what happens when we let the user interact with it? For example, what happens when a user moves the timeline forward and back, or adds / removes video sources (cuts)? When we first started prototyping this out, we thought the standard approach would be to take care of that as soon as the event is fired, because that's the way we've been taught, right?
But what happens when those events can be fired tens of times per second, or even hundreds of times per second? And what if those handlers need to update the UI? Do we really need to force a layout refresh 130 times per second when the delta change could sometimes be less than a pixel? Talk about performance drag!
If your machine has an i7 with 8 GB of RAM, you can probably afford the computing power to do that. But what about people with an older rig? Or an ARM device? Those users will not have the same experience and will see the reaction time of the Web site slow way down.
Our first approach was to queue the action in a RAF, but that had a problem of its own: you can RAF up the same function several times for the same "tick", effectively making things worse. To solve this, we added a variable that tells us whether the action is already queued up. Something like this:
    var queued = false;

    function myAction() {
        //your awesome code here
        queued = false;
    }

    function onEvent(evt) {
        if (!queued) {
            queued = true;
            requestAnimationFrame(myAction);
        }
    }
This code is not bad but still has some problems. If you are doing something related to the event position (mouse or pointer) and a delta, you might find that you'll struggle with this approach. The solution we used in the timeline is to accumulate the event value and process it in myAction:
    var deltaX = 0,
        queued = false;

    function myAction() {
        //your awesome code here uses deltaX
        deltaX = 0; // we reset the deltaX so it can be incremented
                    // next time onEvent is executed
        queued = false;
    }

    function onEvent(evt) {
        if (!queued) {
            queued = true;
            deltaX = evt.translationX; // in the case of a pointer; if you are
                                       // using a mouse you will have to do some
                                       // magic with pageX or similar :)
            requestAnimationFrame(myAction);
        } else {
            deltaX += evt.translationX;
        }
    }
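To see this coalescing in action outside a browser, here is a self-contained sketch. The rafQueue/runTick pair stands in for requestAnimationFrame and the browser's frame tick, and the event shape is our assumption for illustration:

```javascript
// Demonstration of event coalescing: many events per tick collapse into
// one queued action that sees the accumulated delta.
let rafQueue = [];
function raf(cb) { rafQueue.push(cb); }
function runTick() { const q = rafQueue; rafQueue = []; q.forEach(cb => cb()); }

let deltaX = 0;
let queued = false;
let processed = [];

function myAction() {
    processed.push(deltaX); // the action sees the total movement
    deltaX = 0;
    queued = false;
}

function onEvent(evt) {
    if (!queued) {
        queued = true;
        deltaX = evt.translationX;
        raf(myAction);
    } else {
        deltaX += evt.translationX;
    }
}

// Three pointer events arrive between two animation frames...
onEvent({ translationX: 2 });
onEvent({ translationX: 3 });
onEvent({ translationX: 1 });
runTick(); // ...but myAction runs once, with deltaX === 6
```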
With this approach you should be pretty much ready to go. We kept adding functionality and then noticed some new problems popped up.
By handling those events at each requestAnimationFrame, we were able to achieve a higher level of responsiveness without sacrificing computing power. But requestAnimationFrame executes functions in the order they are queued up, so sometimes we were drawing before cleaning, or moving the timeline when we didn't have to, and we had to write a lot of cumbersome code to make sure everything executed in the order we wanted.
We saw that code wasn't very friendly, and we were losing some cycles waiting for other actions to be performed, so we decided to change how we handled the input once again. It was at this moment that we started thinking of it as a game loop. If you're not familiar with (simple) game architecture, the game loop is basically a continuous loop that executes regardless of user interaction and decides, on each pass, which events and actions should occur. From the Wikipedia article on game programming, a simplified game loop in pseudocode could look like this:
    while (user doesn't exit)
        check for user input
        run AI
        move enemies
        resolve collisions
        draw graphics
        play sounds
    end while
That was exactly what we needed. Taking advantage of RAF, we created a tick function that executes continuously; inside it we decide what to do based on previous user input or other factors.
The simplified tick for the vidgrid is something like this:
    function tick() {
        if (needsClean) {
            clean(); // we clean if we've changed the size of the quadrant
            needsClean = false;
        }
        if (newFrame) { // if we have to change the quadrant's frame because we
                        // are the active one (or the opposite)
            drawFrame(); // we draw just the frame in a separate canvas so it
                         // doesn't need to be calculated all the time, and it
                         // is still faster than copying from an image
            newFrame = false;
        }
        if (dirty) {
            draw(); // we draw the new frame if we are playing or seeking
            dirty = false;
        }
        requestAnimationFrame(tick);
    }
The values of needsClean, newFrame and dirty are updated in the event handlers (when the user is seeking, the video is playing, etc.).
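As an illustration of that flag-setting side, here is a hypothetical sketch. The state shape and handler names are our invention; the actual handlers in the editor are more involved:

```javascript
// Input handlers only flip flags; the tick function decides what work
// to do on the next animation frame based on them.
var state = { needsClean: false, newFrame: false, dirty: false };

function onSeek() {
    // Seeking means the current frame is stale and must be redrawn.
    state.dirty = true;
}

function onActiveSourceChanged() {
    // A different quadrant became active, so its highlight frame
    // needs to be redrawn.
    state.newFrame = true;
}

function onQuadrantResized() {
    // A resize invalidates the old pixels: clean first, then redraw.
    state.needsClean = true;
    state.dirty = true;
}
```

Keeping the handlers this cheap is the point: input can fire hundreds of times per second, but the expensive work happens at most once per frame.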
It was this shift in the way we thought about the user interactions, going to a game loop mechanic, that allowed us to improve the performance and simplify our code in the editor.
Big takeaway: if you are building something that requires high interactivity and receives a lot of user input, think about how using a game loop could make your life easier! It sure did for us. And if you haven't had a chance to check out Vyclone's sexy new Web editor (if I do say so myself), get going! Click 'Remix' on any video on Vyclone.com and you'll see our Web editor. It works equally well with mouse or touch input. I highly recommend giving it a go on a Surface Pro!
Enjoy! Hit me up with some comments below if you have any questions!
— Anton Molleda, Plain Concepts