LucidWorks Search on Windows Azure delivers a high-performance search service based on Apache Lucene/Solr open source indexing and search technology. This service enables quick and easy provisioning of Lucene/Solr search functionality on Windows Azure without any need to manage and operate Lucene/Solr servers, and it supports pre-built connectors for various types of enterprise data, structured data, unstructured data and web sites.
In June, we shared an overview of the LucidWorks Search service for Windows Azure. For this post, the first in a series, we’ll cover a few of the concepts you need to know to get the most out of the LucidWorks search service on Windows Azure. In future posts we’ll show you how to set up a LucidWorks service on Windows Azure and demonstrate how to integrate search with Web sites, unstructured data and structured data.
Options for Developers
Developers can add search to their existing Web Sites, or create a new Windows Azure Web site with search as a central function. For example, in future posts in this series, we’ll create a simple Windows Azure web site that will use the LucidWorks search service to index and search the contents of other Web sites. Then we’ll enable search from the same demo Web site against a set of unstructured data and MySQL structured data in other locations.
Overview: Documents, Fields, and Collections
LucidWorks creates an index of unstructured and structured data. Any individual item that is indexed and/or searched is called a Document. Documents can be a row in a structured data source or a file in an unstructured data source, or anything else that Solr/Lucene understands.
An individual item in a Document is called a Field. Same concept – fields can be columns of data in a structured source or a word in an unstructured source, or anything in between. Fields are generally atomic, in other words they cannot be broken down into smaller items.
LucidWorks calls groups of Documents that can be managed and searched independently of each other Collections. Searching, by default is on one collection at a time, but of course programmatically a developer can create search functionality that returns results for more than one Collection.
Security via Collections and Filters
Collections are a great way to restrict a group of users, controlled by access to Windows Azure Web sites and by LucidWorks. In addition, LucidWorks Admins can create Filters inside a Collection. User identity can be integrated with an existing LDAP directory, or managed programmatically via API.
LucidWorks additional Features
LucidWorks adds value to Solr/Lucene with some very useful UI enhancements that can be enabled without programming.
Persistent Queries and Alerts, Auto-complete, spellcheck and similar terms.
Users can create their own persistent queries. Search terms are automatically monitored and Alerts are delivered to a specified email address using the Name of the alert as the subject line. You can also specify how often the persistent query should check for new data and how often alerts are generated.
Search term Typeahead can be enabled via LucidWorks’ auto-complete functionality. Auto-complete tracks the characters the user has already entered and displays terms that start with those characters.
When results re displayed, LucidWorks can spell-check queries and offer alternative terms based on similar spellings of words and synonyms in the query.
Search engines use Stopwords to remove common words from queries and query indexes like “a”, “and”, or “for” that add no value to searches. LucidWorks has an editable list of Stopwords that is a great start to increase search relevance.
Increasing Relevance with Click Scoring
Click scoring tracks common queries and query results and tracks which results are most often selected against query terms and scores relevance based on the comparison results. Results with a higher relevance are placed higher in search result rankings, based on user activity.
LucidWorks on Windows Azure – Easy Deployment
The best part of LucidWorks is how easily Enterprise Search can be added as a service. In our next LucidWorks blog post we’ll cover how to quickly get up and running with Enterprise search by adding a LucidWorks service to an existing Windows Azure Web site.
From: Erik Meijer, Architect, Microsoft Corp. Claudio Caldato, Lead Program Manager, Microsoft Open Technologies, Inc.
Updated with a quote from Ferranti Computer Systems NV
Updated: added quotes from Netflix and BlueMountain Capital Management
If you are a developer that writes asynchronous code for composite applications in the cloud, you know what we are talking about, for everybody else Rx Extensions is a set of libraries that makes asynchronous programming a lot easier. As Dave Sexton describes it, “If asynchronous spaghetti code were a disease, Rx is the cure.”
Reactive Extensions (Rx) is a programming model that allows developers to glue together asynchronous data streams. This is particularly useful in cloud programming because helps create a common interface for writing applications that come from diverse data sources, e.g., stock quotes, Tweets, computer events, Web service requests.
Today, Microsoft Open Technologies, Inc., is open sourcing Rx. Its source code is now hosted on CodePlex to increase the community of developers seeking a more consistent interface to program against, and one that works across several development languages. The goal is to expand the number of frameworks and applications that use Rx in order to achieve better interoperability across devices and the cloud.
Rx was developed by Microsoft Corp. architect Erik Meijer and his team, and is currently used on products in various divisions at Microsoft. Microsoft decided to transfer the project to MS Open Tech in order to capitalize on MS Open Tech’s best practices with open development.
There are applications that you probably touch every day that are using Rx under the hood. A great example is GitHub for Windows.
According to Paul Betts at GitHub, "GitHub for Windows uses the Reactive Extensions for almost everything it does, including network requests, UI events, managing child processes (git.exe). Using Rx and ReactiveUI, we've written a fast, nearly 100% asynchronous, responsive application, while still having 100% deterministic, reliable unit tests. The desktop developers at GitHub loved Rx so much, that the Mac team created their own version of Rx and ReactiveUI, called ReactiveCocoa, and are now using it on the Mac to obtain similar benefits."
And Scott Weinstein with Lab49 adds, “Rx has proved to be a key technology in many of our projects. Providing a universal data access interface makes it possible to use the same LINQ compositional transforms over all data whether it’s UI based mouse movements, historical trade data, or streaming market data send over a web socket. And time based LINQ operators, with an abstracted notion of time make it quite easy to code and unit test complex logic.”
And Howard Mansell, Quantitative Strategist with BlueMountain Capital Management added, “We are very pleased that Microsoft are Open-Sourcing the Reactive Extensions for .NET. This will allow users to better reason about performance and optimize their particular use cases, which is critical for performance and latency sensitive applications such as real-time financial analysis.”
From Belgium, Guido Van de Velde, Director MECOMS Product Organisation for Ferranti Computer Systems NV, explains how Rx is important for their global company: “Ferranti uses Rx in its vertical solution for the utility market, MECOMS™, to process and manage all data and events from the Smart Grid. Its architecture allows the setup of data processing pipelines which can scale and deliver excellent performance. Performance testing together with Microsoft showed that this architectures supports up to hundreds of millions of smart meters and other sensors, running on commodity hardware. Thanks to Rx we can focus on component functionalities and don’t have to worry about interfaces and connections between the different components saving significant development time.”
Part of the Rx development team will be on assignment with the MS Open Tech Hub engineering program to accelerate the open development of the Rx project and to collaborate with open source communities. Erik will continue to drive the strategic directions of the technology and leverage MS Open Tech Hub engineering resources to update and improve the Rx libraries. With the community contribution we want to see Rx be adopted by other platforms. Our goal is to build an open ecosystem of Rx-compliant libraries that will help developers tackle the complexity of asynchronous programming and improve interoperability.
We are also happy to see that our decision is welcome by open source developers.
“Open sourcing Rx just makes sense. My hope is that we’ll see a couple of virtuous side-effects of this decision. Most likely will be faster releases for bug fixes and performance improvements, but the ability to understand the inner workings of the Rx code should encourage the creation of additional tools and Rx providers to remote data sources,” said Lab 49’s Scott Weinstein.
According to Dave Sexton, http://davesexton.com/blog, “It’s a solid library built around core principles that hides much of the complexity of controlling and coordinating asynchrony within any kind of application. Opening it will help to lower the learning curve and increase the adoption rate of this amazing library, enabling developers to create complex asynchronous queries with relative ease and without any spaghetti code left over.”
Starting today, the following libraries are available on CodePlex:
We look forward to seeing you guys use the library, share your thoughts and contribute to the evolution of this fantastic technology built for all you developers.
We are very excited to share the news that AMQP 1.0 was approved as an OASIS Standard on October 29, 2012.
AMQP 1.0 libraries are available for a variety of languages and platforms. The interest amongst users is growing. Support for AMQP 1.0 is anticipated in various message-oriented middleware implementations. AMQP 1.0 is the protocol of choice for open and interoperable messaging from the client all the way to the cloud!
AMQP 1.0 as an open, interoperable, wire level messaging protocol enables interoperability between compliant clients and brokers. Applications can achieve full-fidelity message exchange between components built using different languages and frameworks and running on different operating systems. Further, as an inherently efficient application layer binary protocol, AMQP 1.0 enables new possibilities in messaging that scale from the client to the cloud.
IIT Software GmbH, INETCO Systems Ltd., Microsoft, Red Hat and StormMQ have publicly posted statements about their use of AMQP 1.0 to the OASIS AMQP Technical Committee.
Several AMQP 1.0 client libraries are currently available:
1. AMQP 1.0 JMS library for Java from Apache Qpid
2. AMQP 1.0 library for Java from SwiftMQ (IIT Software GmbH)
3. Proton AMQP 1.0 library for C (including PHP and Python bindings) from Apache Qpid (Linux only today)
Windows Azure Toolkit for Eclipse, November 2012 Preview (version 1.8.0) now includes a new component “Package for Apache Qpid Client Libraries for JMS (by MS Open Tech)” which makes it easier for Java developers who use Eclipse to develop Java applications that use AMQP 1.0 for messaging.
Stay tuned for more information as more libraries and implementations become available!
Thanks, Ram Jeyaraman (Co-chair of OASIS AMQP Technical Committee and Senior Program Manager, Microsoft Open Technologies, Inc., a subsidiary of Microsoft Corporation) Doug Mahugh (Senior Technical Evangelist, Microsoft Open Technologies, Inc., a subsidiary of Microsoft Corporation)
AMQP Member Section Site: http://www.amqp.org
OASIS AMQP Technical Committee: http://www.oasis-open.org/committees/amqp
The activities of IETF’s HTTPbis working group continue next week at IETF 85 in Atlanta, marking another step in the path of HTTP/2.0 to Proposed Standard. Fruitful discussions are happening on many facets of the specification, filling the gaps wherever no obvious consensus had yet emerged or the initial draft did not clearly specify a given behavior that will be essential for a working, interoperable implementation.
In an earlier blog post, we called out seven specific areas where the group will need to do additional work. Gabriel Montenegro and Willy Tarreau have now submitted a new proposal which describes a suggested approach for Negotiation in HTTP/2.0, in order to move the discussion forward on one of those key subjects. As it is, the proposal can already be used to negotiate HTTP 2.0 either in the clear or over TLS. Naturally, this proposal is a starting point and will undergo revisions going forward based on working group discussions (e.g., to further optimize the handshake).
As outlined in the proposal itself, the mechanism is very simple. It leverages the Upgrade header defined in HTTP/1.1 and already in use in WebSocket. A client who is uncertain about whether the server supports HTTP/2.0 will initiate a request using HTTP/1.1 and include an upgrade header:
GET /default.htm HTTP/1.1
GET /default.htm HTTP/1.1
At this point, if the server supports HTTP/1.1 only, it will just ignore the upgrade request and respond normally for an HTTP/1.1 connection:
HTTP/1.1 200 OK
HTTP/1.1 200 OK
If instead the server does support HTTP/2.0, it will upgrade the connection and send the first HTTP/2.0 frame, with the important benefit of achieving that without any additional roundtrips.
HTTP/1.1 101 Switching Protocols
[ HTTP/2.0 frame ]
We have implemented this behavior and updated the prototype which we originally released back in May. Please download the latest version, check it out and let us know what you think: we look forward to hearing your feedback. And stay tuned for additional, completely redesigned prototypes coming soon!
Adalberto Foresti Principal Program Manager Microsoft Open Technologies, Inc. A subsidiary of Microsoft Corporation
I attended IBM’s Information on Demand conference two weeks ago, where I had the opportunity to talk to people about OData (Open Data Protocol). Microsoft and IBM are collaborating on the development of the OData standard in the OASIS OData Technical Committee, and for this conference we were demonstrating a simple OData feed on a DB2 database, consumed by a variety of client applications.
Here’s a high-level view of the architecture of the demo app:
For this demo, we deployed an OData service on Windows Azure that exposes a few entities from a DB2 database running on IBM’s cloud platform. By leveraging WCF Data Services in Visual Studio, we were able to create this OData feed in a matter of minutes.
Here’s a screencast that shows the steps involved in creating the demo service and consuming it from various client devices and applications:
For more information about using OData with DB2 or Informix, see “Use OData with IBM DB2 and Informix” on the IBM DeveloperWorks site.
The growing OData ecosystem is enabling a variety of new scenarios to deliver open data for the open web, and it was great to have the opportunity to learn from so many perspectives this week! Standardizing a URI query syntax and semantics means that data providers and data consumers can focus on innovative ways to add value by combining disparate data sources, and assures interoperability between a wide variety of data producers and consumers. To learn more about OData, including the ecosystem, developer tools, and how you can get involved, see this blog post.
Special thanks to Susan Malaika, Brent Gross, and John Gera of IBM for all of their help with putting together the demo and their support at the booth throughout the conference. We’re looking forward to continued collaboration with our colleagues at IBM and the many other organizations involved in the ongoing standardization of OData!
I’m pleased to announce the availability of a major update to our Eclipse tooling, the “Windows Azure Toolkit for Eclipse, November 2012 Preview (version 1.8.0)”. This release accompanies the release of the Windows Azure SDK v1.8, as well as the AMQP 1.0 messaging protocol support in Windows Azure Service Bus, and exposes a number of related features recently enabled by Windows Azure.
The key highlights of this release include:
a) The updated “Windows Azure Plugin for Eclipse with Java” supports using Windows Server 2012 as the target operating system in the cloud
b) The plugin also now allows you to easily configure Windows Azure Caching, so you can use a memcached-compatible client for co-located, in-memory caching scenarios
c) The toolkit includes a new component: “Package for Apache Qpid Client Libraries for JMS (by MS Open Tech)”, which is a distribution of the latest client libraries from Apache supporting AMQP 1.0-based messaging recently enabled by Windows Azure Service Bus
d) Plus a number of additional customer-feedback driven enhancements and bug fixes
To learn more, see our latest documentation.
Martin Sawicki Principal Program Manager Microsoft Open Technologies, Inc. A subsidiary of Microsoft Corporation
Gabriel Montenegro Principal Software Development Engineer, Microsoft Corporation
Brian Raymor Senior Program Manager, Microsoft Open Technologies, Inc.
Rob Trace Senior Program Manager Lead, Microsoft Corporation
We just returned from the IETF 85 meeting in Atlanta, where the HTTPbis working group held face to face meetings to begin work on HTTP/2.0. As outlined in our previous IETF 84 report, there are seven key technical areas where consensus has not yet emerged or the initial draft did not specify clear behavior for an interoperable implementation. The IETF 85 meeting focused on three of these areas:
Discussion on server push was deferred until more data is available.
As noted in the HTTPbis charter, the working group needs to explicitly consider:
A negotiation mechanism that is capable of not only choosing between HTTP/1.x and HTTP/2.x, but also for bindings of HTTP URLs to other transports (for example).
A negotiation mechanism that is capable of not only choosing between HTTP/1.x and HTTP/2.x, but also for bindings of HTTP URLs to other transports (for example).
To move the discussion forward, Microsoft presented Upgrade-based Negotiation for HTTP/2.0 at the HTTPbis meeting. This presentation is based on our draft proposal which allows HTTP/2.0 to be negotiated either in the clear or over TLS. Further details on its design and MS Open Tech related HTML5 Labs prototype are available in More HTTP/2.0 Prototyping: a Suggested Approach to the Protocol Upgrade.
The working group consensus was “to pursue this path” and gather more data on its success in real world deployments when the connection is not secure. Drafts for alternatives that enhance or bypass the Upgrade approach were also solicited.
There has been limited discussion in the HTTPbis working group on flow control. Microsoft presented Flow Control Principles for HTTP 2.0 to build consensus around the rules and guidelines for future Flow Control prototypes and experimentation. Based on the response to the presentation, Mark Nottingham, the HTTPbis chair, requested a draft proposal to be submitted which incorporated suggestions from other participants. Microsoft submitted the first version of HTTP 2.0 Principles for Flow Control with contributions from Ericsson. Further versions with additional contributors are expected.
We were very pleased with the progress of the discussions as reflected in the audio and the draft meeting minutes.
As Lao Tzu wrote “A journey of a thousand miles begins with a single step”. IETF 85 was the first step towards the proposed completion date of November 2014. Next steps are a potential interim face to face meeting in January or February 2013 and then IETF 86 in March 2014. We’re looking forward to contributing and participating in these sessions.
Gabriel Montenegro, Brian Raymor, and Rob Trace
Portability and interoperability of virtualization technologies across platforms using Linux and Windows virtual machines are important to Microsoft and to our customers.
To that end, System Center VMM continues to gain valuable interoperability and portability experience using Open Virtualization Format (OVF) with their OVF Export/Import tool and partners such as Citrix and VMware.
For more information, see System Center's most recent post from Cheng Wei and that of Citrix's technical architect Shishir Pardikar.
Monica MartinSenior Program ManagerMicrosoft Open Technologies, Inc.