Tweet, Tweet, Retweet: Conversational Aspects of Retweeting on Twitter
danah boyd, Scott Golder, and Gilad Lotan
Twitter—a microblogging service that enables users to post messages (“tweets”) of up to 140 characters—supports a variety of communicative practices; participants use Twitter to converse with individuals, groups, and the public at large, so when conversations emerge, they are often experienced by broader audiences than just the interlocutors. This paper examines the practice of retweeting as a way by which participants can be “in a conversation.” While retweeting has become a convention inside Twitter, participants retweet using different styles and for diverse reasons. We highlight how authorship, attribution, and communicative fidelity are negotiated in diverse ways. Using a series of case studies and empirical data, this paper maps out retweeting as a conversational practice.
No. MSR-TR-2009-117 · August 2009
Type: TechReport
Learning to Rank with Graph Consistency
Bo Geng, Linjun Yang, and Xian-Sheng Hua
The ranking models of existing image search engines are generally based on associated text while the image visual content is actually neglected. Imperfect search results frequently appear due to the mismatch between the textual features and the actual image content. Visual reranking, in which visual information is applied to re¯ne text based search results, has been proven to be effective. However, the improvement brought by visual reranking is limited, and the main reason is that the errors in the text-based results will propagate to the refinement stage. In this paper, we propose a Content-Aware Ranking model based on "learning to rank" framework, in which textual and visual information are simultaneously leveraged in the ranking learning process. We formulate the Content-Aware Ranking learn- ing based on large margin structured output learning, by modeling the visual information into a regularization term. The direct optimization of the learning problem is nearly infeasible since the number of constraints is huge. The effcient cutting plane algorithm is adopted to learn the model by iteratively adding the most violated constraints. Extensive experimental results on a large-scale dataset collected from a commercial Web image search engine demonstrate that the proposed ranking model signifficantly outperforms the state-of-the-art ranking and reranking methods.
No. MSR-TR-2009-116 · 25 August 2009
Type: TechReport
The Application of Bayesian Inference to Traffic Analysis
George Danezis and Carmela Troncoso
This work casts the traffic analysis of anonymity systems, and in particular mix networks, in the context of Bayesian inference. A generative probabilistic model of mix network architectures is presented, that incorporates a number of attack techniques in the traffic analysis literature. We use the model to build an Markov Chain Monte Carlo inference engine, that calculates the probabilities of who is talking to whom given an observation of network traces. We provide a thorough evaluation of its correctness and performance, and confirm that mix networks with realistic parameters are secure. This approach enables us to apply established information theoretic anonymity metrics on complex mix networks, and extract information from anonymised traffic traces optimally.
No. MSR-TR-2009-112 · 18 August 2009
Type: TechReport
A Tutorial-style Introduction to Subspace Gaussian Mixture Models for Speech Recognition
Daniel Povey
This is an in-depth, tutorial-style introduction to the techniques involved in training a factor analyzed style of speech recognition system. Algorithms are explained in detail, with an emphasis on the how-to rather than the derivations. The recipe described here is both an extension to and a special case of the prior work we have done. Changes include a simplification of the procedure used to initialize these models, the introduction of ``sub-models'' which saves memory and may have modeling advantages, an extended approach to factor based speaker adaptation that uses the sub-models, and a mechanism to estimate a subspace-constrained version of Constrained MLLR transforms in this framework.
No. MSR-TR-2009-111 · 17 August 2009
Type: TechReport
End-to-end Verification of Security Enforcement is Fine (Extended version)
Nikhil Swamy, Juan Chen, and Ravi Chugh
Proving software free of security bugs is hard. Programming language support to ensure that programs correctly enforce their security policies would help, but, to date, no language has the ability to verify the enforcement of the kinds of policies used in practice---dynamic, stateful policies which address a broad range of concerns including forms of access control and information flow tracking. This paper makes two main contributions. First, we present Fine, a new source-level security-typed language that, through the use of a simple module system and dependent, refinement, and affine types, can be used to check the enforcement of dynamic security policies applied to real software. Second, we define DCIL, a small extension to the type system of the .NET Common Intermediate Language, and show how to compile Fine in a type-preserving manner to DCIL. Our approach allows Fine programs to run on stock .NET virtual machines and to interface with .NET libraries. Additionally, our type-preserving compiler allows code consumers to download DCIL programs and check them for security while relying on a small trusted computing base. We have proved our source and target languages sound, our compilation type-preserving, and have made a prototype implementation of our compiler and several example programs available.
No. MSR-TR-2009-98 · 7 August 2009
Type: TechReport
HD View: A Software Camera for the Web
Matt Uyttendaele, Howard Good, and Michael F. Cohen
As captured imagery grows in resolution to gigapixels, increases in dynamic range beyond what can be displayed, and encompasses wider fields of view, there is a need for the viewer of such imagery to take on many of the roles traditionally relegated to the camera. Fully exploring such imagery requires a software camera. One such software camera is HD View. We discuss the need for the evolution towards software cameras and then describe the current implementation of HD View. HD View is an interactive software camera that rovides flexible output lens control, full color management, and interactive tone mapping of high resolution, wide gamut, and high dynamic range imagery.
No. MSR-TR-2009-95 · 3 August 2009
Type: TechReport
Web Search Using Small Cores: Quantifying the Price of Efficiency
Vijay Janapa Reddi, Benjamin Lee, Trishul Chilimbi, and Kushagra Vaid
The commoditization of hardware, data center economies of scale, and Internet-scale workload growth all demand greater power efficiency to sustain scalability. Traditional enterprise workloads, which are typically memory and I/O bound, have been well served by chip multiprocessors comprising of small, power-efficient cores. While small cores deliver performance-per-Watt efficiency for such data center workloads, small cores impact application quality-of-service robustness, flexibility, and reliability for emerging Internet-scale applications, which increasingly invoke computationally intensive kernels. These challenges constitute the price of efficiency, which we quantify for an industry-strength, production-quality, next-generation online web search engine. Specifically, we evaluate search on server- and mobile-class architectures using Xeon and Atom processors, quantifying search efficiency at the microarchitecture- and system-level. Our findings prompt us toward re-thinking small core designs for a new breed of data center workloads in order to continue reaping the benefits of small-core power efficiency.
No. MSR-TR-2009-105 · August 2009
Type: TechReport
A Composable Model for Analyzing Locality of Multi-threaded Programs
Chen Ding and Trishul Chilimbi
In a multi-threaded execution, threads may negatively interfere when their private data contends for shared cache or positively interact when the data brought in by one thread is used by other threads. This paper presents a model of such cache behavior to predict locality without exhaustive simulation and provide insight into trends. The new model extends prior work that assumes no data sharing and uniform thread interleaving. Based on a single pass over an interleaved execution trace, we compute a set of per-thread statistics that includes the effect of thread interleaving and data sharing. The per-thread statistics is then composed to predict performance for all cache sizes, either for sub-clusters of threads or for futuristic environments with a larger number of similar threads. We evaluate and validate our model against exhaustive simulation using a server application running on a quad-core machine and productivity, multimedia and gaming applications running on a dual-core machine. The results indicate that our model is accurate and relies on incorporating both irregular thread interleaving and data sharing to achieve this accuracy. In addition, it identifies and separates individual factors affecting locality and scalability and hence opens new possibilities in performance tuning, program scheduling, and hardware cache design for concurrent applications.
No. MSR-TR-2009-107 · August 2009
Type: TechReport
Convergence Speed of Binary Interval Consensus
Moez Draief and Milan Vojnovic
We consider the speed of convergence of an instance of the binary interval consensus, a distributed and decentralized algorithm for computing the quantized average value. With binary consensus problem, each node initially holds one of two states and the goal for each node is to correctly decide which one of the two states was initially held by the majority of nodes. We derive an upper bound on the expected convergence time that holds for arbitrary connected graphs; it is based on the location of the eigenvalues of some contact rate matrices. We instantiate our bound for particular networks of interest, including complete graphs, star-shaped networks, and Erdos-Renyi random graphs, and in the former two cases compare with alternative computations. We find that for all these examples our bound is of the exact order with respect to the number of nodes and in some cases yields the exact multiplicative constant. We pinpoint the fact that the expected convergence time critically depends on the voting margin defined as the difference between the proportions of nodes that initially held the majority and the minority states, respectively. We derive an exact relation between the expected convergence time and the voting margin, for some of these graphs, that reveals how the expected convergence time tends to infinity as the voting margin approaches zero. Our results provide insights on how the expected convergence time depends on the network topology which can be used for performance evaluation and network design. The results are of interest in the context of peer-to-peer systems; in particular, for sensor networks and distributed databases.
No. MSR-TR-2009-86 · August 2009
Type: TechReport
Carbon: trusted auditing for P2P distributed virtual environments
John L. Miller and Jon Crowcroft
Many Peer-to-Peer Distributed Virtual Environments (P2P DVE’s) have been proposed, but none are widely deployed. One significant barrier to deployment is lack of security. This paper presents Carbon, a trusted auditing system for P2P DVE’s which provides reasonable security with low per-client overhead. DVE’s using Carbon perform offline auditing to evaluate DVE client correctness. Carbon audits can be used to catch DVE clients which break DVE rules – cheaters – so the DVE can punish them. We analyze the impact of applying Carbon to a peer-to-peer game with attributes similar to World of Warcraft. We show that 99.9% of cheaters – of a certain profile – can be caught with guided auditing and 2.3% bandwidth overhead, or 100% of cheaters can be caught with exhaustive auditing and 27% bandwidth overhead. The surprisingly low overhead for exhaustive auditing is the result of the small payload in most DVE packet updates, compared to the larger aggregate payloads in audit messages. Finally, we compare Carbon to PeerReview, and show that for DVE scenarios Carbon consumes significantly less resources – in typical cases by an order of magnitude – while sacrificing little protection.
No. MSR-TR-2009-110 · August 2009
Type: TechReport
Wi-Fi Networks are Underutilized
Ramya Raghavendra, Jitendra Padhye, and Ratul Mahajan
Much of wireless research today focuses on improving capacity of wireless networks. While wireless capacity can be a critical issue in cases where the number of users is large, or for high bandwidth applications such as video downloads, we find that this is not the case in day-to-day environments such as corporate WLANs, universities, homes, cafe's etc. Through empirical measurements in each of these networks, we show that even at peak usage, wireless networks have plenty of spare capacity. For the range of networks that we measured, we found medium utilization remains under 50% for upto 90% of the time. The traffic patterns are bursty, and utilization can reach as much as 80% in some cases on very short timescales, but for most part, there is plenty of capacity available. Ironically, we find that while capacity is plentiful, packet loss remains a problem. Our measurements show that these wireless links on an average suffer from loss rates of about 10%. We discuss the causes and implications of these observations, and consider how our results might guide protocol design.
No. MSR-TR-2009-108 · August 2009
Type: TechReport
Experiences with Lower-Cost Access to Tactile Graphics in India
M. Kaleem Rahman, Saurabh Sanghvi, Kentaro Toyama, and M. Bernardine Dias
Tactile graphics allow the visually impaired to perceive two-dimensional imagery, which is an essential part of learning science, geography, and other subjects. In the developed world, such graphics are available to blind students from an early age, and students grow up familiar with image representations. Tactile graphics, however, require special printers whose costs are often beyond resource-constrained institutions; thus, blind students in developing regions often grow up without any exposure to these learning aids. In this paper, we investigate the potential of a software solution for converting regular images into a form that can be printed as tactile imagery on relatively low-cost embossing device meant only for braille text. Using techniques of ethnographic design, we explore how students at a school for the blind in India interpret tactile graphics on their first contact with such material, and for a variety of subject matter. We find that our subjects were exceedingly enthusiastic about tactile graphics, rapidly able to understand and absorb two-dimensional representations, and that studying tactile graphics of the alphabet could lead to their learning how to write the alphabet for the very first time.
No. MSR-TR-2009-102 · August 2009
Type: TechReport