There are two forms of parallelism that serve to improve the performance of processors. The first is Instruction-Level Parallelism (ILP). ILP consists of applying the techniques of superscalar processing and pipelining to overlap the execution of as many instructions as possible (DeMone, 2000).

Superscalar processing and pipelining are two ILP techniques for improving the performance of the conventional CU/ALU model by increasing instruction throughput. Both techniques rely on dividing the fetch-execute cycle of a processor between two separate and independently operating units: a fetch unit and an execution unit. Typically, the execution unit is further divided into separate units designed to handle different types of instructions (Englander, 2003).

Pipelining is the concept of fetching upcoming instructions while earlier ones are still being processed. While the execution unit is decoding and executing one instruction, the fetch unit is retrieving the next instruction(s) to be executed. This overlapping of the two units conceptually creates an assembly line (or pipeline) of instructions, so that the processor is constantly executing instructions without having to wait for them to be fetched. Holes in the pipeline and branching conditions aside, a processor that is capable of completing an instruction on each tick of its system clock is referred to as a scalar processor (Englander, 2003).
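
The overlap described above can be illustrated with a minimal sketch. It assumes a hypothetical two-stage pipeline in which fetching and executing each take exactly one clock tick and nothing ever stalls. The function name `pipeline_timeline` and the instruction labels are invented for illustration and do not come from the cited sources.

```python
def pipeline_timeline(instructions):
    """Return one (fetching, executing) pair per clock tick.

    Hypothetical two-stage model: the fetch unit and the execution unit
    each take exactly one tick per instruction and never stall.
    """
    pending = list(instructions)
    executing = None          # instruction fetched on the previous tick
    timeline = []
    while pending or executing is not None:
        fetching = pending.pop(0) if pending else None
        timeline.append((fetching, executing))
        executing = fetching  # hand the fetched instruction to the execution unit
    return timeline


program = ["I1", "I2", "I3", "I4"]
for tick, (fetching, executing) in enumerate(pipeline_timeline(program), start=1):
    print(f"tick {tick}: fetch={fetching} execute={executing}")

# Without overlap, each instruction would need its own fetch tick and execute tick.
print("unpipelined ticks:", 2 * len(program))
print("pipelined ticks:", len(pipeline_timeline(program)))
```

Under these assumptions, four instructions finish in five ticks instead of eight. After the first tick, which only fills the pipeline, one instruction is executing on every tick, which is the scalar behaviour described above.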

Superscalar processors are those that have more than one execution unit, each with its own pipeline capabilities. They are distinguished from scalar processors in that they are capable of executing more than one instruction at a time, resulting in the execution of multiple instructions each clock cycle. This is accomplished by the processor looking for instructions that can be handled within the same clock cycle and processing them together (Englander, 2003).
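
One way to picture how a processor might pick instructions to process together is the sketch below. It assumes a simplified in-order machine that can issue up to `width` instructions per cycle and starts a new cycle whenever an instruction reads or overwrites a register written earlier in the same group. The `Instr` type, the `issue_groups` function, and the register names are hypothetical. Real superscalar hardware uses considerably more elaborate logic.

```python
from typing import NamedTuple


class Instr(NamedTuple):
    dest: str     # register written by the instruction
    srcs: tuple   # registers read by the instruction


def issue_groups(program, width=2):
    """Group instructions into cycles, at most `width` per cycle.

    Simplifying assumption: an instruction cannot share a cycle with an
    earlier instruction whose result it reads or whose destination it
    overwrites.
    """
    groups = []
    current = []
    written = set()   # destinations written by the current group
    for instr in program:
        conflicts = instr.dest in written or any(s in written for s in instr.srcs)
        if current and (len(current) == width or conflicts):
            groups.append(current)
            current, written = [], set()
        current.append(instr)
        written.add(instr.dest)
    if current:
        groups.append(current)
    return groups


program = [
    Instr("r1", ("r2", "r3")),   # independent of the next instruction
    Instr("r4", ("r5", "r6")),
    Instr("r7", ("r1", "r4")),   # needs both earlier results
    Instr("r8", ("r7",)),        # needs r7
]
for cycle, group in enumerate(issue_groups(program), start=1):
    print(f"cycle {cycle}: {[i.dest for i in group]}")
```

In this example the first two instructions are independent and share a cycle, while the last two each depend on an earlier result and issue alone, so four instructions take three cycles rather than four.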

The second form of parallelism is Thread-Level Parallelism (TLP), which serves to improve processor performance by dividing the execution of programs into threads. Multithreading refers to the ability to run these threads independently and simultaneously (DeMone, 2000). The performance goal of dividing programs into threads, and multithreading their execution, is to have each active thread executing an instruction each clock cycle.
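
The stated goal, one instruction per active thread per clock cycle, can be modelled with the following sketch. It is an idealized simulation rather than real threading code: it assumes every thread always has an instruction ready and that the processor has enough execution resources for all of them. The function name `multithreaded_schedule` and the thread and instruction labels are hypothetical.

```python
def multithreaded_schedule(threads):
    """Return a list of cycles, each mapping thread name -> instruction issued.

    Idealized model: every thread that still has work issues exactly one
    instruction per cycle, independently of the other threads.
    """
    remaining = {name: list(instrs) for name, instrs in threads.items()}
    cycles = []
    while any(remaining.values()):
        issued = {name: queue.pop(0) for name, queue in remaining.items() if queue}
        cycles.append(issued)
    return cycles


threads = {
    "T0": ["a1", "a2", "a3"],
    "T1": ["b1", "b2"],
}
schedule = multithreaded_schedule(threads)
for cycle, issued in enumerate(schedule, start=1):
    print(f"cycle {cycle}: {issued}")

# Running the same threads one after another would take one cycle per instruction.
print("one thread at a time:", sum(len(instrs) for instrs in threads.values()))
print("multithreaded cycles:", len(schedule))
```

Under these assumptions the five instructions complete in three cycles, the length of the longest thread, instead of five.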

References:
DeMone, P. (2000). Alpha EV8 (Part 2): Simultaneous Multi-Threat. Real World Technologies. Retrieved March 6th, 2008 from http://www.realworldtech.com/page.cfm?ArticleID=RWT122600000000
Englander, I. (2003). The Architecture of Computer Hardware and Systems Software.