Low power microprocessor design

Abstract:

Low power is one of the most important goals of embedded microprocessor design. To reduce dynamic power and static power, several design techniques are applied in real microprocessors. But the term “low power” is not simple in real applications. Depending on the low-power requirements of the system, many varieties of low-power features are required of microprocessors.

Introduction

History

In 1983 a CAE software company called Gateway Design Automation released the Verilog Hardware Description Language, commonly known as “Verilog HDL”. The simulator and language were enhanced in 1985, and the result was called “Verilog-XL”. Verilog-XL was an interpreter, which made it easy for hardware design engineers to debug a hardware design interactively. With it, engineers could do more than model and simulate; they could also debug in the same way they would on a real hardware breadboard. The Verilog behavioral constructs could describe both hardware and test stimulus. At the gate level Verilog-XL was fast and could handle designs in excess of 100,000 gates, which gave it a strong foothold among high-end designers as the design size of a single chip kept growing. In 1987 another start-up company, Synopsys, began to use the proprietary Verilog behavioral language. The VHDL standard was released by IEEE at about the same time, drawing attention to “top-down design” using behavioral hardware description languages and synthesis.

In the early 1990s, Cadence separated the Verilog Hardware Description Language (HDL) from the Verilog-XL simulator and released Verilog HDL to the public domain. At this time “Open Verilog International” (OVI) was formed to control the language. OVI comprised both Verilog users and CAE vendors. Verilog was supported by all ASIC foundries, which used Verilog-XL as the “golden” simulator. In 1993, about 85% of the designs submitted to ASIC foundries were written in Verilog. In December 1995, the Verilog language was reviewed and adopted by IEEE as IEEE Standard 1364.

Over the following years new features were added to Verilog, and the new version is called Verilog 2001. This version fixes many of the problems that Verilog 1995 had. Its standard designation is IEEE 1364-2001.

Levels Of Abstraction:

A hardware description language can be used to design at any level of abstraction, from high-level architectural models to low-level switch models. These levels, from the least amount of detail to the most, are given in Table 1. The top two levels use what are called behavioural models, while the lower three levels use what are called structural models.

MODEL                LEVEL

BEHAVIOURAL MODEL    ALGORITHMIC LEVEL
                     ARCHITECTURAL LEVEL

STRUCTURAL MODEL     REGISTER TRANSFER LEVEL (RTL)
                     GATE LEVEL
                     SWITCH LEVEL

Table 1 Levels of Abstraction

Behavioral Model:

Behavioural models consist of code that represents the behaviour of the hardware without regard to its actual implementation. Behavioural models do not include timing numbers. Buses do not need to be broken down into their individual signals. Adders can simply add two or more numbers without specifying registers, gates, or transistors. Behavioural models can be further classified as algorithmic or architectural.

Algorithms are step-by-step methods of solving problems and producing results. No hardware implementation is implied in an algorithmic model.

Architectural models describe hardware at a very high level, using functional blocks such as memory, control logic, CPU, FIFO, etc. These blocks may represent PC boards, ASICs, FPGAs, or other major hardware components of the system. An architectural description may involve clock signals, but usually not to the extent that it describes everything that happens in the hardware on every clock cycle. That is the responsibility of a Register Transfer Level (RTL) description. An architectural description is useful for determining the major functions of a design.

Structural Model:

Structural models can be further classified as register transfer level (RTL), gate level, and switch level.

The Register Transfer Level is commonly known as RTL. Designs at the register transfer level specify the characteristics of a circuit by operations and the transfer of data between registers. RTL code specifies what happens at each clock edge. RTL code may use Boolean functions that can be implemented in gates. State machines are good examples of RTL descriptions, because an RTL state machine is defined by what occurs on each clock cycle. The modern definition is that any code that is synthesizable is RTL code.
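The idea that RTL code is defined by what happens on each clock edge can be sketched in a few lines of Python. This is only an illustrative software model, not Verilog: the class name `Counter2Bit` and its methods are hypothetical, and the 2-bit counter is an assumed example of a state machine. The combinational next-state logic is kept separate from the register update that occurs on the clock edge.

```python
class Counter2Bit:
    """Software model of a 2-bit counter described in RTL style (hypothetical example)."""

    def __init__(self):
        self.state = 0  # contents of the 2-bit state register

    def next_state(self, enable):
        # Combinational logic: compute the value the register will take next.
        if enable:
            return (self.state + 1) & 0b11  # wrap around after 3
        return self.state

    def clock_edge(self, enable):
        # Register transfer: the state changes only when a clock edge arrives.
        self.state = self.next_state(enable)
        return self.state
```

Calling `clock_edge(True)` five times on a fresh counter steps the state through 1, 2, 3, 0, 1, while `clock_edge(False)` leaves the state unchanged.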

Gate-level modelling consists of code that specifies very simple functions such as NAND and NOR gates. Gate-level code is usually generated by synthesis tools, and the resulting netlist is used for gate-level simulation and for the backend.

The lowest level of description is the switch-level model, which describes the actual transistor switches that are combined to make gates.

Verilog Data Types And Data Objects:

The concepts of data types and data objects are shown in the table below:

VERILOG DATA TYPES

  0 1 X Z (defined by language)

VERILOG DATA OBJECTS

  Signal nets:   wire, tri
  Wired nets:    wand, triand *, wor, trior *, trireg *, tri0 *, tri1 *
  Supply nets:   supply0, supply1
  register
  parameter
  integer
  time *
  memory (array)

Table 2 Verilog Data Types and Data Objects

Note: data objects marked with * are not supported by synthesis tools.

The language itself defines a single base data type, which consists of the following four-value set:

0 – Represents a false condition, a logic zero.

1 – Represents a true condition, a logic one.

X – Represents an unknown logic value.

Z – Represents the high impedance state.

These are the only allowable data values.

VERILOG DATA OBJECTS:

Net and register data objects declared without a range are one bit wide by default and are known as scalars; if a range is declared, the object is known as a vector.

Net:

A net object must always be assigned using a continuous assignment statement, or be driven by a gate or module output. Assignment is the mechanism by which values are given to net and register data objects: nets take continuous assignments, while registers take procedural assignments.

wire: structurally connects two signals together.

wor: wired OR of several drivers driving the same net.

wand: wired AND of several drivers driving the same net.
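How several drivers on a wor or wand net combine can be illustrated with a small Python sketch over the four values 0, 1, x and z. The function names `resolve_wand` and `resolve_wor` are hypothetical, and the model assumes equal-strength drivers, ignoring Verilog drive strengths.

```python
# Four-valued logic modelled as single characters: '0', '1', 'x', 'z'.

def resolve_wand(drivers):
    """Resolve the drivers of a wand net (wired AND), ignoring drive strength."""
    active = [d for d in drivers if d != 'z']  # 'z' drivers do not contribute
    if not active:
        return 'z'   # nothing is driving the net
    if '0' in active:
        return '0'   # any 0 dominates a wired AND
    if 'x' in active:
        return 'x'   # unknown driver and no 0 to mask it
    return '1'

def resolve_wor(drivers):
    """Resolve the drivers of a wor net (wired OR), ignoring drive strength."""
    active = [d for d in drivers if d != 'z']
    if not active:
        return 'z'
    if '1' in active:
        return '1'   # any 1 dominates a wired OR
    if 'x' in active:
        return 'x'
    return '0'
```

For example, drivers `['1', '0', 'z']` resolve to `'0'` on a wand net and to `'1'` on a wor net.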

Register:

A register holds its value from one procedural assignment to the next, which means it holds its value across simulation delta cycles. The value is stored in the register data object. A register is assigned inside procedural code, typically under trigger conditions such as if and case statements.

Parameter:

A parameter defines a constant. For synthesis, only integer parameter constants are used.

Integer:

Integer data types are used to declare general-purpose variables for use in loops. They hold numeric data and have no direct hardware intent. No range is specified for an integer. Integers are signed and produce 2's complement results.
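The 2's complement behaviour of signed integers can be made concrete with a short Python sketch. The helper names `to_signed` and `to_bits` are hypothetical, and the default width of 32 bits is an assumption made for illustration.

```python
def to_signed(value, width=32):
    """Interpret the low `width` bits of `value` as a 2's complement number."""
    mask = (1 << width) - 1
    value &= mask
    if value & (1 << (width - 1)):   # sign bit set, so the number is negative
        return value - (1 << width)
    return value

def to_bits(value, width=32):
    """Return the 2's complement bit pattern of `value` as a string of 0s and 1s."""
    mask = (1 << width) - 1
    return format(value & mask, '0{}b'.format(width))
```

With a width of 8, `to_bits(-1, 8)` gives `'11111111'`, and `to_signed(0b1000, 4)` gives `-8`.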

String:

A string is a sequence of characters enclosed in double quotes, all on a single line. To declare a variable that stores a string, declare a register large enough to hold the maximum number of characters the variable will hold. Note that no extra bits are required for a termination character; Verilog does not store a string termination character. Strings can be manipulated using the standard operators. The special characters in strings are:

\n Newline character

\t Tab character

\\ Backslash (\) character

\" Double quote (") character

\ddd A character specified in 1-3 octal digits (0 <= d <= 7)

%% Percent (%) character
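The sizing rule for a register that stores a string can be sketched as follows. The sketch assumes 8 bits per ASCII character with the first character in the most significant byte; the function names `reg_width_for` and `pack_string` are hypothetical.

```python
def reg_width_for(text):
    """Bits needed by a register that stores `text`, at 8 bits per character.

    No extra bits are added, because no termination character is stored.
    """
    return 8 * len(text)

def pack_string(text):
    """Pack ASCII characters into one integer, first character in the top byte."""
    value = 0
    for ch in text:
        value = (value << 8) | ord(ch)
    return value
```

A five-character string such as "Hello" therefore needs a 40-bit register, and `pack_string("AB")` yields the value `0x4142`.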

Operators:

The table below shows the Verilog operators in order of precedence; operators of equal precedence are grouped together. There are nine functional groups of operators, and the third column of the table indicates the group to which each operator belongs.

VERILOG OPERATOR    NAME                        FUNCTIONAL GROUP

[ ]                 Bit-select
( )                 Parenthesis

!                   Logical negation            Logical
~                   Negation                    Bitwise
&                   Reduction AND               Reduction
|                   Reduction OR                Reduction
~&                  Reduction NAND              Reduction
~|                  Reduction NOR               Reduction
^                   Reduction XOR               Reduction
~^ or ^~            Reduction XNOR              Reduction

+                   Unary (sign) plus           Arithmetic
-                   Unary (sign) minus          Arithmetic

{ }                 Concatenation               Concatenation
{{ }}               Replication                 Replication

*                   Multiply                    Arithmetic
/                   Divide                      Arithmetic
%                   Modulus                     Arithmetic

+                   Binary plus                 Arithmetic
-                   Binary minus                Arithmetic

<<                  Shift left                  Shift
>>                  Shift right                 Shift

>                   Greater than                Relational
>=                  Greater than or equal to    Relational
<                   Less than                   Relational
<=                  Less than or equal to       Relational

==                  Logical equality            Equality
!=                  Logical inequality          Equality

===                 Case equality               Equality
!==                 Case inequality             Equality

&                   Bitwise AND                 Bitwise

^                   Bitwise XOR                 Bitwise
^~ or ~^            Bitwise XNOR                Bitwise

|                   Bitwise OR                  Bitwise

&&                  Logical AND                 Logical

||                  Logical OR                  Logical

? :                 Conditional                 Conditional

Table 3 VERILOG OPERATORS
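The reduction, concatenation and replication operators are the least familiar entries in the table, so a small Python model of them may help. It is a simplified sketch: vectors are lists of 0/1 integers with the most significant bit first, the x and z values are ignored, and all function names are hypothetical.

```python
from functools import reduce

def reduce_and(bits):
    """Reduction AND: 1 only if every bit of the vector is 1."""
    return reduce(lambda a, b: a & b, bits)

def reduce_or(bits):
    """Reduction OR: 1 if any bit of the vector is 1."""
    return reduce(lambda a, b: a | b, bits)

def reduce_xor(bits):
    """Reduction XOR: 1 if the vector holds an odd number of 1s (parity)."""
    return reduce(lambda a, b: a ^ b, bits)

def reduce_nand(bits):
    """Reduction NAND: the inverse of reduction AND."""
    return 1 - reduce_and(bits)

def concat(*vectors):
    """Concatenation: join vectors end to end, like {a, b}."""
    return [bit for vector in vectors for bit in vector]

def replicate(count, vector):
    """Replication: repeat a vector, like {count{a}}."""
    return list(vector) * count
```

For the vector `[1, 1, 0, 1]`, reduction AND gives 0 and reduction XOR gives 1; `replicate(2, [1, 0])` gives `[1, 0, 1, 0]`.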

Gate Delays:

In real circuits, logic gates have delays associated with them. Gate delays allow the Verilog user to specify delays through the logic circuits. Pin-to-pin delays can also be specified in Verilog. There are three types of delay: rise, fall, and turn-off.

RISE DELAY:

The rise delay is associated with a gate output transition to 1 from another value.

FALL DELAY:

The fall delay is associated with a gate output transition to 0 from another value.

TURN OFF DELAY:

The turn-off delay is associated with a gate output transition to the high impedance value (z) from another value. If the value changes to x, the minimum of the three delays is used. If no delays are specified, the default value is zero.
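The selection rule for the three delays can be written as a short Python function. This is a sketch of the rule as described above, not simulator code; the function name `transition_delay` and its parameter names are hypothetical.

```python
def transition_delay(new_value, rise=0, fall=0, turn_off=0):
    """Pick the delay that applies when a gate output changes to `new_value`.

    `new_value` is one of '0', '1', 'x', 'z'. Delays default to zero,
    matching the case where no delays are specified.
    """
    if new_value == '1':
        return rise                       # transition to 1
    if new_value == '0':
        return fall                       # transition to 0
    if new_value == 'z':
        return turn_off                   # transition to high impedance
    if new_value == 'x':
        return min(rise, fall, turn_off)  # unknown: smallest of the three
    raise ValueError("unknown logic value: {!r}".format(new_value))
```

With delays of 2, 3 and 4 time units for rise, fall and turn-off, a change to x uses a delay of 2.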

Architectures:

The word architecture is often misused in the context of computer science. Used accurately, architecture refers to the instruction set and resources available to someone who writes programs. The architecture is what is described in a definition document, often called a user's manual. Architecture therefore covers instruction formats, instruction semantics (operation definitions), registers, memory addressing modes, characteristics of the address space (linear, segmented, special address regions), and anything else a programmer would need to know.

Harvard Architecture:

The Harvard architecture uses separate memories for instructions and data, with dedicated buses for each. Instructions and operands can be fetched simultaneously. Different data and program widths are possible. The figure below shows the block diagram:

Von Neumann Architecture:

In this architecture both data and instructions are stored in a single memory, and instructions are executed sequentially unless the order is explicitly modified. The architecture consists of a CPU and a memory, as shown in the figure below:

[Figure: CPU and memory, with DATA and CONTROL paths]

The CPU contains the Control Unit (CU), which controls the execution of instructions, and the Arithmetic Logic Unit (ALU), in which arithmetic and logical operations are performed.

The primary function of the CPU is to execute the instructions fetched from main memory. Each instruction tells the CPU which operation to perform. The Control Unit acts as the interpreter: it decodes each fetched instruction and tells the other components what to do. Inside the CPU there is also a set of registers used to store temporary data and intermediate results. The complete block diagram is shown below:

VON NEUMANN ARCHITECTURE

Data and control information (instructions) are all represented in binary format, which uses only two basic symbols: “0” and “1”. The CPU can execute only machine instructions.

A machine instruction is represented as a sequence of bits made up of an OPCODE and an OPERAND.
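The fetch, decode and execute cycle of a von Neumann machine can be sketched in Python. Everything specific here is an assumption made for illustration: the 8-bit instruction with a 4-bit opcode and 4-bit operand, the four opcodes, the single accumulator, and the 16-cell memory shared by program and data.

```python
# Hypothetical opcodes for an illustrative 8-bit instruction format.
LOAD, ADD, STORE, HALT = 0x1, 0x2, 0x3, 0xF

def encode(opcode, operand=0):
    """Pack a 4-bit opcode and a 4-bit operand into one 8-bit instruction."""
    return ((opcode & 0xF) << 4) | (operand & 0xF)

def run(memory):
    """Execute the program stored in `memory`, which also holds the data."""
    pc = 0    # program counter
    acc = 0   # accumulator register for intermediate results
    while True:
        instruction = memory[pc]       # fetch
        pc += 1
        opcode = instruction >> 4      # decode: upper 4 bits
        operand = instruction & 0xF    # decode: lower 4 bits (an address)
        if opcode == LOAD:             # execute
            acc = memory[operand]
        elif opcode == ADD:
            acc = (acc + memory[operand]) & 0xFF
        elif opcode == STORE:
            memory[operand] = acc
        elif opcode == HALT:
            return memory
        else:
            raise ValueError("unknown opcode: {:#x}".format(opcode))

def demo_memory():
    """Program in cells 0-3, operands 5 and 7 in cells 13-14, result in cell 15."""
    program = [encode(LOAD, 13), encode(ADD, 14), encode(STORE, 15), encode(HALT)]
    return program + [0] * 9 + [5, 7, 0]
```

Running `run(demo_memory())` leaves 12 in cell 15. Because instructions and data share one memory, the program could in principle also read or overwrite its own instructions.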

Superscalar:

Superscalar processing is the ability to initiate multiple instructions during the same clock cycle. A typical superscalar processor fetches and decodes the incoming instruction stream several instructions at a time. This allows higher CPU throughput than would otherwise be possible at the same clock rate. All general-purpose CPUs developed since about 1998 are superscalar. The figure below shows the fetching and dispatching of two instructions per cycle.

Fetching and dispatching two instructions per cycle

Very Long Instruction Word (VLIW):

In the early 1980s, Josh Fisher introduced the concept of the VLIW architecture in his research group at Yale University. In 1984 Fisher left Yale and founded a start-up company, Multiflow, along with cofounders John O'Donnell and John Ruttenberg. Multiflow produced the TRACE series of VLIW minisupercomputers, shipping its first machines around 1988. These machines could issue up to 28 operations in parallel per instruction. The main advantages of the VLIW architecture are increased performance and a possibly easier design. Its disadvantages are that programmers and compilers find this new kind of machine difficult to target, and that power consumption is high.

Complex Instruction Set Computer (CISC):

CISC, pronounced “sisk”, stands for Complex Instruction Set Computer. Most PCs use CPUs based on this architecture. In a complex instruction set computer, each instruction can execute several low-level operations, such as an arithmetic operation, a load from memory, and a memory store, all in a single instruction. In 1974, John Cocke of IBM Research decided to try an approach that dramatically reduced the number of instructions a chip performed. By the mid-1980s this had led a number of computer manufacturers to reverse the trend by building CPUs capable of executing only a very limited set of instructions. Examples of CISC processors are the System/360, VAX, PDP-11, Motorola 68000 family, and Intel and AMD x86 CPUs.

In a CISC design, instructions can operate directly on memory, there is a small number of general-purpose registers, and instructions take multiple clock cycles to execute. The first pipelined “CISC” CPUs, such as the 486s from Intel, AMD, Cyrix, and IBM, certainly supported every instruction that their predecessors did, but achieved high efficiency only on a fairly simple x86 subset.

Reduced Instruction Set Computer (RISC):

RISC, pronounced “risk”, stands for Reduced Instruction Set Computer. It is a type of microprocessor architecture that uses a small, highly optimized set of instructions, rather than the more specialized set of instructions often found in other types of architecture. The first system that would today be known as RISC was the CDC 6600 supercomputer, designed in 1964, a decade before the term was invented. The CDC 6600 had a load-store architecture with only two addressing modes (register+register, and register+immediate constant) and 74 opcodes. The most common RISC microprocessors are PIC, ARM, DEC Alpha, PA-RISC, SPARC, MIPS, and IBM's PowerPC.

The first RISC projects came from IBM, Stanford, and UC-Berkeley in the late 1970s and early 1980s. The IBM 801, Stanford MIPS, and Berkeley RISC 1 and 2 were all designed with a similar philosophy, which has become known as RISC. Certain design features have been characteristic of most RISC processors:

1. One-cycle execution time: RISC processors have a CPI (clocks per instruction) of one cycle. This is due to the optimization of each instruction on the CPU.

2. Pipelining: a technique that allows simultaneous execution of parts, or stages, of instructions, so that instructions are processed more efficiently.

3. Large number of registers: the RISC design philosophy generally incorporates a larger number of registers to prevent large amounts of interaction with memory.

The differences between CISC and RISC are shown below:

CISC                                             RISC

Complex instructions require multiple cycles.    Reduced instructions take one cycle.
Many instructions can reference memory.          Only Load and Store instructions can reference memory.
Instructions are executed one at a time.         Uses pipelining to execute instructions.
Few general registers.                           Many general registers.

Processors:

Development of the Microprocessor:

Gordon Moore was the head of R&D at Fairchild Semiconductor in 1965, when he wrote an article for the thirty-fifth anniversary of Electronics magazine. Looking at the years 1959 to 1965, he observed that the number of transistors on a chip had been doubling every year. The figure below is taken from Moore's Electronics magazine article, published in 1965, and it gives a clear picture of processing power rising exponentially.

Logarithm of the Number of Components on a Memory Chip over Time.

At a later meeting of the Institute of Electrical and Electronics Engineers (IEEE), he revised this to say that the number of transistors on a chip would double every 2 years. From the introduction of the first microprocessor, the Intel 4004, to the introduction of the Intel Pentium II in 1997, the count did roughly double every 2 years. This trend helped to reduce the cost of computation; additional applications crossed the threshold of affordability, further increasing the demand for computing. In this way Moore's law acted as a positive feedback loop, which helped make the 1965 prediction come true. It became the blueprint for the semiconductor industry.

Moore ‘s Original graph
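The doubling rule behind Moore's law is simple exponential growth, which the following Python sketch makes explicit. The function name `projected_transistors` is hypothetical, and the projection is an idealized model rather than a record of real chip data.

```python
def projected_transistors(start_count, start_year, year, doubling_period_years=2.0):
    """Project a transistor count assuming it doubles every fixed period.

    count(year) = start_count * 2 ** ((year - start_year) / doubling_period)
    """
    elapsed = year - start_year
    return start_count * 2 ** (elapsed / doubling_period_years)
```

As a rough illustration, taking the commonly quoted figure of about 2,300 transistors for the Intel 4004 in 1971 (a number not given in this text) and doubling every 2 years until 1997 gives 13 doublings, or about 18.8 million transistors. With the original one-year doubling period, the six years from 1959 to 1965 correspond to a 64-fold increase.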

There are generally two types of microprocessors: general-purpose microprocessors and application-based microprocessors.

1. General-purpose microprocessors, such as the Pentium CPU, can perform different tasks under the control of software instructions. General-purpose microprocessors are used in all personal computers.

2. Application-based microprocessors are designed to perform only one specific task.

The microprocessor can be divided into 2 parts, the datapath and the control unit, as shown in the figure below.

Internal parts of Microprocessor.

Operations such as addition are performed inside the arithmetic logic unit (ALU); the datapath is responsible for the execution of all these ALU operations. It also includes registers for the temporary storage of data. Data is transferred between functional units by data signals, and a group of data signals forms a bus. In the figure above the buses are shown by the thicker lines and are 8 bits wide. Multiplexers (MUX) are used to select data from 2 or more sources to go to one destination. The output of the ALU is connected to the input of the register, and in turn the output of the register is connected to 3 different destinations, namely:

1. An OR gate used as a comparator, for testing whether the value is not equal to 0.

2. The right operand of the ALU.

3. A tri-state buffer, which controls the output of data from the register.

The control unit is the key part of the microprocessor: it acts as the controller that directs the datapath to execute the data operations. The figure below shows how the parts of the microprocessor fit together.

Intel introduced the 8085 in 1976 and the 8086 in 1978. The major difference between the 8085 and 8086 processors is that the 8085 is an 8-bit processor, while the 8086 is a 16-bit processor. Neither contains floating-point instructions; here floating point refers to numbers with a radix point (decimal point). Processors such as the 8085 and 8086 do not support such representations and instructions.

Intel also introduced the 8087, which was its first math co-processor, and the 8088, which was incorporated into IBM personal computers.

Over the years many processors have followed the 8088: the 80286, 80386, 80486, Pentium II, Pentium III, Pentium IV, and now the Core 2 Duo, dual-core, and quad-core processors that are the latest on the market. Some other manufacturers produce CMOS versions of the 8085 microprocessor. Such manufacturers are called second-source manufacturers.

The second-source manufacturers include AMD, Mitsubishi, NEC, OKI, Toshiba, and Siemens.

Power Dissipation:

Power is the time rate of doing work, or of producing or using energy. In other words, it is a measure of how quickly work can be done.

power = work/time

Energy is a measure of how long we can sustain the output of power, or how much work we can do.

work done = force x distance

In general, power dissipation in a CMOS circuit is composed of both static and dynamic components.

Pavg = Pleakage + Pshort + Pswitching

where Pleakage = static power dissipated.

Pshort = dynamic power dissipated by short-circuit current.

Pswitching = dynamic power required to charge the load capacitances.

Static power dissipation is caused by the leakage current of reverse-biased diodes and by sub-threshold transistor conduction. The dynamic component is the dominant part of power dissipation in a CMOS circuit, and it is composed of two terms. The first is switching power, which results from charging and discharging the circuit node capacitance at the output of each logic gate. The second is short-circuit power, which is due to the short-circuit current flowing from the supply voltage to ground during an output transition. The graph below shows the power dissipation of Intel microprocessors.

Power Dissipation in microprocessor.
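The three terms of Pavg can be turned into a small numeric sketch. The text above gives only the sum; the individual expressions used here are the common first-order CMOS approximations and are assumptions of this example: switching power as activity x capacitance x Vdd^2 x frequency, and leakage and short-circuit power as an average current times Vdd. All function and parameter names are hypothetical.

```python
def switching_power(activity, capacitance_f, vdd, freq_hz):
    """Dynamic power spent charging and discharging the load capacitance.

    First-order model (an assumption): P = activity * C * Vdd^2 * f.
    """
    return activity * capacitance_f * vdd ** 2 * freq_hz

def average_power(i_leak, i_short, activity, capacitance_f, vdd, freq_hz):
    """Pavg = Pleakage + Pshort + Pswitching, all in watts."""
    p_leakage = i_leak * vdd    # static power from leakage current
    p_short = i_short * vdd     # average short-circuit current during transitions
    p_switching = switching_power(activity, capacitance_f, vdd, freq_hz)
    return p_leakage + p_short + p_switching
```

Under this model the switching term depends on the square of the supply voltage, so halving Vdd cuts switching power to a quarter, which is one reason supply voltage scaling is a central low-power technique.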

Power Consumption In Microprocessor:

In 1944 the ENIAC was programmed manually, using wires and connections between the execution units. It used 18,000 vacuum tubes, and its power consumption was 150,000 watts. The Whirlwind-derived SAGE (Semi-Automatic Ground Environment) used 75,000 vacuum tubes and consumed 750,000 watts. In general, as the number of vacuum tubes increased, power consumption also increased dramatically. The transistor was introduced in 1947, and the first transistorized computer, the TX-0, designed by Lincoln Lab in 1957, contained 3,500 transistors and consumed 1,000 watts. Compared with vacuum tubes, the power consumption of a transistor is in the range of tens of milliwatts.