Posts Tagged ‘CPU’

Noctua’s NH-D14 CPU Cooler, Sandy Bridge-E Tested and Burned In

February 15th, 2012

A few months back, we examined the performance of four liquid coolers for the Sandy Bridge-E socket LGA 2011 platform: Corsair’s H80 and H100, Maingear’s Epic 180, and Intel’s own reference cooler. Liquid coolers are popular with high-end OEMs and some enthusiasts, but they aren’t cheap — and there’s a certain group of enthusiasts who aren’t comfortable with flowing liquid in their systems.

Recently, we spoke with the team at Noctua, who offered to send us one of its high-end air coolers for comparison. We’ve been curious about the company since it came across our radar with its upgrade offer for anyone upgrading to Socket 2011 last year.

The Noctua NH-D14 is a mammoth air cooler that competes performance-wise with many self-contained liquid coolers.

Source:http://hothardware.com/News/Noctuas-DH14-CPU-Cooler-Sandy-BridgeE-Tested-and-Burned-In/

What goes on inside the CPU

February 13th, 2012

A CPU is a terrifically complex piece of hardware. Even Intel’s most lightweight Atom processors comprise more than 40 million transistors on a piece of silicon the size of a fingernail. And it isn’t only about numbers: the way these transistors implement the core logic that drives netbooks, set-top boxes and desktop PCs is the product of some seriously advanced electronic engineering.

Although the physical construction of CPUs may be abstruse, it isn’t hard to understand the principles of how modern processors are designed, and how they work. And it’s good to have an insight into what’s happening inside your PC. When you’re buying or upgrading a PC, such knowledge can help you choose the right processor for a given role, or troubleshoot unexpected behaviour. An understanding of how instructions are processed inside the CPU can also help programmers construct their code so it will run as quickly and efficiently as possible – although these days much processor optimisation is handled automatically by the compiler.

Physical construction

When you picture a CPU, you probably think of a small square of circuit board with a metal casing mounted on it. This is the CPU “package”. The casing isn’t removable, but if you were to prise it off you’d find a small square of silicon below it. This is the die, which contains the functional features of the CPU.

These features take the form of millions of transistors, etched into the silicon by a process called photolithography. Simply put, this involves projecting a design onto a sheet of silicon, using light-sensitive chemicals to “paint” this image onto the surface, then using corrosive agents to etch away the uncoated areas. By stacking multiple photolithographed layers together, it’s possible to make working 3D electronic circuits at a minuscule scale.

The precise scale of these transistors is usually expressed in micrometres (µm) or nanometres (nm), reflecting the size of the smallest feature that can be produced by a given fabrication process. Chip manufacturers are constantly shrinking the process size of their chips: the original Intel 80386 processor was manufactured using a 1.5µm process; the first Pentium 4 models used a 180nm process; and the latest Core i7 processors are the product of a 32nm process.

Shrinking the process size enables manufacturers to produce ever more complex chips without commensurately increasing the size of the die. For example, the old Intel 80386 comprised 275,000 transistors on a die measuring 104mm²; thanks to process shrinkage, a Core i7-2600 processor crams 995 million transistors into a 216mm² die.
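To put those two chips side by side, here is a rough sketch that works out transistors per square millimetre from the figures quoted above. The dictionary layout and the function name are purely illustrative.

```python
# Transistor counts and die areas as quoted in the text above.
chips = {
    "Intel 80386": {"transistors": 275_000, "die_mm2": 104},
    "Core i7-2600": {"transistors": 995_000_000, "die_mm2": 216},
}


def density(chip):
    """Return transistors per square millimetre of die."""
    return chip["transistors"] / chip["die_mm2"]


d386 = density(chips["Intel 80386"])
di7 = density(chips["Core i7-2600"])
ratio = di7 / d386

print(f"80386:        {d386:,.0f} transistors per mm2")
print(f"Core i7-2600: {di7:,.0f} transistors per mm2")
print(f"Density ratio: roughly {ratio:,.0f}x")
```

The die has only about doubled in area, so almost all of the growth in transistor count comes from packing them more tightly.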

This is good for several reasons. First, producing high-grade silicon is expensive. A smaller die means lower material costs, so the chips can be sold cheaply and the manufacturer can still make a healthy profit on each one.

Second, when you shrink a transistor it ordinarily requires less power to operate, and burns off less energy as heat. This means smaller transistors can run at higher frequencies without overheating, giving a free performance boost. Since they draw less power, they’re better for the environment – and also, in the case of laptops and mobile devices, for battery life.

Regardless of the process size, all chips generate some heat. This is conducted away by the metal casing, and from there to an external cooler unit. The maximum amount of heat that a chip is expected to dissipate is referred to as its Thermal Design Power (TDP), measured in watts. It doesn’t necessarily indicate how much power a processor will draw in real-world use, but you can expect a chip with a 35W TDP to draw less power than one with a 65W TDP.

The CPU’s surroundings

The CPU fits into a dedicated socket on a compatible motherboard, which connects it to the rest of the system components. Until a few years ago, communication between the CPU and other components was mediated by two chips on the motherboard: the “north bridge” dealt with high-speed connections such as PCI Express and memory buses, and the “south bridge” handled lower-bandwidth components such as sound cards and hard disks. Together, these chips were called the chipset.

In modern designs, the memory controllers – and, sometimes, the PCI Express controllers too – are built directly into the processor, which makes things faster and more efficient. Motherboards still offer a range of chipsets, however, supporting different features. For example, the Intel H67 Express chipset supports the integrated graphics processors that are built into current Core i3, i5 and i7 processors. The P67 Express chipset doesn’t support onboard graphics, but does enable overclocking (on CPU models that support it). On the AMD side, the 870 chipset allows the motherboard to offer a single PCI Express x16 slot, while the 890FX chipset can support up to four such slots. Many AMD chipsets also include a low-power GPU, since AMD’s Athlon and Phenom processors don’t have graphics inside the chips themselves.

What the CPU does

Clearly, the CPU is a highly sophisticated piece of electronic engineering. But in terms of what it actually does, you can think of it as essentially a calculator, whose primary function is to apply simple mathematical operations to the values stored in its internal registers. It can also copy these register contents to and from the computer’s main memory as needed.

Fundamentally, therefore, computer programming is a process of preparing the appropriate instructions to be performed in the right order on the appropriate data. Originally, this was done in assembly language – a system of codes that directly represent the CPU’s internal instructions. These days it’s more usual to write in languages such as C++, and to use compiler software to translate your program into native machine code. The principle is the same, however.

When the CPU executes a program, each instruction goes through a four-stage cycle. First, the instruction code is fetched from memory. Then it’s decoded to determine the actual instruction required. If the instruction requires a piece of data to be fetched from the main memory, this is done. Finally, the instruction is executed, and any output is written to memory, or to an internal register. Execution then continues with the next instruction code.

If you think of the CPU as a clockwork mechanism, each stage of this process represents a “tick” – more formally known as a clock cycle. The number of ticks the CPU completes per second represents its operating frequency, or clock speed. A typical CPU might have a frequency of around 2GHz – equivalent to two billion cycles per second. (In fact, modern processors can automatically adjust their frequencies to suit the workload – see SpeedStep and Turbo Boost – but for the purposes of this example, let’s assume the frequency is constant.) If each instruction goes through a four-stage process, you’d expect a 2GHz CPU core to complete 500 million instructions per second. In fact, modern CPUs can do rather better than this, thanks to an approach called pipelining.
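The arithmetic behind that 500 million figure is simple enough to check in a couple of lines. This is only a back-of-envelope sketch; the function name is made up for illustration, and it assumes every instruction takes exactly four cycles with no overlap.

```python
def instructions_per_second(clock_hz, cycles_per_instruction):
    """Throughput of a core that finishes one instruction before starting the next."""
    return clock_hz / cycles_per_instruction


clock_hz = 2_000_000_000  # 2GHz, as in the example above
rate = instructions_per_second(clock_hz, cycles_per_instruction=4)

print(f"{rate:,.0f} instructions per second")
```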

Pipelining

In the four-stage cycle described above, each stage of the instruction cycle is carried out by a different part of the core; for example, once the instruction fetcher has retrieved an instruction and handed it over to the decoder unit, it has nothing more to do. Pipelining takes advantage of this by immediately putting the fetcher to work on the next instruction, while the decoder unit is doing its job – and so on along the chain. This means all the parts of the CPU can be working at once, and in theory one instruction can be completed in every single clock cycle.
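A small sketch makes the gain concrete. It assumes the idealised case described above, where every stage takes exactly one clock cycle and nothing ever stalls; the function names are illustrative only.

```python
STAGES = 4  # fetch, decode, data fetch, execute


def cycles_unpipelined(instructions, stages=STAGES):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions, stages=STAGES):
    """The first instruction takes `stages` cycles to emerge; after that,
    one instruction completes on every cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


n = 1000
slow = cycles_unpipelined(n)
fast = cycles_pipelined(n)

print(f"Without pipelining: {slow} cycles")
print(f"With pipelining:    {fast} cycles")
print(f"Instructions per cycle: {n / fast:.3f}")
```

For long runs of instructions the throughput approaches one instruction per cycle, which is the theoretical best case for this simple model.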

In practice, the system isn’t quite this efficient. We’ve assumed that each stage in the pipeline takes one clock cycle to complete, but in reality the execution stage can often take several cycles, depending on the complexity of the instruction. On an Intel Sandy Bridge processor, for example, multiplying two 32-bit values together takes four cycles, and dividing one value by another takes 26 cycles. While these operations are being processed, the pipeline can’t progress. Depending on the nature of your code, a core could fall far short of executing one instruction per clock cycle.

Pipelining can also fail when a conditional branch instruction is encountered – that is, where the flow of the program is potentially diverted according to a logical test. Such branches are a basic ingredient of programming: even the most casual tinkerer will recognise constructions such as “IF X<5 THEN GOTO 20”. But such constructions are anathema to pipelining. Until the logical test is processed – at the end of the pipeline – the fetcher has no way of knowing whether it should be continuing to fetch the next instruction in sequence, or whether it should be branching to a different point in the program.

So the fetcher makes an educated guess, with the help of a dedicated circuit called a branch predictor. This component tries to predict what the outcome of the logical test will be, based on the outcome of previous similar operations. On this basis, the fetcher continues speculatively loading instructions into the pipeline. But the branch predictor can’t be right all the time, and when it’s wrong the pipelined instructions must be discarded, leading to wasted cycles as the pipeline is refilled.
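To give a feel for how a guess “based on the outcome of previous similar operations” can work, here is a sketch of a two-bit saturating counter, a classic textbook scheme. It is an assumption chosen for illustration, not a description of what any particular Intel or AMD chip does; real predictors are far more elaborate. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """Textbook two-bit saturating counter (illustrative, not a real CPU's design).

    Counter values 0-1 predict "not taken"; values 2-3 predict "taken".
    """

    def __init__(self):
        self.counter = 2  # start out weakly predicting "taken"

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Nudge the counter towards the actual outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    """Run the predictor over a sequence of branch outcomes and count wrong guesses."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch: taken nine times, then not taken on exit, repeated ten times.
loop_branch = ([True] * 9 + [False]) * 10
# A branch that flips every time it is reached.
alternating = [True, False] * 50

loop_misses = count_mispredictions(loop_branch)
alternating_misses = count_mispredictions(alternating)

print(f"Loop branch:        {loop_misses} wrong out of {len(loop_branch)}")
print(f"Alternating branch: {alternating_misses} wrong out of {len(alternating)}")
```

The regular loop is predicted well, with only the exit of each loop guessed wrongly, while the alternating pattern defeats this simple scheme half the time. That gap is why predictable branches tend to be cheap and erratic ones expensive.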

What’s more, although we’ve characterised the execution cycle as a four-stage process, modern CPU architectures typically break the work down into many smaller stages. Intel’s Core 2 Duo models, for example, use a 14-stage pipeline, and the Pentium D that preceded them had 31 stages. Depending on the processor, a failed branch prediction could mean discarding a dozen pipelined instructions or more – and wasting that many cycles.

Caches & out-of-order execution

There’s a further problem with the pipeline model as we’ve described it above. In our examples, we’ve assumed that fetching instructions and data from the system memory can be done in one clock cycle. In reality, depending on the speed of your DIMMs and CPU, it can easily take 15 cycles or more to load a value from memory into a CPU register.

Obviously, it’s hugely inefficient for the CPU to sit idle for 15 cycles every time it needs to access a value from memory. To work around this, modern processors use two approaches. The first is caching: building small amounts of very fast memory directly onto the CPU, and using this whenever possible, rather than accessing the slower system RAM. A different part of the CPU takes care of synchronising this cached data with main memory, while the execution unit gets on with other tasks.

Modern processors use a multilevel cache system: the smallest and fastest cache is level one (L1), which is used for storing instructions and data, and typically has a capacity of 64KB or 128KB. Then come larger, slower L2 and L3 caches, with sizes measured in megabytes. This very fast, on-chip RAM is expensive to produce, so a common way to reduce the costs is by shrinking or removing caches – but this has a detrimental effect on performance.
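The effect of a cache can be sketched with a toy model. This one is a direct-mapped cache, one of the simplest possible designs, and its size is a made-up 4KB chosen so the numbers are easy to follow; real L1 caches are larger and more sophisticated. The 15-cycle memory cost comes from the figure quoted earlier, while the one-cycle cache hit is an assumption.

```python
MEMORY_CYCLES = 15  # cost of going to main memory, as quoted in the text
CACHE_CYCLES = 1    # assumed cost of a cache hit (illustrative)


class DirectMappedCache:
    """Toy direct-mapped cache: each memory block can live in exactly one line."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines
        self.hits = 0
        self.misses = 0

    def access(self, address):
        block = address // self.line_size
        index = block % self.num_lines  # which cache line the block maps to
        tag = block // self.num_lines   # identifies which block occupies that line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag          # load the block, evicting the old one
        self.misses += 1
        return False


# 64 lines of 64 bytes = 4KB of cache (hypothetical size).
cache = DirectMappedCache(num_lines=64, line_size=64)

# Read 1,024 consecutive four-byte values, then read them all again.
addresses = [i * 4 for i in range(1024)] * 2
for address in addresses:
    cache.access(address)

cycles_with_cache = cache.hits * CACHE_CYCLES + cache.misses * MEMORY_CYCLES
cycles_without_cache = len(addresses) * MEMORY_CYCLES

print(f"Hits: {cache.hits}, misses: {cache.misses}")
print(f"Cycles with cache:    {cycles_with_cache}")
print(f"Cycles without cache: {cycles_without_cache}")
```

Only the first touch of each 64-byte line goes out to main memory; every other read is served from the cache. Shrink the cache in this model and the second pass starts missing too, which is the performance penalty the text describes.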

In addition to caching, processors can also implement out-of-order execution (OoOE). This means what it says: in an OoOE pipeline, instructions don’t have to be processed in strict sequence, but can overtake one another. If one instruction is held up waiting for data to arrive from the main memory, the CPU can continue to process other instructions that were behind it in the pipeline, and come back to the instruction when it’s ready to be executed.

Clearly, there are limits to OoOE. The CPU can’t just skim back and forth between instructions as it sees fit, or programs wouldn’t work as intended. A buffering system is used to ensure instructions take effect in the right order, even if they were executed out of sequence.

However, OoOE can still save a lot of time – for example, by allowing the execution unit to get started on a slow calculation while another instruction is waiting for its data to arrive from main memory. That sort of optimisation can make a big difference, so almost all modern processors use OoOE. The notable exception is Intel’s Atom range – which is one reason why Atom-powered devices feel sluggish compared to mainstream processors.
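The following sketch models that saving under some heavy simplifications: the in-order core finishes each instruction before starting the next, while the out-of-order core can start one new instruction per cycle as soon as its inputs are ready, has unlimited execution units, and retires results in program order. The four-instruction program is hypothetical; the 15-cycle load and four-cycle multiply reuse the latencies quoted earlier.

```python
# Each entry: (description, latency in cycles, indices of instructions it depends on).
PROGRAM = [
    ("load a from main memory", 15, []),
    ("multiply b by c", 4, []),
    ("add 1 to the product", 1, [1]),
    ("add a to that result", 1, [0, 2]),
]


def in_order_cycles(program):
    """Blocking in-order model: nothing starts until the previous instruction is done."""
    return sum(latency for _, latency, _ in program)


def out_of_order_cycles(program):
    """Simplified out-of-order model with in-order retirement."""
    finish = []  # cycle at which each instruction's result is ready
    retire = []  # cycle at which each result is committed, in program order
    for i, (_, latency, deps) in enumerate(program):
        # Instruction i enters the core at cycle i and waits for its inputs.
        start = max([i] + [finish[d] for d in deps])
        finish.append(start + latency)
        # The buffer holds a finished result until everything before it has retired.
        previous_retire = retire[-1] if retire else 0
        retire.append(max(finish[i], previous_retire))
    return retire[-1]


in_order = in_order_cycles(PROGRAM)
out_of_order = out_of_order_cycles(PROGRAM)

print(f"In order:     {in_order} cycles")
print(f"Out of order: {out_of_order} cycles")
```

In this model the multiply and the first addition run entirely in the shadow of the slow memory load, so only the final addition has to wait for it.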

Cores & threads

So far we’ve focused on the workings of an individual CPU core; but most current processors combine two or four cores. This allows them to chew through multiple instructions at once, which in turn means you can run more programs smoothly at once. Of course, you can multitask on a single core CPU; but this is achieved by “time-slicing”, rather than true simultaneous processing, and it’s apt to deliver uneven performance.

Having a multicore processor won’t ordinarily make individual programs run any faster. This is because most programs are designed to run in a strict linear fashion – so the second instruction, for example, must be executed after the first has completed, not at the same time. This sequential process – this “thread”, as it’s called – has no way to make use of additional CPU cores.

Some types of task, however, can be divided into multiple threads. For example, imagine a program that’s designed to convert a folder full of audio files into MP3 format. On detecting that it was running on a quad-core processor, such a program might spawn four independent encoding processes, and hand one to each CPU core. In this way, the total encoding time could be quartered. Applications for 3D rendering can typically divide up their workload in a similar way.

Intel processors use a feature called Hyper-Threading to gain additional throughput. Each core in a Hyper-Threading CPU appears to the operating system as two virtual cores. In reality, the core can only process a single instruction at once – but it has two sets of registers, enabling it to switch its attention back and forth between two threads to make the most efficient use of its processing capacity. Predictably, the benefit is less than you’d see from two physical cores. But in our benchmarks we’ve seen Hyper-Threading prove its worth, giving multithreaded tasks a performance boost of around 30%.
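The audio-conversion example might be structured along these lines. This is a sketch only: encode_to_mp3 is a hypothetical stand-in that merely renames the file, and the file names are invented. Note that os.cpu_count() reports logical cores, so a Hyper-Threading CPU will report two for each physical core.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def encode_to_mp3(filename):
    """Hypothetical stand-in for a real encoder: it only works out the new name."""
    base, _ = os.path.splitext(filename)
    return base + ".mp3"


def convert_folder(filenames, workers=None):
    """Share the files out between one worker per logical core."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() hands each file to a free worker and returns results in input order.
        return list(pool.map(encode_to_mp3, filenames))


tracks = ["intro.wav", "theme.wav", "outro.wav", "credits.wav"]
converted = convert_folder(tracks)
print(converted)
```

In standard Python, threads like these suit work that waits on files or calls into native libraries; an encoder written in pure Python would need a process pool instead to keep every core busy.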

Instructions & extensions

Earlier we characterised the CPU as “essentially a calculator”; this isn’t a bad analogy, but the modern CPU boasts several sophistications that your average Casio can only dream of.

For one thing, the conventional pocket calculator can handle up to eight digits only, so the largest number that can be represented is 99,999,999. All modern CPUs can operate on 32-bit binary data, enabling them to work directly with values up to 4,294,967,295. Most also support 64-bit operation (when used with a 64-bit operating system), for values up to 18,446,744,073,709,551,615. This means they can work with huge numbers at full speed with perfect accuracy. It also means a 64-bit operating system can support millions of terabytes of memory, while 32-bit systems are limited to 4GB (of which, in practice, only around 3.5GB is usable in Windows).
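These limits follow directly from the number of bits available, as a few lines of arithmetic show. The sketch treats the values as unsigned integers and counts memory in binary units (1GB as 2^30 bytes, 1TB as 2^40 bytes); the variable names are illustrative.

```python
# Largest unsigned value that fits in a given number of bits: 2**bits - 1.
max_32 = 2**32 - 1
max_64 = 2**64 - 1

# Number of distinct byte addresses, converted to binary gigabytes and terabytes.
gb_32 = 2**32 // 2**30
tb_64 = 2**64 // 2**40

print(f"Largest 32-bit value: {max_32:,}")
print(f"Largest 64-bit value: {max_64:,}")
print(f"32-bit address space: {gb_32}GB")
print(f"64-bit address space: {tb_64:,}TB")
```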

In addition to regular calculator operations, CPUs also support “extensions” that accelerate certain types of task. The SSE extensions found in every modern processor are a good example: the acronym stands for Streaming SIMD Extensions, with SIMD in turn standing for Single Instruction, Multiple Data. In practice, one SSE instruction can make the processor tear through a data set applying a single operation – such as addition or subtraction – to every data element in the set in a fraction of the time it would take to process each element individually. Implementing these special instructions involves considerable engineering work at the design stage, but once the feature is there it can give an enormous boost to tasks such as video processing and data compression.
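The idea can be modelled in a few lines. Python has no direct access to SSE instructions, so this sketch only counts how many modelled instructions each approach would need; it is not real SIMD code. It assumes four 32-bit values per 128-bit SSE register, and the function names are hypothetical.

```python
LANES = 4  # a 128-bit SSE register holds four 32-bit values


def scalar_add(a, b):
    """Add element by element: one modelled instruction per pair of values."""
    result, instructions = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions


def simd_add(a, b, lanes=LANES):
    """Add in groups: one modelled instruction handles a whole group of lanes."""
    result, instructions = [], 0
    for i in range(0, len(a), lanes):
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return result, instructions


a = list(range(16))
b = [10] * 16

scalar_result, scalar_count = scalar_add(a, b)
simd_result, simd_count = simd_add(a, b)

print(f"Scalar: {scalar_count} instructions")
print(f"SIMD:   {simd_count} instructions")
print("Same answer:", scalar_result == simd_result)
```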

Another sort of extension that’s commonly found in modern processors is hardware virtualisation. Conventionally, virtualisation software acts as a middle man between the virtual environment and the real hardware resources, which can slow things down considerably. Virtualisation extensions allow code running in the virtual machine to execute directly on the processor, at full speed – but instructions that can’t be executed natively are automatically trapped, so they can be handled by the virtualisation host software.

Extensions can also be used to provide security. Intel’s 2010 range of Core i3, i5 and i7 processors brought a new set of extensions called AES-NI, which enable the processor to encrypt and decrypt data using the industry-standard AES encryption algorithm at accelerated speeds. Intel’s TXT (Trusted Execution Technology) prevents programs from carrying out potentially dangerous activities, such as modifying resources that are being used by another process or snooping on the keyboard and mouse. An extension called Data Execution Prevention prevents the processor from running code that wasn’t loaded into memory as such: this makes it harder for viruses and hacker attacks to sneak onto the system.

With all these extensions and features, the modern CPU is far more than a simple calculator. In fact, it’s a miracle of engineering. When you reflect that the microprocessor was invented only a short half-century ago, it’s incredible to think what’s been accomplished… and there’s no sign of that progress slowing down any time soon.

Steppings

There are many different types of processor on the market. Some are designed for servers and workstations; some are designed for netbooks and low-power laptops.

But within a given family, it’s likely that most models will use the same core design. For example, consider AMD’s Athlon II X4 640, Phenom II X4 940 and Phenom II X4 980 processors. They perform quite differently, but that’s thanks to differing amounts of L3 cache and differing clock speeds. The physical arrangement of transistors inside the different models is functionally identical.

This approach makes commercial sense, because setting up a photolithographic process isn’t cheap. But it means that if any bugs sneak into the process, they’ll affect the entire processor family. AMD knows this to its cost: when it launched the first Phenom processors back in 2007, every model suffered from a serious bug that reduced performance by at least 10%.

In such cases, the only thing to do is to revise the design and introduce what’s called a new “stepping”. In this instance, the faulty chips were made from a design that had the stepping code B2; some months later, when AMD started to produce chips from an updated design, this was known as the B3 stepping. (The first testing design of a processor is commonly given the stepping A0, with subsequent major changes represented by a new letter and minor changes by incrementing the number.)

Even when there isn’t a problem with a chip, it’s common for processors to go through a few steppings over their lifetimes. Intel’s first range of Core i7 processors, released in 2008, had the stepping code C0; but if you’d bought one of the same models a year later, it would have been a D0 stepping. No major changes were announced, but overclockers found D0 models more stable at very high speeds. Presumably Intel found a way to simplify the design, making it more stable and increasing the proportion of chips that come out of the manufacturing process in perfect working order – the “yield”, in industry jargon.

SpeedStep and Turbo Boost

We traditionally think of a processor as running at a set frequency, but for more than a decade almost all CPUs have had the ability to dynamically adjust their clock speeds. Intel’s SpeedStep technology, first introduced with the Pentium III processor, detects when the processor isn’t being used to its full capacity and automatically turns the speed down. When the processor is taxed, the speed immediately ramps up again. This lets the computer deliver full performance when the user wants it, while generating less heat and consuming less power when the user is doing something passive, such as reading a web page. AMD processors have had the same ability for just as long, under the name PowerNow! for mobile chips and Cool’n’Quiet for desktop processors.

With the introduction of the Core i7 in 2008, Intel extended the principle to run in the other direction as well, with a new feature called Turbo Boost. It took advantage of the fact that quad-core i7 chips were designed with a TDP to accommodate all four cores being used simultaneously. When only one or two cores were in use, it was possible to automatically increase the speed of those cores while remaining within the overall heat and power budget.

In modern Core i5 and i7 processors, Turbo Boost is more aggressive, and can kick in even when all four cores are active. And AMD has followed suit, with the similarly named Turbo Core system appearing in the six-core Phenom II X6 range, and more recently on some models of its A-Series processors. So when you next buy, say, a 2GHz processor, it’s likely it will in fact spend most of its time running well below that speed – and a fair proportion running well above it.

Source:http://www.pcauthority.com.au/Feature/290164,what-goes-on-inside-the-cpu.aspx

Researchers boost processor performance by getting CPU and GPU to collaborate

February 8th, 2012

Researchers at North Carolina State University have developed a technique to take advantage of the “fused architecture” emerging on multicore CPUs that puts central processing units and graphics processing units on the same chip. The technology, called CPU-assisted general purpose computation on graphics processing units (CPU-assisted GPGPU), uses software compiled to leverage the architecture to allow the CPU and GPU to collaborate on computing tasks, boosting processor performance on average by more than 20 percent in simulations.

The approach, outlined in a paper by NC State Associate Professor of Electrical and Computer Engineering Dr. Huiyang Zhou, Ph.D. candidates Yi Yang and Ping Xiang, and AMD GPU Architect Mike Mantor, is designed for fused architecture chipsets with a shared L3 cache and shared off-chip memory for CPUs and GPUs. The approach developed by the team leverages the computing power of the GPU, while taking advantage of the CPU’s more flexible data retrieval and better handling of complex tasks.

The current generation of hybrid CPU/GPU systems, including Intel’s “Sandy Bridge” and AMD’s “Llano,” has helped create more energy-efficient systems and reduce manufacturing costs, Zhou said. “However, the CPU cores and GPU cores still work almost exclusively on separate functions. They rarely collaborate to execute any given program, so they aren’t as efficient as they could be. That’s the issue we’re trying to resolve.”

GPUs are obviously designed for handling graphics, but they are also very good at handling large numbers of parallel processes, particularly in applications where the same process needs to be applied to large amounts of data. Traditionally, one of the biggest problems when using GPUs for general purpose computing has been that they don’t handle complex, branchy, pointer-heavy code very well at all, and that kind of code is the strength of CPUs. The long pipelines of most GPUs instead favor sequential, streaming reads, and applications where there’s a high ratio of arithmetic operations applied to data relative to the amount of data that has to be moved to and from memory. Hybrid chips like Sandy Bridge have less main memory bandwidth than typical discrete GPUs (albeit with lower latency), so keeping the fast level 3 cache filled with data is essential if developers want to avoid starving the GPU of data.

CPU-assisted GPGPU uses the CPU’s faster L3 cache pre-fetching to feed data to the GPU, cutting out performance drags that come with GPU code accessing memory. A program compiled for CPU-assisted GPGPU launches a “pre-execution” program at startup on the CPU to pre-fetch data to be processed by GPU code and load it into the level 3 cache onboard the chip. That allows process threads running in the GPU to hit the L3 cache directly, rather than fetching from memory, reducing latency and significantly boosting performance. In some cases, the performance of simulated applications improved by up to 113%, the researchers claimed.

Why simulated? AMD’s current hybrid processor, the Llano, lacks a shared L3 cache, so it won’t support the approach. And Intel’s Sandy Bridge offers only limited GPGPU functionality. In a phone interview with Ars, Dr. Zhou explained that in theory the research could be applied to Intel’s current Sandy Bridge architecture, which provides a shared last-level cache for CPUs and GPUs. But he said that Sandy Bridge’s GPU “isn’t that powerful” and Intel’s current software support “doesn’t include support for OpenCL and other GPGPU stuff.” However, he said, he expects that the hardware support for CPU-assisted GPGPU applications will be in upcoming generations of hybrid platforms from both Intel and AMD, and software support will follow. And, he added, “it’s already assumed that the GPU (in Intel’s Ivy Bridge processors) will be much more powerful than Sandy Bridge.”

Real World Technologies editor David Kanter said that he expects to see “a lot more work in this area, as engineers and researchers must improve performance significantly, while maintaining or reducing power consumption.” But he noted that there wasn’t information in the research about the power consumption impact of the technology. Zhou said that the research hasn’t yielded any hard numbers on what the power consumption impact would be.

Zhou said that his team’s research had been funded by grants from the National Science Foundation and AMD, and was just the latest collaboration with Mantor. But the research up until now has been fundamental scientific research, and he couldn’t say how it might be commercialized by AMD or Intel.

Source:http://arstechnica.com/business/news/2012/02/researchers-boost-processor-performance-by-getting-cpu-and-gpu-to-collaborate.ars

LaCie Announces 5big Office Series, Powered By Windows Home Server 2011

February 2nd, 2012

After all these years, LaCie’s storage hardware still looks better than storage hardware from pretty much any other vendor. And if you’re starting up a small business, you may be interested in what the company’s got on offer now. This week, LaCie announced the 5big Office Series, its latest five-bay network attached storage solutions. The 5big Office Series, powered by Windows Home Server 2011, packs a 1.6GHz 64-bit Atom CPU, 2GB of RAM and plenty of options. There’s a single-drive edition that allows users to add additional drives, as needed, for up to 10TB of total capacity. It reduces the initial investment and assures a long-term solution. What’s more, its PC backup data deduplication only backs up a single instance of redundant data. This optimizes storage space as well as increases backup speed.

The 5big Office+ offers all the features of the 5big Office, but brings powerful tools – including DFS-R/N – for integration with offsite servers. IT managers can centralize and consolidate data, even when it’s spread across multiple geographic locations. The 5big Office+ also features Windows domain/Active Directory support, a faster dual-core processor, dual Ethernet links, and can back up more PCs. The LaCie 5big Office is available in single-disk 2TB capacity and the 5big Office+ is available in single-disk 2TB or five-disk 10TB capacities through the LaCie Online Store and LaCie Storage Partners starting at $749.00.

Source:http://hothardware.com/News/LaCie-Announces-5big-Office-Series-Powered-By-Windows-Home-Server-2011/

Sony Offers New CPU Options For S Series And E Series Laptops

January 24th, 2012

Sony announced new CPU options for its S Series of laptops. The 13-inch and 15-inch S Series models now offer the newest generation of Intel Core processors with Core i7 options. The S Series features Hybrid Graphics (AMD Radeon HD 6470M or AMD Radeon HD 6630M), optical drives (including Blu-ray Disc options), backlit keyboards, options for solid state drives, and QUAD RAID 0 technology. The 13-inch S Series starts at $799 while the 15-inch models start at $979. Both 13- and 15-inch models will be available in early February.

Sony also refreshed its E Series with the latest 2nd generation Intel Core processors and four color options. The E Series is available in 14- and 15-inch sizes and starts at $499 and $459, respectively. In addition to refreshing the S Series line with new CPU options, Sony also added a Carbon Silver color option for the VAIO Z Series laptop.

Sony today announces the availability of a new Carbon Silver color for the VAIO® Z Series laptop and a variety of updated models with new CPU options and enhancements.

New Color and technology for the Z Series

The updated Z Series with the new option of Carbon Silver is added to the existing choices of Carbon Black, Carbon Gold, and Premium Carbon Black. Also available is optional LTE mobile broadband built-in, supporting 4G data service. With the latest 2nd generation Intel® processors ranging from Intel Core™ i5 and higher and RAID 0 solid state drives, the Z Series continues Sony’s efforts to offer users advanced performance and design that fits their mobile lifestyles. Starting at $1949.99, the Z Series includes the Power Media Dock™ drive, ideal for the business user in need of additional ports or external displays.

New CPUs for the S Series Laptops

The S Series continues to offer performance mobility and all around excellence since its debut last year. The 13” and 15” models feature everything students and performance-minded users need including standard voltage processors, Hybrid Graphics, optical drives (Blu-ray Disc™ options as well), backlit keyboards and the VAIO, ASSIST and WEB hardware buttons for launching Media Gallery™ software, VAIO Care™ support software and access to the web without a full Windows® boot-up, all with a touch of a button. In addition, options for solid state drives on both the 13” and 15” S Series laptops include 256GB, 512GB, and 1TB and feature QUAD RAID 0 technology for enhanced performance.

With the newest generation of Intel® Core™ processors now available, the S Series features updated CPUs on the 13” and 15” models, both available with up to Core i7, delivering even more performance automatically when users need it most.

Hybrid Graphics and IPS Technology for S Series

The S Series includes Hybrid Graphics with either an AMD Radeon HD 6470M (512MB VRAM) or AMD Radeon HD 6630M (1GB VRAM), providing flexibility between performance and maximum battery life. The 15” will come standard with a 15.5” Full HD display (1920 x 1080) with IPS technology for improved image quality and viewing angles.

When coupled with the advanced large-capacity optional sheet battery, which keeps a thin profile, the S Series lets users stay mobile and unplugged for up to 12 hours. Intelligent charging enables VAIO® S Series users to charge the optional sheet battery separately from the PC and attach it to the system at any time without shutting down for maximum flexibility.

The 13” S Series will start at $799. The 15” S Series will start at $979 and includes the Full HD display and Intel Core i5 processor. Both 13” and 15” models will be available starting early February.

New CPUs for E Series

Ideal for students and everyday users, the refreshed E Series is now available with the latest 2nd generation Intel Core processors and in four colors with a unique textured design: Glacier White, Charcoal Black, Midnight Blue and Blush Pink. Available in 14” and 15” sizes, the E Series also features Intel® Wireless Display for select models, VAIO, ASSIST and WEB hardware buttons, optional keyboard skins and optional dedicated NVIDIA® graphics with up to 1GB VRAM. Pricing starts at $499 for the 14” series and $459 for the 15” series.

Software updates to all Series

In addition to a new Z Series color and refreshed CPUs, the Sony VAIO team also updated some software. With the updated Media Gallery™ 2.0, users will experience a new look and feel, including new features for popular social networking services. The S Series will come with the update already installed, and users of other models can download it. Also included is the Music Unlimited promotion, providing 180 days of Music Unlimited basic service for free for first-time users. The F Series laptop and L Series All-in-One come preloaded with Sony Imagination Studio™ Multimedia Edition, a collection of audio and video editing software, including the award-winning Sony Vegas® Movie Studio HD.

Source:http://hothardware.com/News/Sony-Offers-New-CPU-Options-For-S-Series-And-E-Series-Laptops/

CPU Startup Combines CPU+DRAM—And A Whole Bunch Of Crazy

January 23rd, 2012

The CPU design firm Venray Technology announced a new product design this week that it claims can deliver enormous performance benefits by combining CPU and DRAM onto a single piece of silicon. We spent some time earlier this fall discussing the new TOMI (Thread Optimized Multiprocessor) with company CTO Russell Fish, but while the idea is interesting, its presentation is marred by crazy conceptualizing and deeply suspect analytics.

The Multicore Problem:

Three factors, or walls, limit the scaling of modern microprocessors. First, there’s the memory wall, defined as the gap between the CPU and DRAM clock speed. Second, there’s the ILP (Instruction Level Parallelism) wall, which refers to the difficulty of decoding enough instructions per clock cycle to keep a core completely busy. Finally, there’s the power wall–the faster a CPU is and the more cores it has, the more power it consumes.

Attempting to compensate for one wall often risks running afoul of the other two. Adding more cache to decrease the impact of the CPU/DRAM speed discrepancy adds die complexity and draws more power, as does raising CPU clock speed. Combined, the three walls are a set of fundamental constraints–improving architectural efficiency and moving to a smaller process technology may make the room a bit bigger, but they don’t remove the walls themselves.

TOMI attempts to redefine the problem by building a very different type of microprocessor. The TOMI Borealis is built using the same transistor structures as conventional DRAM; the chip trades clock speed and performance for ultra-low leakage. Its design is, by necessity, extremely simple. Not counting the cache, TOMI is a 22,000 transistor design, as compared to 30,000 transistors for the original ARM2. The company’s early prototypes, built on legacy DRAM technology, ran at 500MHz on a 110nm process.

Instead of surrounding a CPU core with a substantial amount of L2 and L3 cache, Venray inserted a CPU core directly into a DRAM design. A TOMI Borealis IC connects eight TOMI cores to a 1Gbit DRAM with a total of 16 ICs per 2GB DIMM. This works out to a total of 128 processor cores per DIMM. Because they’re built using ultra-low-leakage processes and are so small, such cores cost very little to build and consume vanishingly small amounts of power (Venray claims power consumption is as low as 23mW per core at 500MHz).

It’s an interesting idea.
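The per-DIMM figures quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the constant and function names are ours, and the aggregate power number is our own extrapolation from Venray's per-core claim, not a figure the company published.

```python
# Figures as quoted in the article; names are illustrative only.
CORES_PER_IC = 8            # TOMI cores on each Borealis IC
ICS_PER_DIMM = 16           # ICs on one DIMM
DRAM_GBIT_PER_IC = 1        # DRAM capacity of each IC, in gigabits
MILLIWATTS_PER_CORE = 23    # Venray's claimed per-core draw at 500MHz


def dimm_summary():
    """Return (cores, capacity in GB, core power in watts) for one DIMM."""
    cores = CORES_PER_IC * ICS_PER_DIMM
    capacity_gbytes = DRAM_GBIT_PER_IC * ICS_PER_DIMM / 8  # 8 bits per byte
    # Assumption: every core draws the claimed 23mW; DRAM power is ignored.
    core_power_watts = cores * MILLIWATTS_PER_CORE / 1000
    return cores, capacity_gbytes, core_power_watts


cores, capacity, watts = dimm_summary()
print(f"{cores} cores, {capacity:g}GB, about {watts:.2f}W for the cores alone")
```

If Venray's per-core claim holds, all 128 cores on a DIMM together would draw under 3W, before counting the DRAM itself.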

The Bad:

When your CPU has fewer transistors than an architecture that debuted in 1986, there’s a good chance that you left a few things out–like an FPU, branch prediction, pipelining, or any form of speculative execution. Venray may have created a chip with power consumption an order of magnitude lower than anything ARM builds and more memory bandwidth than Intel’s highest-end Xeons, but it’s an ultra-specialized, ultra-lightweight core that trades 25 years of flexibility and performance for scads of memory bandwidth.

The last few years have seen a dramatic surge in the number of low-power, many-core architectures being floated as the potential future of computing, but Venray’s approach relies on the manufacturing expertise of companies that have no experience in building microprocessors and don’t normally serve as foundries. This imposes fundamental restrictions on the CPU’s ability to scale; DRAM is manufactured using a three-layer mask rather than the 10-12 layers Intel and AMD use for their CPUs. Venray already acknowledges that these conditions imposed substantial limitations on the original TOMI design.

Of course, there’s still a chance that the TOMI uarch could be effective in certain bandwidth-hungry scenarios–but that’s where the Venray Crazy Train goes flying off the track.

Let’s start here. In a graph like this, you expect the two bars to represent the same systems being compared across three different characteristics. That’s not the case. When we spoke to Russell Fish in late November, he pointed us to this publicly available document and claimed that the results came from a customer with 384 2.1GHz Xeons. There’s no such thing as an S5620 Xeon and even if we grant that he meant the E5620 CPU, that’s a 2.4GHz chip.

The “Power consumption” graphs show Oracle’s maximum power consumption for a system with 10x Xeon E7-8870s, 168 dedicated SQL processors, 5.3TB (yes, TB) of Flash and 15x 10,000 RPM hard drives. It’s not only a worst-case figure, it’s a figure utterly unrelated to the workload shown in the Performance comparison. Furthermore, given that each Xeon E7-8870 has a 130W TDP, ten of them only come out to 1.3kW–Oracle’s 17.7kW figure means that the overwhelming majority of the cabinet’s power consumption is driven by components other than its CPUs.
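The mismatch is easy to quantify. The sketch below uses only the numbers quoted above; the names are ours, and it treats TDP as a stand-in for CPU power draw, which is a simplifying assumption and not a measurement.

```python
# Figures as quoted in the article; names are illustrative only.
XEON_TDP_WATTS = 130         # TDP of one Xeon E7-8870
XEON_COUNT = 10              # CPUs in the Oracle system
CABINET_MAX_WATTS = 17_700   # Oracle's maximum power figure for the cabinet


def cpu_power_share(tdp_watts, count, total_watts):
    """Return (combined CPU watts, fraction of the total they account for)."""
    # Assumption: each CPU draws exactly its TDP at full load.
    cpu_watts = tdp_watts * count
    return cpu_watts, cpu_watts / total_watts


cpu_watts, share = cpu_power_share(XEON_TDP_WATTS, XEON_COUNT, CABINET_MAX_WATTS)
print(f"CPUs: {cpu_watts}W of {CABINET_MAX_WATTS}W ({share:.1%})")
```

Under that assumption, the ten Xeons account for only about 7% of the quoted maximum, so comparing TOMI against the whole cabinet says very little about CPU efficiency.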

From here, things rapidly get worse. Fish makes his points about power walls by referring to unverified claims that prototype 90nm Tejas chips drew 150W at 2.8GHz back in 2004. That’s like arguing that Ford can’t build a decent car because the Edsel sucked.

After reading about the technology, you might think Venray was planning to market a small chip to high-end HPC niche markets… and you’d be wrong. The company expects the following to occur as a result of this revolutionary architecture (organized by least-to-most creepy):

* Computer speech will be so common that devices will talk to other devices in the presence of their users.
* Your cell phone camera will recognize the face of anyone it sees and scan the computer cloud for background red flags as well as six degrees of separation.
* Common commands will be reduced to short verbal cues like clicking your tongue or sucking your lips
* Your personal history will be displayed for one and all to see…women will create search engines to find eligible, prosperous men. Men will create search engines to qualify women. Criminals will find their jobs much more difficult because their history will be immediately known to anyone who encounters them.
* TOMI Technology will be built on flash memories creating the elemental unit of a learning machine… the machines will be able to self organize, build robust communicating structures, and collaborate to perform tasks.
* A disposable diaper company will give away TOMI enabled teddy bears that teach reading and arithmetic. It will be able to identify specific children… and from time to time remind Mom to buy a product. The bear will also diagnose a raspy throat, a cough, or runny nose.

Conclusion:

Fish has spent decades in the microprocessor industry–he invented the first CPU to use a clock multiplier in conjunction with Chuck H. Moore–but his vision of the future is crazy enough to scare mad dogs and Englishmen.

His idea for a CPU architecture is interesting, even underneath the obfuscation and false representation, but too practically limited to ever take off. Google, an enthusiastic and dedicated proponent of energy-efficient, multi-core research, said it best in a paper titled “Brawny cores still beat wimpy cores, most of the time.”

“Once a chip’s single-core performance lags by more than a factor of two or so behind the higher end of current-generation commodity processors, making a business case for switching to the wimpy system becomes increasingly difficult… So go forth and multiply your cores, but do it in moderation, or the sea of wimpy cores will stick to your programmers’ boots like clay.”

Source:http://hothardware.com/News/CPU-Startup-Combines-CPUDRAMAnd-A-Whole-Bunch-Of-Crazy/

Intel Scrapping Two Dozen CPUs to Make Room for Ivy Bridge

January 6th, 2012

It’s out with the old and in with the new over at Intel. According to reports, the Santa Clara chip maker is letting its hardware partners know of a revised schedule to halt production and/or stop the supply of more than 25 existing desktop processor models so that it can focus on and make room for its upcoming 22nm Ivy Bridge launch.

Speaking with industry sources in Taiwan, DigiTimes says Intel is suspending the supply of Core i5 660 and 661, Core i3 530, Pentium E5700, and Celeron E3500 processors and will cut off production in the second quarter of 2012.

In addition, Intel is putting the brakes on production lines churning out Core i7 860S and 875K, Core i5 655K, 750S, and 760, and Celeron 430 and 450 processors sometime this quarter, followed by Core i7 870S and 880S, Core 2 Duo E7500 and E7600, Pentium E5500 and E6600, and Celeron E3300 in Q2 2012.

The latest word on Ivy Bridge is that Intel will launch more than a dozen desktop and mobile variants on April 8. Desktop Ivy Bridge processors will initially be priced between $184 and $332, while half a dozen mobile chips will cost up to $1,100 (Core i7 3920QM).

Source:http://hothardware.com/News/Intel-Scrapping-Two-Dozen-CPUs-to-Make-Room-for-Ivy-Bridge/
