Emulation is now the cornerstone of verification for advanced chip designs, but how emulation will evolve to meet future demands involving increasingly dense, complex, and heterogeneous architectures isn’t entirely clear.
EDA companies have been investing heavily in emulation, increasing capacity, boosting performance, and adding new capabilities. Now the big question is how else they can leverage this technology as design needs shift. Design has remained at the register transfer level (RTL) for 30 years, and simulator performance stalled out about 20 years ago. That gap increasingly has been filled by emulators, whose performance is almost independent of design size, thanks to technology advances that keep pace with increasing design sizes.
“As a verification industry, we keep doing what we used to do,” says Simon Davidmann, CEO for Imperas Software. “Things scale up, computers get more powerful, we add more memory, and then at some point, something breaks. We can’t keep doing it that way. We have to move to the next level of whatever it is.”
None of this is a surprise to EDA companies. “This is our bread and butter,” says Johannes Stahl, senior director of product marketing for emulation at Synopsys. “We think about this every single day and look at how things are going to evolve. We look to our customer needs all the time.”
All of the big emulator vendors point to similar trends. “Each engine that we have, whether it is logic simulation, emulation, or field-programmable gate array (FPGA) prototyping platforms — the dynamic engines — each has its own sweet spot,” says Michael Young, director of product marketing for Cadence. “As such, it’s really about the verification task at hand and what we can do to maximize its advantage. And when I say maximize advantage, it really means, how is it going to be practical, what is going to be available for you to use, and that comes back to economics.”
Without increases in capacity and performance, emulation quickly would have followed RTL simulation into decline. But commercial emulators have significantly different architectures. Some are based on custom chips, while others use off-the-shelf FPGA devices. Some have programmable fabrics and lookup tables, while others are processor-based. Each of them creates different tradeoffs between cost, performance, and turnaround time.
“FPGAs are evolving with the silicon curves and going down in the nodes,” says Synopsys’ Stahl. “That enables us to concentrate on the hardware architecture. What do you do with those FPGAs? How do you optimize scaling using FPGAs? Building capacity fundamentally means deciding upon the form factor for having hundreds of FPGAs, and how you interconnect them in the most efficient way. We have a variety of different architectures in the market today that achieve different performance points. This innovation will continue.”
To the layperson it may look as if emulators never get any faster. “If you look at the same designs, say a billion-gate design that was running on the previous emulation system versus the latest generations, we’re actually seeing about 50% performance improvement,” says Cadence’s Young. “That’s the good news. Instead of running at 900 kilohertz or 1 megahertz, now you’ve seen around 1.4 to 1.5 megahertz. It is incrementally improving. But then if you have a 2 billion-gate design, well maybe it comes back down to a speedup factor of 1.1 because of the communication. The limiters are things like the routing mechanism, the interconnect. As you scale, the pathways become longer, and emulation speed is limited by the longest path.”
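Young's observation can be illustrated with a toy model (all delay numbers below are invented for illustration): the achievable emulation clock is bounded by the slowest signal path in the mapped design, so as a design grows and is spread across more chips, one long cross-chip route is enough to pull the frequency back down.

```python
# Toy model (illustrative only): emulation frequency is bounded by the
# slowest (longest-delay) path through the mapped design.
def emulation_freq_mhz(path_delays_ns):
    """Return the maximum clock frequency given per-path delays in ns."""
    longest = max(path_delays_ns)   # the critical path dominates
    return 1000.0 / longest         # convert ns period -> MHz

# A design spread over more chips picks up longer inter-chip routes.
small_design = [400, 550, 700]       # hypothetical path delays, ns
large_design = small_design + [900]  # one extra long cross-chip path

print(emulation_freq_mhz(small_design))  # ~1.43 MHz
print(emulation_freq_mhz(large_design))  # ~1.11 MHz
```

Adding a single 900 ns path drops the whole machine's clock, which is why scaling capacity without improving the interconnect yields diminishing speedups.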
Fig. 1: Palladium Z2, Cadence’s emulator. Source: Cadence Design Systems
Then there is the software used to map a design into the emulator. “How do you best map the design onto these FPGAs?” asks Stahl. “That effort is ongoing, and exploiting the architecture of the emulator and then the specifics of the design. There is a constant stream of innovation on the software side, and those eventually manifest themselves in the next-generation emulators.”
Others agree. “We invest a lot in innovation in our parallel partitioning compiler then follow on with what we call modular compiler,” says Young. “If we just do the traditional compile, it would take days to compile, which is not practical for most customers. Most customers expect two to three turns per day. As an example, we have a customer that’s doing a 6 billion-gate design using these advanced compilers, and they’re able to get the compile done in less than six hours.”
All of this is happening in the face of changing use models for emulation. “When the world started with emulation, it was really about in-circuit emulation, where your emulators were cabled up to a custom board or off-the-shelf board,” explains Young. “You may have a processor on the motherboard, and you’re designing a GPU on the emulator. Most people use PCIe as the primary interface to the processor, so that would be one way. Over time, people have virtualized that interface so that instead of using speed bridges that are connected through the PCI, you have virtual bridges. There’s another class of use model that is simulation acceleration. Those are simulation folks that want to take very long runs and use hardware to accelerate that. Now, there are even more exotic things like power estimation.”
In fact, Cadence says it is tracking 20 different use models for emulation, and it expects that number to grow to 30 or more in the next year or two.
The impact of software
One big change permeating the industry is the growing importance of software. “As we enter the new semiconductor mega-cycle, the era of software-centric SoC design requires a dramatic change in functional verification systems to address new requirements,” said Ravi Subramanian, senior vice president and general manager for Siemens EDA. “Our customers require a complete integrated system with a clear roadmap for the next decade that spans virtual platforms, hardware emulation, and (FPGA) prototyping technologies.”
Imperas’ Davidmann agrees. “The world has moved from a chip-performance, chip design-centric world, to a software-defined world. People don’t specify absolute hardware performance, but they do want the fastest implementation that they can get. What the software is saying is that for me to run in a realistic time, I need ‘this’ class of hardware. They do not need detailed timing analysis. If a job is not getting done, you split it into two of the available cores.”
Early software verification has become one of the most important tasks entrusted to emulation. “You will not run just a small piece of software, you will run an entire workload,” says Stahl. “It could be an Android boot with some phone application, it could be a 5G protocol, it could be a networking application for lots and lots of Ethernet packets. But all of them are real application workloads, and are the complete software stack that runs on the chip.”
This can happen at multiple levels of abstraction. “Customers may want to run market-specific, real-world workloads, frameworks, and benchmarks early in the verification cycle for power and performance analysis,” says Siemens’ Subramanian. “This would be done using virtual SoC models early in the cycle. Later, they would move the same design to an emulator to validate the software/hardware interfaces and execute application-level software while running closer to actual system speeds. To make this approach as efficient as possible, they need to use the same virtual verification environment, the same transactors and models, to maximize the reuse of verification collateral, environment, and test content.”
Using different abstraction models at various stages adds additional requirements to the flow. “By adopting a platform-based hardware design, the hardware might become less of a risk in system design,” says Christoph Sohrmann, group manager for virtual system development at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “It’s more the interaction of the software with the peripherals. Most of the functionality comes from the software layers. Here we will see a much steeper increase in complexity, where the hardware/software interaction will become tricky to verify. What is needed in this area will be properly qualified, fast system-level models regarding the timing behavior, for instance. One future trend will be the development and provisioning of system-level IP that has been qualified against their RTL counterpart.”
At some point cycle accuracy may become necessary. “There are other pieces of software, like drivers or other low-level embedded software like DMA, where timing matters,” says Young. “That is when performance analysis and tradeoffs are being looked at. They need to be using RTL models, because the virtual model is abstracting away necessary information. For good performance analysis you need cycle-level accuracy.”
Is RTL too late?
As more software verification can be performed on virtual models, is RTL too late for a lot of verification tasks? “RTL is a very specific representation of a design, which is accurate enough for performance analysis and to run software that does require cycle accuracy,” says Stahl. “So it’s still the most important representation of the design, but it’s not the only one. If you don’t have representations that are earlier than RTL, you cannot win in the market. The best examples of that are chips that have vastly new architectures. With every AI chip having a different architecture, every one of these companies is coming up with a high-level architecture model that represents their special processing cores, with which they investigate performance at some level of accuracy prior to RTL.”
Risk reduction means that people are reluctant to change. “The huge challenge that people face is having to move away from the comfort of the detail that they used to look at,” says Davidmann. “Almost all software can be verified using an instruction-accurate model, and then try small bits on the RTL. Chip architectures are changing. You have abstractions in the communications so that it all works at a higher level, which means you can do it in software and simulate it much better without the detail. For these new architectures, you can simulate them without worrying about the details of communication because that can be abstracted.”
Identifying the important bits requires using hybrid solutions. “What we are offering are cycle-accurate models for the interconnect and the memory subsystem, because these two together determine how many cycles you can pump through your design,” says Stahl. “Everything else can be behavioral. It may be abstract traffic. Maybe it’s an abstract processor model. Maybe it only models a certain sequence of events. But this sequence of events, and the intensity of the traffic, is good enough to do this tradeoff of the architecture. Nobody makes the mistake of equating this to the real world. And nobody questions at that level whether it needs to be 5% accurate. It’s probably not. It’s perhaps 15% or 20% accurate. Relative tradeoffs are still worthwhile.”
Fig. 2: The Synopsys ZeBu EP1 emulation system. Source: Synopsys
Mixing virtual prototyping and emulation has become a must. “Hybrid emulation in conjunction with virtual prototypes can deliver a significant speed-up of the verification process and solve the curse of complexity in the future,” says Fraunhofer’s Sohrmann. “Basically, there are two concepts that could be considered. One is the classical co-simulation approach. For this to work smoothly, a standardized interface is required between the models of different abstraction level. Some parts of the RTL can be replaced with faster models to speed up the overall simulation. The second and more challenging concept is to dynamically switch between abstraction levels during the simulation. For instance, you want to verify some internal state the system is in after a lengthy boot process. That would take forever when run on RTL directly. You might instead consider running the initialization on a virtual prototype and dynamically switch to a more detailed model at some point during your simulation.”
This dynamic abstraction approach is taking hold. “We have one customer that is using our software simulation of the processors to boot up the system to an appropriate point, then hot-swapping into the RTL on the emulator in a hybrid mode,” says Davidmann. “We gear shifted from running at 500 million instructions per second into the emulator, which was running about 100 times slower once they got to the interesting point.”
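The hot-swap flow Davidmann describes can be sketched as a simple control loop (the classes and method names below are hypothetical stand-ins, not a real vendor API): run a fast instruction-accurate model until the point of interest, snapshot the architectural state, then continue on the cycle-accurate emulated RTL.

```python
# Minimal sketch of a hybrid "hot-swap" flow. The models and their
# interfaces are illustrative assumptions, not an actual product API.
class FastISS:
    """Instruction-accurate model: fast, but with no cycle timing."""
    def __init__(self):
        self.pc = 0
        self.instructions = 0
    def run_until(self, breakpoint_pc):
        # Pretend we executed every instruction up to the breakpoint.
        self.instructions += breakpoint_pc - self.pc
        self.pc = breakpoint_pc
    def snapshot(self):
        # In practice this would capture registers, memory, MMU state, etc.
        return {"pc": self.pc}

class EmulatedRTL:
    """Cycle-accurate model: roughly 100x slower, used after the swap."""
    def __init__(self, state):
        self.pc = state["pc"]   # resume from the transferred state
    def step(self, n):
        self.pc += n

# Boot quickly on the ISS, then hand the state over to the emulator.
iss = FastISS()
iss.run_until(breakpoint_pc=1_000_000)  # lengthy boot at high speed
rtl = EmulatedRTL(iss.snapshot())       # hot-swap architectural state
rtl.step(100)                           # detailed, cycle-accurate run
```

The design choice is where to put the breakpoint: everything before it runs at hundreds of MIPS, and only the interesting window pays the cycle-accurate cost.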
Prototyping has been gaining traction recently as a means of solving other verification challenges. “Prototyping systems enable you to achieve much higher performance than emulation,” says Stahl. “But it takes more time to build these prototypes. As opposed to an emulator, they are typically manually wired for optimal performance, and also more tweaked in terms of the flow. But you can achieve 20 MHz or 30 MHz, which may be required to perform certain functions.”
Is there value in bringing prototyping and emulation together? “The challenge here is not really about putting these systems together,” says Young. “The challenge is when you have an emulation platform with a database that works, the time it takes to bring it up on an FPGA prototyping platform. We have put a lot of effort into having a very smooth transition between them. Once you get it running on emulation, it only takes weeks to bring it up on the FPGA prototyping platform, sometimes within a week. So now you have the advantage of congruency between the two platforms. And when you need higher performance to run more software workload, you can do that rather quickly, rather than waiting for months.”
But there are other changes happening in the market. Chiplets are beginning to gain a lot of attention, and these may be delivered both as a final product and as something that resembles a prototype. “You can call them chiplets, or in the early days we called them custom boards or target boards,” adds Young. “It is absolutely where the markets are going. Instead of waiting for my final silicon, I can deliver an emulation platform with custom target environments, where the target is basically anything outside of the emulator in a physical form. That’s already happening. Customers are designing either custom memories around that, or chiplets, whatever is needed to create the models that demonstrate this is what my design looks like. This is what the capabilities, features, and performance could be in the final silicon. It could be a platform for their system customers to do early software development or integration.”
One of the problems with emulation is that it generates so much data that it can be difficult to analyze all of it. To this end, emulation makers are beginning to embed analysis tools within the hardware, vastly reducing the amount of data that has to be exported. We have seen this for coverage analysis as well as various forms of power analysis.
A newly reported use case enables performance measurement on a prototype. “Arm has presented using prototyping, not only for software development or system validation of the interfaces, but using prototyping for performance measurement,” says Stahl. “They have found a way to calibrate the results. Previously, the belief in the industry was that you can only measure performance on an emulator, because the emulator is the only one that is cycle-accurate. The prototype is not necessarily cycle-accurate. But what they found is there’s another uncertainty. When you don’t run enough benchmarks, you have variations in performance between benchmarks. If you just run a benchmark once, you might not get the correct result. You have to average over running the same benchmark multiple times, because there are dependencies on the software to execute the benchmark.”
Emulation has changed many times during its history. It has proven itself again and again, and development teams have adapted the hardware to fit the needs of chipmakers. The big cloud on the horizon for emulators is how much architectures are changing and the impact this will have on verification in general. If an increasing amount of verification needs to be done before RTL, the industry will find ways around the need for cycle accuracy.
Other approaches have gained ground. The virtual prototype, in particular, has become an indispensable tool for architectural exploration, performance analysis, and early software verification. But no matter what happens, the hardware that exists in a chip will have to be verified, and that takes a huge amount of capability. And because that hardware has to be driven by some software, there may always be a role for the two to execute together in a representation that is as close to production as possible.