High-Level Synthesis For RISC-V

Republished By Plato

High-quality RISC-V implementations are becoming more numerous, but it is the extensibility of the architecture that is driving a lot of design activity. The challenge is designing and implementing custom processors without having to re-implement them every time at the register transfer level (RTL).

There are two types of high-level synthesis (HLS) that need to be considered. The first is generic HLS, which takes a description in C, C++, or SystemC and turns it into RTL. These tools enable you to explore the design space to create an optimal architecture, and this has worked exceedingly well for algorithms that are dataflow-oriented. In fact, the tools have become better at handling control-oriented constructs over time. But can they be used to implement processors? And is there a better way?

The other type of HLS consists of tools dedicated to processors. Extensible processors are not new. Tensilica (Cadence) and ARC (Synopsys) have been enabling users to create customizable processors for a couple of decades, and Arm recently dipped its toe in the water. These tools start with an Architectural Description Language that is designed specifically for processors. The question now is whether these two synthesis technologies can be brought together.

The answer isn’t entirely clear. HLS may not be applicable to all RISC-V designs. “Are you interested in a standard RISC-V implementation, or are you going for specialization?” asks Gert Goossens, senior director for ASIP tools at Synopsys. “If you’re interested in a standard RISC-V implementation, then I’m not sure HLS will provide you much benefit, because you can describe the architecture in RTL and optimize it there. But as soon as you start to look at instruction set extensions it is a different story. Then the question is which extensions are the best for my application domain, and architectural exploration becomes really crucial.”

There are different levels of architectural optimization. “All HLS tools enable you to do architectural exploration,” says Zdeněk Přikryl, CTO for Codasip. “Generic HLS solutions are good for doing this for data paths and statically scheduled algorithms, such as image processing. If you are designing a CPU, it is different. We can do much more exploration than you could do at RTL or conventional HLS. In one recent example, we started with the RISC-V baseline and then played with different combinations of extensions. Then we explored custom extension and defined some new instructions. The tests were rerun and the speed-up was more than 50X.”

“This is the confluence of two technologies that has the potential to be truly transformative in our industry,” says Rob Knoth, product management director at Cadence. “Processor design isn’t just about maximizing your GHz. You’re talking about processors that could be targeted for a very low area. Some could be targeted for extremely low power, some could be targeted at legacy nodes. Processors aren’t vanilla. Processors come in 31 flavors, especially when you start thinking about RISC-V, where it is open and customizable. Application-specific processors need tools that are predictable, and they need a tight coupling to the physical world.”

Assessing extensions is a different level of architectural optimization that involves software. “You need to figure out the ‘right’ custom instructions for the specific target application so that certain goals can be achieved, be it memory, power, performance, or area,” says Sven Beyer, product manager for OneSpin, a Siemens Business. “To do that, high-level models are needed, like virtual prototypes with custom instructions, running software to assess memory usage or performance. Once the custom extension candidates are identified, they need to be implemented in RTL to allow for the final assessment of key performance indicators (KPI). This task is very tedious when manually writing RTL, even if just adding to an existing RTL core, and manually writing independent prototypes. Keeping the models in sync is challenging.”

More than just hardware
Processors are the boundary between hardware and software. They affect each other. “When we look at the RISC-V landscape, the IP landscape, we see a lot of focus on the hardware,” says Patrick Verbist, product marketing manager for Synopsys. “Commercial companies and universities are offering different IPs, often putting RTL implementations in the public domain. There is a forest of solutions, each with a strong focus on their RTL implementation. They proclaim that one is better, or has a deeper pipeline, or higher performance. But then they say that for the compilation side, you need to rely on public domain tools. What if I want to extend RISC-V? Do those tools support my hardware implementation? There is too much focus on the RTL side of the IP implementations.”

There are several ways to design custom processors. For example, it could be with additional instructions, or by attaching a dedicated accelerator for a particular task. “HLS really helped blur the line about what is in the processor versus what is outside the processor, such as a dedicated accelerator,” says Cadence’s Knoth. “It lets that line be very flexible — both application-specific and technology-specific. Looking at portability, if you have a processor that is being targeted toward a 5nm node, it’s going to have very different care-abouts than something that is targeting a 22nm node. It is powerful to be able to have one consistent SystemC model, and then carve off certain pieces for a processor, versus certain pieces to be a hardware-based accelerator, to obtain an optimized implementation for the complete system.”

The key is being able to optimize the complete system, and that requires analysis of both hardware and software. “A high-level architecture description language, such as CodAL, enables us to capture the architecture with the instruction set, as well as the microarchitecture of the processors,” says Codasip’s Přikryl. “From the single description, in a language that is similar to C, we are able to generate an LLVM-based compiler, assembler, disassembler, simulator, and the RTL, UVM verification environment, and other outputs.”

There are multiple types of optimizations. “I start with the architecture view, where I define a list of instructions and architectural resources,” adds Přikryl. “I know roughly the timing that I would like to have, but I don’t really care whether the load/store unit will use AHB or AXI, or how wait states are handled. These are low-level details that come later. Once I am happy with the architecture, I can start adding micro-architectural information, low level details. There is a lot of common ground in all processors. You need to fetch instruction from the memory, you have hazard handling, decoders, load/store unit – they just have to be there. The data path may be simple and handle just integer or floats, or it may be SIMD or scalable vector.”

Fig. 1: Tool flow from CodAL. Source: Codasip

Adding the microarchitecture leads to other types of optimizations. “We define two types of optimizations,” says Synopsys’ Goossens. “The first is compiler in the loop, and the second one is synthesis in the loop. Compiler in the loop means that when you’re exploring different choices for instructions extensions, you use the automatically generated compiler for each instance of your architecture. For each trial, you generate the compiler, and then you compile benchmark code. That code is run in the simulator, and you can profile what was generated and see where the bottlenecks are in your architecture. Then you go back to the processor model — in our case in the nML language — and you make changes.”
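The compiler-in-the-loop cycle Goossens describes can be sketched in a few lines of Python. This is a toy model under stated assumptions, not nML or any vendor flow: the benchmark is reduced to a count of abstract operations, the cycle costs are invented, and the hypothetical `compile_and_simulate` function stands in for regenerating the compiler, compiling the benchmark, and running the simulator.

```python
# Toy compiler-in-the-loop exploration. All names and cycle counts are
# hypothetical; a real flow regenerates a compiler and simulator per trial.

# Benchmark expressed as a count of abstract operations.
BENCHMARK = {"add": 4000, "mul": 1000, "mac": 3000, "popcount": 500}

# Cycle cost of each operation when built only from base instructions.
BASE_COST = {"add": 1, "mul": 4, "mac": 5, "popcount": 12}

# Candidate extensions: operation -> cycle cost with a dedicated instruction.
CANDIDATES = {
    "baseline": {},
    "with_mac": {"mac": 1},
    "with_popcount": {"popcount": 1},
    "with_mac_and_popcount": {"mac": 1, "popcount": 1},
}


def compile_and_simulate(extension):
    """Stand-in for 'generate compiler, compile benchmark, run simulator'.

    Returns a profile: cycles spent per operation.
    """
    cost = {**BASE_COST, **extension}
    return {op: count * cost[op] for op, count in BENCHMARK.items()}


def explore():
    """Run every candidate and record total cycles and the hottest operation."""
    results = {}
    for name, extension in CANDIDATES.items():
        profile = compile_and_simulate(extension)
        total = sum(profile.values())
        bottleneck = max(profile, key=profile.get)
        results[name] = (total, bottleneck)
    return results


if __name__ == "__main__":
    results = explore()
    base_total = results["baseline"][0]
    for name, (total, bottleneck) in results.items():
        print(f"{name:24s} cycles={total:6d} "
              f"speedup={base_total / total:4.2f}x bottleneck={bottleneck}")
```

The profile plays the role of the simulator's report: once one bottleneck is removed, the next trial shows where the cycles have moved, which is the cue to go back and change the processor model again.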

Fig. 2: Tool flow for ASIP Designer. Source: Synopsys

“The second is synthesis in the loop,” says Goossens. “From the same processor model we generate an RTL implementation and do logic synthesis. This will synthesize your generated Verilog code. You can see the critical path in the design and the clock frequency that you can achieve. You then can make decisions about micro-architectural details of your processor. You might adjust the number of pipeline stages. Maybe it has to be increased because there is a critical block, like a multiplier, that has to become pipelined. You go back to the nML model where there are constructs to describe the pipeline.”
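A rough Python sketch of that synthesis-in-the-loop decision follows. Everything in it is an assumption made for illustration: the stage delays are made-up nanosecond values, the multiplier is assumed to split evenly across its pipeline stages, and the hypothetical `synthesize` function stands in for RTL generation plus a logic synthesis timing report.

```python
# Toy synthesis-in-the-loop model. Delays are invented numbers in nanoseconds;
# a real flow would read them from logic synthesis timing reports.

STAGE_DELAY_NS = {"fetch": 0.8, "decode": 0.7, "execute": 0.9, "writeback": 0.5}
MULTIPLIER_DELAY_NS = 2.4   # combinational delay of the unpipelined multiplier
REGISTER_OVERHEAD_NS = 0.1  # register timing overhead added to every stage


def synthesize(mult_stages):
    """Stand-in for 'generate RTL and run logic synthesis'.

    Returns (critical_block, critical_path_ns, fmax_mhz).
    """
    delays = dict(STAGE_DELAY_NS)
    # Assumption: the multiplier delay divides evenly across its stages.
    delays["multiplier"] = MULTIPLIER_DELAY_NS / mult_stages
    block = max(delays, key=delays.get)
    path = delays[block] + REGISTER_OVERHEAD_NS
    return block, path, 1000.0 / path


def close_timing(target_mhz, max_stages=8):
    """Add multiplier pipeline stages until the target clock is met.

    Returns (stages, critical_block, fmax_mhz), or None if the target
    cannot be reached by pipelining the multiplier alone.
    """
    for stages in range(1, max_stages + 1):
        block, _path, fmax = synthesize(stages)
        if fmax >= target_mhz:
            return stages, block, fmax
        if block != "multiplier":
            # Another block now limits the clock; more multiplier
            # pipelining cannot help.
            break
    return None


if __name__ == "__main__":
    print(close_timing(900))
    print(close_timing(1200))
```

With these numbers the multiplier stops being the critical block after three stages, at which point the execute stage sets the clock. A higher target would require a different micro-architectural change, not further multiplier pipelining.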

When both processor extensions and dedicated accelerators are being used, a partitioning phase is necessary. “The micro-architecture parameter set and the custom extensions can be optimized for the target application by running simulation on the HLS models to understand the overall KPIs,” says OneSpin’s Beyer. “HLS tools can then be used to generate the RTL, with support for simple extensions inside the core and more advanced accelerators that are just loosely coupled. Thus, there is very low overhead to get to a precise RTL model with the HLS tooling and its support for custom extensions. You can then measure the KPIs on the generated RTL to finalize the choice of extensions and parameters.”

There are a lot of similarities in the commercial tools. “You look at ARC, Tensilica, Codasip, Andes — they all have a language for describing the instructions,” says Imperas CEO Simon Davidmann. “It is slightly more abstract than RTL, and their tools can push it into RTL. They also build a tool chain with LLVM. They often build a simulator, as well. One of the key issues is that you must be able to simulate directly in the high-level language because you have to be able to debug it. It is no good if you have to translate it into something else because then it just becomes too hard. Another important aspect of the tool chain is that when you make a small change, you don’t want everything under that to change. The notion of engineering change orders (ECO) is important.”

So many languages
Defining new processors was all the rage back in the ’80s and ’90s, but fell out of favor when the IP model became popular and a few companies managed to dominate the market with highly optimized processor cores. Now, there is a rebirth of sorts, and some companies have adopted the languages that were used in the past while others have adopted languages defined within the industry or developed new custom languages. They are all classified as Architectural Description Languages, or ADLs.

When RISC-V initially was designed, it used a new language called Chisel. It is a hardware construction language based on Scala, and while it is at a slightly higher level of abstraction than RTL, it is not considered to be a high-level synthesis language. SiFive uses this language to define its processors.

A research group at Columbia University in New York adopted SystemC and published a paper about it in 2020. The paper presented “HL5 as the first 32-bit RISC-V microprocessor designed with SystemC and optimized with a commercial HLS tool. We evaluate HL5 through the execution of software programs on an experimental infrastructure that combines FPGA emulation with a standard RTL synthesis flow for a commercial 32nm CMOS technology.”

So is SystemC the best choice? “SystemC has some significant advantages in terms of staying closer to the higher level of concepts than RTL,” says Knoth. “You need to stay close to the people who are doing the algorithm design, so they have the freedom and flexibility to experiment and to explore their ideas. That’s the best language. There is a huge amount of investment in the SystemC in terms of people using this for architectural exploration and verification. In addition, when strengthening the ecosystem with verification and connecting it closer into the implementation side, it becomes a better value proposition than pure RTL design.”

Many hardware/software tools have gravitated toward the C language. “The CodAL language is C-based and is understandable for everyone,” says Přikryl. “You don’t have objects or other things that you would find in SystemC or new programming paradigms, and it’s easier for engineers to adopt. SystemC is a simulation framework, and it’s hard to capture things such as how the compiler can leverage the instruction. You would have to have another layer anyhow, so it wouldn’t be pure SystemC. At the end of the day you end up with a domain-specific language.”

There are many legacy languages for processor specification. “nML is a high-level definition language for describing a processor architecture and instruction set (ISA),” says Goossens. “The advantage of nML over hardware description languages like SystemC or Chisel is that all the software tools can be generated next to the RTL implementation, and you keep software and hardware consistent with each other. So nML is the golden reference for both the hardware and the software implementation.”

The RISC-V community also wanted a formal way to define the processor. “They adopted SAIL, which came out of Cambridge University in the U.K.,” says Imperas’ Davidmann. “The holy grail in design is correct by construction. You describe something up front where you can verify it. Then you map it to the next level down and re-verify. Arm clearly has this complete automation from the architecture through to the documentation. They have a flow from an initial representation through to the documentation, through to the RTL, through to the reference models, and they use formal tools along the way. The goal is to have a language that is abstract and where everything flows down from it.”

Arm clearly has cracked this problem, which is a factor in its success. A blog by Alastair Reid, who was at Arm for almost 15 years and is now staff research scientist at Google, provides some insight into what Arm does.

Verification
As systems become more complex, the verification task tends to grow faster than the design task. That means all viable solutions must take the verification task seriously. “Since the RTL is generated from the HLS model, both are in sync by construction,” says Beyer. “The HLS model should also be used to verify the generated RTL. This is essentially the same step you would take on a normal synthesis tool by running equivalence checking. The HLS description of custom extensions can also be used to enable compiler support, instruction set simulator, and other elements of the software toolchain for the custom extensions.”

This relies on the high-level description being fully verified. Otherwise, you face a garbage-in, garbage-out situation. Unfortunately, the tools for doing that are somewhat nascent today. “If you start from a RISC-V model and you make extensions, you still want to make sure that the baseline architecture is fully compatible with the RISC-V spec,” says Goossens. “So we run RISC-V validation suites to make sure that our implementation is really compliant with the RISC-V spec.”

Compliance suites do not verify the micro-architecture and are a long way from being complete. “There is no magic button when it comes to verification,” says Přikryl. “We output a UVM environment and tools such as a random program generator, which is aware of every single instruction that’s inside of the design, including the custom instructions. As a verification engineer you can take these verification tools and use them, or add more tools or testbenches on top of that. You also can do system exploration using the generated TLM SystemC model and create a virtual prototype.”

That becomes part of a larger ‘in the loop’ verification. “We typically recommend that you embed the processor model that we generate in SystemC, and use virtual prototyping tools,” says Goossens. “This enables you to take into account the interactions with, for example, the memory hierarchy, the memory subsystem and measure performance.”

Custom instructions mean that you have to do some verification yourself. “We provide the basic regression suite with C-level test programs that you can compile on the processor architecture, and you can simulate the generated code,” says Goossens. “You also can execute those programs natively on the host computer and do comparisons. That’s at the bit level to see if everything is consistent. But when you add custom instructions, you need to extend this test suite with specific additional test programs for what you add.”
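The native-versus-simulated comparison can be illustrated with a small Python sketch. It is only an approximation of the idea: the three-opcode register machine stands in for a generated instruction set simulator, `mac` is a hypothetical custom multiply-accumulate instruction, and the 32-bit wraparound and random test vectors are assumptions chosen for the example.

```python
import random

MASK32 = 0xFFFFFFFF  # assume 32-bit registers with wraparound arithmetic


def dot_native(a, b):
    """Reference result, as if the C test program ran on the host."""
    acc = 0
    for x, y in zip(a, b):
        acc = (acc + x * y) & MASK32
    return acc


def run_on_simulator(program, regs):
    """Tiny register-machine interpreter standing in for a generated ISS."""
    for op, rd, rs1, rs2 in program:
        if op == "mul":
            regs[rd] = (regs[rs1] * regs[rs2]) & MASK32
        elif op == "add":
            regs[rd] = (regs[rs1] + regs[rs2]) & MASK32
        elif op == "mac":  # hypothetical custom instruction: rd += rs1 * rs2
            regs[rd] = (regs[rd] + regs[rs1] * regs[rs2]) & MASK32
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs


def dot_simulated(a, b, use_mac):
    """'Compile' the dot product to the toy ISA and run it element by element."""
    acc = 0
    for x, y in zip(a, b):
        regs = {"acc": acc, "x": x, "y": y, "t": 0}
        if use_mac:
            program = [("mac", "acc", "x", "y")]
        else:
            program = [("mul", "t", "x", "y"), ("add", "acc", "acc", "t")]
        acc = run_on_simulator(program, regs)["acc"]
    return acc


def regression(trials=200, seed=0):
    """Bit-exact comparison of host and simulated results on random vectors."""
    rng = random.Random(seed)
    for _ in range(trials):
        n = rng.randint(1, 16)
        a = [rng.getrandbits(32) for _ in range(n)]
        b = [rng.getrandbits(32) for _ in range(n)]
        expected = dot_native(a, b)
        if dot_simulated(a, b, use_mac=False) != expected:
            return False
        if dot_simulated(a, b, use_mac=True) != expected:
            return False
    return True


if __name__ == "__main__":
    print("regression passed:", regression())
```

The baseline path corresponds to the supplied regression suite, while the `use_mac=True` path is the kind of additional test a designer would have to write once a custom instruction is added.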

Conclusion
Processor design became the core competence of just a few companies, but getting more performance out of processor cores has become increasingly difficult. To make significant headway, multi-core heterogeneous compute solutions are required. RISC-V has re-awakened the desire to create them.

The industry is trying to create effective toolchains to address the needs in an environment that is rapidly evolving. There are multiple pieces to the puzzle and the industry is fragmented today, but the tools that exist give a significant boost over writing the processor in RTL, both in terms of optimization potential and productivity.

Source: https://semiengineering.com/high-level-synthesis-for-risc-v/

Time Stamp: October 28, 2021