Parallax Propeller 2 Development: An Interview with Chip Gracey and Ken Gracey
Released almost 16 years ago, Parallax’s Propeller 1 offered eight processor cores in a small DIP-40 or QFP package with up to 80-MHz clock frequency on all cores. In November 2020, the Rocklin, California-based company officially released the Propeller 2, an eight-core microcontroller running up to 300 MHz with more RAM, additional smart pins, and new peripherals. Check out Mathias Claussen’s five-part article series, “Hands On the Parallax Propeller 2,” for more details. The Propeller 2 is unique in many ways, so we took advantage of the opportunity to speak with Parallax founder Chip Gracey and his brother, CEO Ken Gracey, about the chip and its development.
Lessons Learned
Elektor: The Propeller 2 is out now. What lessons did you learn from the Propeller 1, and how did you apply them to the Propeller 2?
Chip: For the Propeller 1, we had eight processor cores, called cogs. The limiter was that if we had a lot of very high bandwidth tasks that required intensive CPU interaction, sometimes all cogs could get used up pretty quickly. On the Propeller 2, we put a whole bunch of smarts into the I/O pins, so that every single I/O pin has three 32-bit registers, which allow us to configure it for all kinds of different measurement modes and communication modes. These pins handle the super high bandwidth interaction and the processor is unloaded. We added a lot of analog functionality to the I/O pins so that every pin has three different DACs in it, including two DACs which are relatively low impedance, that can output analog signals very quickly. We also put analog-to-digital converters into every pin; they’re like delta-sigma converters. But in the smart pins, for example, we also do summation filtering.
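The combination of a delta-sigma converter and a summation filter mentioned above can be illustrated with a minimal Python sketch. It is a simplified numerical model, not the Propeller 2's actual circuit: the function names, the 8-bit full scale, and the window length are all assumptions chosen for illustration. A first-order modulator turns a constant input level into a stream of single bits, and summing those bits over a fixed window recovers a multi-bit sample.

```python
def delta_sigma_bits(code, full_scale, n_bits):
    """Model a first-order delta-sigma modulator for a constant input.

    `code` is the input level in the range 0..full_scale (hypothetical
    units). Each step adds the input to an accumulator; an overflow
    emits a 1 bit, otherwise a 0 bit.
    """
    acc = 0
    bits = []
    for _ in range(n_bits):
        acc += code
        if acc >= full_scale:
            bits.append(1)
            acc -= full_scale
        else:
            bits.append(0)
    return bits


def summation_filter(bits, window):
    """Sum the bitstream over consecutive, non-overlapping windows.

    Each sum is one output sample; a longer window gives more
    resolution at a lower sample rate.
    """
    return [
        sum(bits[i:i + window])
        for i in range(0, len(bits) - window + 1, window)
    ]


# Example: an input of 100 out of 256, observed over two 256-bit windows.
stream = delta_sigma_bits(100, 256, 512)
samples = summation_filter(stream, 256)
```

In this model the density of 1 bits tracks the input level, so the count of 1 bits in each window is the sample. Doing the counting in pin hardware, as the interview describes, means a cog would only have to read finished samples instead of processing every bit.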
Elektor: Was designing the Propeller 2 like the Propeller 1 — a CPU design from scratch in terms of architecture?
Chip: I think, on the original Propeller 1, we had maybe 63 instructions. We have a lot more instructions on the Propeller 2. Around 350. They do various specific things, but they’re also kind of like building blocks.
Elektor: Designing a new CPU from scratch: how long did it take from the first lines on paper to the final chip?
Chip: By the time the Propeller 2 was done, it had taken 14 years, which in the marketplace is completely unreasonable. The world goes through a few seismic changes in technology in that span of time. But what we did for this chip, we just really focused on a lot of first-principles issues, which are never going to go out of date.
Ken: I want to point out that over that 14-year period, Chip learned an awful lot, and we had to restart with new designs as he learned, making several foundry runs of pieces of the die, or the whole die. It was a learning process that required some serious patience.
Chip: We started off thinking in terms of schematics. But that’s really inefficient when you’re dealing with things like 32-bit buses. You don’t want to be drawing schematics for all these wires. That’s how the Propeller 1 was made. The Propeller 2 is described at a higher level of abstraction and written in Verilog.
Elektor: Let’s talk about building peripherals. What were the challenging ones to construct for the chip?
Chip: You mentioned smart pins. The most challenging thing about the smart pins was the USB mode, because USB is a very arbitrary protocol. They could have made the USB protocol so that it was implementable in software on fast CPUs, but they designed it such that there are these very tight timing corners where CRC checks need to be done. This means you must do a silicon implementation to realize USB. USB is a complicated system that requires a lot of expert knowledge to operate it, and it’s difficult to validate.
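To give a feel for the timing pressure described above, here is a minimal Python sketch of the 16-bit CRC that USB applies to data packets, together with a rough clocks-per-bit estimate. The function names are illustrative and this is not Parallax's implementation. The CRC parameters (polynomial 0x8005 processed LSB first, initial value 0xFFFF, result inverted) are the commonly published CRC-16/USB parameters. The 12 Mbit/s figure is the USB full-speed bit rate, and the 180 MHz clock is the guaranteed figure quoted later in this interview.

```python
def crc16_usb(data: bytes) -> int:
    """Bit-serial CRC-16 as used for USB data packets.

    Polynomial 0x8005 in reflected form (0xA001), initial value
    0xFFFF, and the final value is inverted.
    """
    crc = 0xFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):  # one shift-and-conditional-XOR per bit
            if crc & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc ^ 0xFFFF


def clocks_per_bit(sys_clock_hz: int, bit_rate_hz: int) -> int:
    """Whole system clocks available per bit on the wire."""
    return sys_clock_hz // bit_rate_hz


# Standard check string used in CRC catalogues.
check = crc16_usb(b"123456789")

# Budget per bit at 180 MHz for full-speed USB (12 Mbit/s).
budget = clocks_per_bit(180_000_000, 12_000_000)
```

With only about 15 clocks per bit in this estimate, a software loop would have to sample the line, undo bit stuffing, and update the CRC inside that window, which is consistent with Chip's point that dedicated logic in the pin is the practical route.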
Elektor: To get the design into real silicon, how did you choose your manufacturer?
Chip: We kind of bounced around. For a while, we were just going to do the pure foundry model where we would provide the complete mask set for our design, and then some foundry would fabricate the chip. We did test chips that way, but not full chips. The Propeller 2, I think, has about 630,000 instances of standard cells inside it that make up the logic that controls everything. We don’t have enough time in our lives to figure out how to optimally design all that from our Verilog code, and we also don’t have the ability to optimize sufficiently to get the timing done. The 180-nanometer process we used has been out since, I think, 1999. With such good tools, synthesis tools, and standard cell libraries, it’s amazing that our chip can be clocked up to 350 MHz. We were thinking 180 MHz would be the practical limit. At high temperature, low supply voltage, and worst-case processing conditions, 180 MHz is the guaranteed lower limit. But it’s amazing how well the tools can optimize these days, so you can go back to older silicon technologies and get much more performance out of them.
Why did we go to Onsemi? We are somewhat of a beggar, not a chooser, because we are a small company and have limited resources. I’m not sure how we stumbled into Onsemi, but it worked out great, because they have the digital design methodology all worked out. There’s a lot of subtle things that can go wrong when you design a chip. But these days, the tools have all these kinds of warnings. They can warn you that something’s not right so you can be on alert. There’s a big checklist of things that have to be resolved in order to say the silicon compiled okay and it should work.
Ken: Going back to the very beginning with Propeller 1, Chip laid out the Propeller 1 manually; he did a schematic design. And then we started the Propeller 2, and he was now writing Verilog code. We found an outside company to synthesize our Verilog code, targeted for a certain foundry process but without a specific foundry in mind. That wasn’t going so well. Finally, we just went straight to a foundry that would work with us.
Propeller 2 Community Feedback
Elektor: How did you come up with the idea to give the community access to the beta bitstreams for the Propeller 2?
Chip: For maybe $100, or a couple hundred dollars, you can buy a very nice FPGA board with a pretty substantial FPGA on it. During the design process, I would periodically, sometimes even a couple of times a week, output the latest FPGA images for like six different platforms. I put those files out there so that customers could try them out. And a whole bunch of feedback was always coming in from everybody who had ideas for new instructions that would be useful, smart pin modes that would be useful, just all kinds of things that were way beyond what I was going to imagine myself. And that was nice.
And the reason we could use all these different platforms is that the architecture is scalable. For the smallest implementation, we would have one cog, one processor; for the biggest FPGA board, we could have 16. I would compile to whatever the capacity of each FPGA board was. Some would have just a few smart pins, some would have all smart pins. But when we got down to doing the actual physical design of the chip, it was apparent that all that logic to support 16 cogs was going to give us only enough room for 128 KB of main memory. And everybody agreed, among customers and me, that eight cogs and 512 KB of main RAM would be a much better balance.
Elektor: How was the feedback incorporated into the new design?
Chip: I mean, there were some really complex bugs that people would find that I am not sure I would have detected in time. Things to do with the FIFO for executing from hub memory. There were some real sleepers in there. And we had one customer that found a lot of things that really shook the bugs out of that. Also, people would have bug reports or something that’s not working. But then also a lot of input came in like: Can we have something that does this certain function? I would kind of aggregate all those ideas, and then implement them in whatever way seemed most efficient. And we had a lot of people on board that had good taste in how things could be arranged to make it easy to live with, so the programmer has an easy time.
Elektor: How large was that community of people looking into the design?
Chip: I think we had maybe at least a dozen people — maybe more. It’s hard to say. I think half of them were strong contributors. And the other half were also, but they were a little more casual. But with every one of those people providing their input, it really enriched the design.
Ken: There were times when Chip really was having a lot of difficulty. And the community truly saved him and encouraged him to keep going the right way. The same contributors followed through and have contributed documentation, support and code to the community.
Chip: I mean, it took so many years, at one point my dad asked, because he’d always been involved with our business, “you don’t have to finish this thing, you can stop it at that point.” It had been going on for like 10 or 11 years. And I had almost been kind of thinking to stop, but I couldn’t, because there was too much involved, and I actually enjoyed working on it anyway. It had taken a lot of resources over time. I mean we spent maybe $6 million on this over all those years.
Coding
Elektor: Using SPIN2 and Assembler may be nice, but many of us are used to also coding in C/C++. Will you help to get C/C++ support for the Propeller 2?
Chip: It’s not an official Parallax project, but we do have a person. His name is Eric Smith, and he has written quite a good compiler that’s in some areas way ahead of my own SPIN compiler. It is GCC-based, which actually does SPIN2, SPIN, C, and even BASIC. His tool is called FlexProp and it runs on Windows, Mac and Linux. The silicon has pretty good facilities for single stepping and breaking and looking at things during runtime. But I need to finish that on my end, and then he’ll be able to implement it eventually into his tool, so he could single step from source code. He’s very responsive about fixing anything that anybody notices is wrong. I’d say his tool is pretty good, but it’s not an official Parallax tool.
Elektor: From the unofficial set of software tools, there are now plugins for the Propeller 2 added for Visual Studio Code. Will Parallax support Visual Studio Code officially as a development environment for the Propeller 2?
Ken: We already are supporting that with one of our key contributors named Stephen Moraco. And that’s pretty much where I put a lot of my energy because that works with Mac, Windows and Linux with Eric’s compiler.
The Chip Market
Elektor: Let’s turn to the chip market. People are having hard times getting any kind of silicon. How is it going for Parallax in terms of Onsemi and their production capabilities? Do you still get the silicon you have ordered?
Ken: We’re getting everything we’ve ordered, and on a schedule with regular lead times, though at higher prices.
Chip: Costs are going up all the time these days, so we are having to recalibrate.
Ken: I think we really haven’t truly arrived at the right price for the Propeller 2 yet. One customer just placed an order for 250 units and paid around $12 each. But I would really like to be able to sell them for around $10 at high volumes. That is really the goal and it seems we’re meeting that target price now.
Elektor: Have you ever considered switching to a more advanced process node at the foundry than 180 nm for the Propeller 2, so that you get more chips out of one wafer and they are cheaper in the end?
Chip: The trouble is that the costs multiply. In 180 nanometers, it was like $600,000 in the end, and they estimated $270,000 in the first place. I think we had someone say we could do 65 nanometers, it might have even been something bigger, but it was estimated to be like $600,000. I remember the chip we wanted to make that would consume 8 W at a 180 nm node. With a 40 nm process, it would have needed 9 mW for the same number of cores. And it probably would have been able to clock up to 1.2 GHz or 1.4 GHz. So it would have been way smaller, faster and lower power. We could do that; we would just need probably a million dollars to go to like 40 nm.
What Comes Next?
Elektor: As the Propeller 2 is now finished, have you started thinking about a Propeller 3?
Chip: We’ve had some thoughts about Propeller 3. But what we need to do is realize enough revenue from Propeller 2 to finance Propeller 3. Hopefully, it will. The Propeller 2 does a lot of novel things that people would like, but it’s outside of the tent, where a lot of things are today, so it’s a challenge to attract customers.
Elektor: Your cores use a proprietary architecture. Will RISC-V be an option for the next one?
Chip: We have had some customers that were helping with the FPGA design efforts for the Propeller 2 who were into RISC-V and they like it a lot. I think the main draw is that RISC-V has an ecosystem associated with it, which would make communicating, say, a Propeller 3 chip with RISC-V a lot easier for the customer than having to explain new things to them. I believe RISC-V is a pretty efficient architecture. We might be, with RISC-V, pushed into the ARM paradigm where you suddenly have to have dedicated silicon to do anything in real time. The compilers that prepare the code to run on RISC-V are probably not going to be thinking about cycle times, and a RISC-V implementation might have a few levels of caching, making timing less predictable. To me, it’s not that interesting, as we would have to make up for it, so our contribution would have to be in terms of the smart pins.
Authors’ Note: The Propeller 2 is unique in many ways. If you would like to know more about the chip and get a first impression, take a look at Mathias Claussen’s series, “Getting to Know the Parallax Propeller 2.” As the Propeller 2’s documentation and tooling improve, we might revisit it in another article series.
Questions About the Propeller 2?
Do you have technical questions or comments about this article? Email the authors at mathias.claussen@elektor.com and luc.lemmens@elektor.com.