The Deep Seeds of DraStic, DS Emulator


A couple of weeks ago, I felt the urge to know more about DraStic, the fantastic DS emulator we now have on Pandora (if you have not yet heard of it, you can catch up here and there). It’s not every day that you get to see the birth of a new emulator, especially one as good as this one. I reached out to its author, Exophase, to learn more about his motivations, his way of working and emulator design in general.

If you wonder what kind of person has the skills to program emulators in the first place, do not venture further. The answer is here. As you can probably guess if you have a rough understanding of programming, an emulator creator needs to be capable of low-level system programming to get the most out of the hardware, and Exophase’s background is hardly surprising in that sense. 

I have a BS and MS in Computer Science, I do mainly embedded programming at work (but inevitably get tossed on other things). I have followed emulators for a long time, even before I did much programming, so I always had a natural interest in them. I guess it’s common for people with a strong interest in emulation and programming to find those interests converging. My programming interests tend to align more with low-level embedded sort of stuff, which also fits more with emulation.

There are many systems out there to be emulated – but it’s hard for anyone to break new ground, since most of what could be done has already been tried in the past. For an emulator programmer there’s going to be a limited amount of things you can do that’ll be of a lot of use over existing emulators, so if your goal is to make something widely useful there aren’t a lot of options. You could try to break new ground on a new platform that hasn’t been emulated yet, but that usually involves a lot of reverse engineering and trial and error, and could take an uncertain and very long time to get anywhere. All of the low hanging fruit in emulation has been picked a long time ago.

Before working on emulators, it can be helpful to get involved in their communities. There are circles of emulator authors that are easily accessible, especially on IRC. Yeah, IRC, you know, the communication protocol that everyone called dead when instant messengers like ICQ and MSN arrived, and which is still very much alive and kicking? (By the way, come and visit the #openpandora channel on irc.freenode.org if you would like to know more about the community!)

Emulator authors talk on IRC a lot. I remember there being a bunch hanging around in various channels long before I started doing emulators. If you’re looking for them a good place to start is the channel for a prominent emulator, especially one that attracts a ton of developers like MAME. I keep in touch with some emulator authors on IRC personally, although not usually in big channels. I tend to end up in small channels with a handful of people and maybe 3-4 emulator authors at most, where we talk about our own projects.

Yet Exophase has found a niche of his own: emulators for handheld platforms, where many existing emulators were too slow to run anything well once ported. This very much follows the work he did on the GBA emulator for PSP, also known as gpSP.

Funny: I only noticed while writing this article that gpSP is probably the first piece of software from Exophase I ever used, without even knowing it.

Most emulator authors are interested in developing for PC. There’s a long heritage of PC and x86 emulator development that goes all the way back to the 90s. Millions of people use emulators on PC so it makes sense that it would attract the most attention. Of course if doing heavy optimization only benefits 1% of your user base it’s not that interesting.

However, I should stress that not all emulators for the PC are what I’d consider especially inefficient. For Nintendo DS, the one big open source one (desmume) isn’t that fast, but No$GBA is faster. There are even ones with dynarecs (NeonDS and more recently DuoS). Mind you, it’s hard to give a full appraisal of these other emulators since I can’t see their source and haven’t been able to test them on a wide variety of hardware. As far as alternatives to DraStic go, it’s moot since they can’t be run on ARM platforms directly.

For the last several years I’ve been interested in doing emulators for consoles where existing (otherwise very good) emulators are too slow for some handheld gaming platform. That means looking for places where the current emulators are too slow but a highly optimized one might not be. It’s hard to tell for sure but it’s easy to narrow it down to what could be possibilities, and I had a good feeling GBA could be like this for PSP, and DS for Pandora or at least other current gen ARM devices. Starting with GBA helped make the rather complex DS more approachable since a lot of its technology extends off of what GBA had. I think you’ll find a lot of DS emulator programmers worked with GBA emulation beforehand.

Only fairly recently have a lot of people really had access to weaker mobile platforms that were both very mainstream and easy to get arbitrary programs running on. So the uptake for new emulators on mobile devices has been slower than emulators in the past. And of course a decent plus fast DS emulator takes a lot of time (and DraStic is still very incomplete; others may have held off longer before releasing something like this).

And in case you wondered, it is not a requirement for the emulator author to enjoy the format or its games. Exophase himself had no passion for the GBA… but he did like a few games on DS. 

I actually didn’t really play that many DS games, although more than I did with GBA, where I didn’t even like the platform before starting gpSP. My favorite games among the ones I’ve actually played are the Castlevania series; I like the DS ones most out of the entire series.

I have always seen emulator programming as a kind of black magic, shrouded in mystery, probably because I am largely unaware of their inner workings. I always find it amazing when everything works as intended and it looks just like you have a completely different system running on screen, at about the same speed as the original.

If you think about it for two seconds, if there were no good emulators available there would be very few games to play on Pandora. Yet, as complex as they are, emulators are rarely the work of large teams. Exophase himself very much works on his own:

I’ve never done work on someone else’s emulators, so my emulators have always been from scratch. The closest exception is the GPU plugin I did for PCSX-reARMed, but from my point of view I never even saw any of PCSX-reARMed’s code, I developed it completely as an independent program and gave it to notaz to integrate. Working in collaboration with other programmers has its own challenges and I tend to have really specific visions for how I want the emulator to be done, that would make it hard to get others involved.

While “working from scratch” is possible, you do need a lot of documentation to understand the target system and how to translate it to your platform. For example, even something as simple as the real-time clock of the DS has specific registers you need to know about in order to emulate it. Here’s a piece of the documentation that you can expect for that part.

0x04000138 - REG_RTCCNT - Realtime Clock Control Register (R/W)
Cpu | Bit | Name            | Expl.
7   |  0  | In/Out Data Bit | This is the Serial In/Out data bit as input or output.
7   |  1  | SCK Data Bit    | This is the Serial CK data bit as input or output.
7   |  2  | CS Data Bit     | This is the CS data bit as input or output.
7   |  4  | SIO Direction   | 0=Input, 1=Output
7   |  5  | SCK Direction   | 0=Input, 1=Output
7   |  6  | CS Direction    | 0=Input, 1=Output
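
To make the table concrete, here is a minimal C sketch of how an emulator might unpack a value written to this register. The bit positions come straight from the table; the `rtc_pins` structure and the `rtc_decode`, `rtc_encode` and `rtc_demo` helpers are hypothetical names chosen for illustration, and treating bits that the table does not list as zero is an assumption. This is not DraStic's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit masks taken from the REG_RTCCNT table above. */
#define RTC_DATA      (1u << 0)  /* serial in/out data bit */
#define RTC_SCK       (1u << 1)  /* serial clock bit */
#define RTC_CS        (1u << 2)  /* chip select bit */
#define RTC_DATA_DIR  (1u << 4)  /* 0 = input, 1 = output */
#define RTC_SCK_DIR   (1u << 5)  /* 0 = input, 1 = output */
#define RTC_CS_DIR    (1u << 6)  /* 0 = input, 1 = output */

/* Hypothetical decoded view of the register (illustrative names). */
typedef struct {
    bool data, sck, cs;
    bool data_is_output, sck_is_output, cs_is_output;
} rtc_pins;

/* Unpack a raw register value into individual pin states. */
rtc_pins rtc_decode(uint8_t value)
{
    rtc_pins p;
    p.data           = (value & RTC_DATA) != 0;
    p.sck            = (value & RTC_SCK) != 0;
    p.cs             = (value & RTC_CS) != 0;
    p.data_is_output = (value & RTC_DATA_DIR) != 0;
    p.sck_is_output  = (value & RTC_SCK_DIR) != 0;
    p.cs_is_output   = (value & RTC_CS_DIR) != 0;
    return p;
}

/* Pack pin states back into a register value.
   Assumption: bits not listed in the table are written as 0. */
uint8_t rtc_encode(rtc_pins p)
{
    uint8_t v = 0;
    if (p.data)           v |= RTC_DATA;
    if (p.sck)            v |= RTC_SCK;
    if (p.cs)             v |= RTC_CS;
    if (p.data_is_output) v |= RTC_DATA_DIR;
    if (p.sck_is_output)  v |= RTC_SCK_DIR;
    if (p.cs_is_output)   v |= RTC_CS_DIR;
    return v;
}

/* Usage check: 0x56 has bits 1, 2, 4 and 6 set. */
void rtc_demo(void)
{
    rtc_pins p = rtc_decode(0x56);
    assert(p.sck && p.cs && !p.data);
    assert(p.data_is_output && !p.sck_is_output && p.cs_is_output);
    assert(rtc_encode(p) == 0x56);
}
```

A real emulator would go on to feed these pin transitions into a model of the clock chip's serial protocol, which this sketch leaves out entirely.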

In the case of the DS there was ample work on previous emulators so documentation was readily available.

Documentation is absolutely essential. If you don’t have it you have to figure it out yourself or guess and hope you get lucky. Fortunately DS was very well documented already because it had such a big homebrew scene, and talented people like Martin Korth (No$) who did a lot of reverse engineering. I still did and am doing my own test ROMs to reverse engineer some more things, but nothing I’d consider absolutely critical to getting the emulator working “good enough.”

When I write an emulator I start with something relatively simple that runs on PC. Usually start with a CPU interpreter with a debugger, then something for graphics and other small peripherals sitting on memory, with a really simple interface – everything running from command line, hard-wired controls, etc. Then I work on getting basic compatibility up, usually starting with homebrew and test ROMs to help expose major problems. After commercial games start running to a decent extent I work on more secondary features like audio. Then I work on optimizing things.
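
As an illustration of that first step, here is a minimal sketch of a CPU interpreter loop in C. The three-byte instruction set is invented for this example and has nothing to do with ARM or with DraStic's code; all names (`toy_cpu`, `toy_step`, `toy_run`, `toy_demo`) are hypothetical. It only shows the fetch, decode, execute shape that an interpreter has, plus a step limit of the kind a debugger could use for single-stepping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy instruction set, invented for this sketch: each instruction is
   three bytes, {opcode, register, operand}. It is not ARM. */
enum { OP_HALT, OP_LOADI, OP_ADD, OP_SUBI, OP_JNZ };

typedef struct {
    uint32_t regs[4];
    size_t pc;              /* byte offset of the next instruction */
    const uint8_t *mem;     /* program memory */
    size_t mem_size;
    int halted;
} toy_cpu;

/* Fetch, decode and execute one instruction. Returns 0, or -1 on a fault. */
int toy_step(toy_cpu *cpu)
{
    if (cpu->mem_size < 3 || cpu->pc > cpu->mem_size - 3)
        return -1;                          /* fetch would run off the end */
    const uint8_t *ins = cpu->mem + cpu->pc;
    uint8_t op = ins[0], r = ins[1] & 3, arg = ins[2];
    cpu->pc += 3;

    switch (op) {
    case OP_HALT:  cpu->halted = 1;                      break;
    case OP_LOADI: cpu->regs[r] = arg;                   break;
    case OP_ADD:   cpu->regs[r] += cpu->regs[arg & 3];   break;
    case OP_SUBI:  cpu->regs[r] -= arg;                  break;
    case OP_JNZ:   if (cpu->regs[r] != 0) cpu->pc = arg; break;
    default:       return -1;               /* unknown opcode */
    }
    return 0;
}

/* Run until HALT, a fault, or max_steps. A debugger can pass max_steps = 1
   to single-step. Returns the number of instructions executed, or -1. */
long toy_run(toy_cpu *cpu, long max_steps)
{
    long steps = 0;
    while (!cpu->halted && steps < max_steps) {
        if (toy_step(cpu) != 0)
            return -1;
        steps++;
    }
    return steps;
}

/* Usage: sum 5 + 4 + 3 + 2 + 1 into r0 and return it. */
uint32_t toy_demo(void)
{
    static const uint8_t prog[] = {
        OP_LOADI, 0, 0,   /*  0: r0 = 0            */
        OP_LOADI, 1, 5,   /*  3: r1 = 5            */
        OP_ADD,   0, 1,   /*  6: r0 += r1          */
        OP_SUBI,  1, 1,   /*  9: r1 -= 1           */
        OP_JNZ,   1, 6,   /* 12: if r1 != 0 goto 6 */
        OP_HALT,  0, 0,   /* 15: stop              */
    };
    toy_cpu cpu = { .mem = prog, .mem_size = sizeof prog };
    long steps = toy_run(&cpu, 1000);
    assert(steps > 0 && cpu.halted);
    return cpu.regs[0];
}
```

Decoding every instruction each time it runs is what makes an interpreter like this slow, and it is the cost a recompiler is meant to avoid.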

I usually do some level of optimization on x86 first. Not really so that it’ll be very fast there, especially since the higher end desktop CPUs don’t demand it, but in order to get the framework down for later ports. In DraStic’s case this meant doing a recompiler and rewriting the video code to make it fit a more SIMD (Note: Single instruction, multiple data – It describes computers with multiple processing elements that perform the same operation on multiple data points simultaneously. Most modern CPU designs include SIMD instructions in order to improve the performance of multimedia use.) friendly workflow. Some stuff gets optimized but is never meant to be anything more than plain C code so I do that for all platforms. I actually started the x86 recompiler before I had very high compatibility or audio support, because the interpreter was just so slow that it made it hard to test anything. Especially since I was doing a lot of testing on an Atom netbook. When I was doing gpSP I didn’t have this problem because GBA was a lot weaker vs my PC at the time.

In case you wonder what actual tools Exophase is using to develop DraStic, well you will not be very surprised. It’s all standard stuff, there is no magic there. 

It’s written in C and ASM (mostly ARM ASM, with a bit of x86). Compiled with GCC wherever it goes. Edited using plain old gvim (vim + GUI), built with makefiles in a terminal. Using git for a private repository, where I’m still trying to get myself to suck less at version control.

Development has always been on GNU/Linux, which helps since I use some pretty Linuxy things (Pandora using it is a big plus, and it’s kind of a hindrance for moving to other platforms, even Android). I had it working on Windows at one point and it can probably work there again without that much effort, albeit with some slower code paths.

I cross-compile for the Pandora, but much of the ARM-specific work was done not on my Pandora but on my Chromebook which is also running ARM Linux. I compile natively there.

When you start DraStic and play with it, it is hard to realize how much work went in there, even though you get it for free on Pandora. The truth is, it took Exophase a lot of time to reach that point. I hope you all consider this, next time you ask for “extra features”. 

I worked on it about 6 months 2009-2010 and 8-9 months since mid-2012. Hard to say how many hours, there were a lot of slow weeks there and some really focused weeks. Probably over 1000 hours total. I burn out sometimes but what really happens is I tend to have a hard time starting a new task and approaching it very slowly over a long period of time. Then I gradually gain momentum and put more time into something when the task becomes clearer and more focused. I put the most time in when I’m trying to fix something.

When emulator work comes to mind, I think we all realize that most of the hard work is going to be around getting good compatibility first, and optimizing the execution speed second. Or vice-versa, depending on your approach. But even smaller things can be challenging or time-consuming.

Some other stuff still takes a lot of time. While it isn’t really hard, I don’t like doing all the GUI stuff. I guess having any kind of experience with this helps, but if you’re developing for something like Pandora it pays to not rely on standard window manager toolkits since the form factor doesn’t really fit. Also, not to take platform-specific issues for granted… especially you want to make sure that blitting the screen and updating the audio buffer don’t consume a bunch of time. Handhelds like Pandora can have some good fast interfaces for video, you just have to make sure you’re using them.

I also asked Exophase what he has learnt through the process of making DraStic.

The biggest thing I’ve learned is to force myself to not get stuck on one problem for too long. This is really hard for me. I tend to obsess over doing something the right way and end up wasting weeks without getting much done. With DraStic I had to draw the line somewhere and try to hit self imposed deadlines, and a lot of the time this meant putting something down and working on something else. Creating a big task list helped with this.

If you do not have a Pandora, or wonder whether a DraStic version will be available for other platforms: the code is somewhat portable. Android, iOS? For all ARM-based devices, the possibility is there.

Just getting it working on other platforms shouldn’t be that hard, if you don’t care about performance at all. It’s not the most portable thing in the world, being written in plain C in an age where a lot of devices need Java or ObjC for something or other, and there are emulators that already split their core portable part into a shared object, but it should still be pretty decent since it at least tries to abstract out the I/O stuff.

Getting it running fairly well for platforms that aren’t ARM is going to be a lot of work. Much more work if it’s not x86 and not Linux. Fortunately, ARM or x86 covers most stuff out there today; the one MIPS device that looks like it could be interesting has other major problems.

While it’s not open source, I do have some experienced people helping me port it to other platforms. Android is the #1 priority right now.

As you may be aware, DraStic was the winner in the emulation category in the recent DragonBox competition. Exophase did not expect to be ready on time, as there was a lot that needed to be done at the last minute.

I think the emulator somewhat exceeded my early expectations in terms of how “playable” it might be, so I’m not quite as pessimistic about the first release as I was before. Originally I thought that anything using the 3D engine would be way too slow to be playable, and as I started looking through lists of the most highly recommended DS games I found that very close to all of them used the 3D engine. Adding frameskip at the last minute (literally the day before releasing) really helped. The week before the compo deadline was a really crazy scramble to get in all the necessary features I left off while focusing on the emulator core, so it was definitely very stressful and there was a very real risk that I wouldn’t make it in time. But I wouldn’t say I regretted it (although winning doesn’t really matter that much to me).

DraStic generated a ton of interest and feedback on the forum and in the Pandora community, and while this is overall very positive this can have detrimental aspects as well.

The feedback has been very helpful and I appreciate everyone’s dedication to this. Honestly – and this is speaking purely from a personal standpoint that defies logic – I’ve had kind of mixed feelings to it all because it’s been so overwhelming. It’s easy for me to see bug report after bug report (and feature request) and feel overwhelmed and demoralized, especially if I’m trying to work on other things. At some point I pretty much have to section myself off from it and let it pile up for a while.

I have spent the first few weeks after releasing focusing hard on trying to fix bugs and, although it feels like for every one I fix another two get reported, I’m at a point where I want to focus on other things for now. It’s helped a lot having other people who can help organize bug reports and reproduce them for me; in particular, I wouldn’t have been able to fix all I have so far without Neelix’s constant support.

While the current DraStic version works very well or decently for many games, there is still a lot to be done and Exophase’s work is far from being finished. The main focus in the next version is probably going to be around speed. As to “when”… 

I had some milestones before releasing and some vague ones going out, but nothing I’d be willing to rigidly tie to a particular roadmap, much less one I’d want to make public. However, I can say that right now I want to focus on the 3D engine. This is split into three basic tiers:

1) Get a better reference codebase going that more accurately matches how the DS works.
2) Implement it as reasonably optimized C code (vs the current code which sucks).
3) Implement it as highly optimized NEON assembly code.

1) is something I’ve been working on now, and it’s very slow and difficult work. Ideally I’d have liked to get as much pixel perfect as I can, but realistically speaking I probably have to draw a line at something that’s just reasonably close. 2) should hopefully only take a few days or weeks, while 3) could take months. So 3) may be done in parallel with other things.

Coming back to 1), the DS’s 3D renderer is very unusual. It’s a scanline-based renderer that interpolates top to bottom and left to right instead of a normal constant gradient rasterizer. It has a lot of precision issues, even in basic things like scan conversion. Errors are usually subtle, but can extend to polygon edges being off by one which can be very noticeable sometimes, creating cracks in objects. The current renderer implementation [in DraStic] is slow and inaccurate, because it uses a lot of floating point code and naive implementations of slow operations like divisions. Floating point code is very slow on Pandora’s Cortex-A8 unless you use NEON for it, which this isn’t. There’s also a lot of overhead converting from float to integer. It’s using floats because that was easier to implement, since you don’t have to worry about picking the right dynamic range and precision. This code was written a long time ago and I needed a simple implementation just to get things working, so that games work well enough that fixing and optimizing other parts of the code is possible. There are also a lot of fairly simple optimizations like parameterizing out conditionals from inner loops.
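
To illustrate the float versus integer trade-off mentioned here, below is a minimal C sketch of walking a polygon edge with 16.16 fixed-point arithmetic. It is a generic constant-step edge walk, so it deliberately does not model the DS's unusual interpolation, and it is not taken from DraStic. The 16.16 format, the 4096 coordinate limit and the `edge_walk` name are assumptions made for the example. The point is only that one division per edge plus an integer add and shift per scanline replaces per-line float math and float-to-int conversion, at the price of having to choose a range and precision up front.

```c
#include <assert.h>
#include <stdint.h>

#define FX_SHIFT 16                     /* 16.16 fixed point: assumed format */
#define FX_ONE   (1 << FX_SHIFT)
#define FX_HALF  (1 << (FX_SHIFT - 1))

/* Writes the edge's x position for each scanline from y0 up to (but not
   including) y1 into out[], and returns the number of scanlines. */
int edge_walk(int x0, int y0, int x1, int y1, int *out)
{
    /* Choosing the range is the price of fixed point: in this sketch,
       coordinates must be non-negative and below 4096 so that nothing
       overflows 32 bits or requires shifting a negative value. */
    assert(y1 > y0);
    assert(x0 >= 0 && x0 < 4096 && x1 >= 0 && x1 < 4096);

    int height = y1 - y0;
    int32_t x    = x0 * FX_ONE + FX_HALF;        /* +0.5 so the shift rounds */
    int32_t step = (x1 - x0) * FX_ONE / height;  /* the only division */

    for (int i = 0; i < height; i++) {
        out[i] = x >> FX_SHIFT;                  /* integer add and shift per line */
        x += step;
    }
    return height;
}
```

For an edge from (10, 0) to (0, 4), this yields x positions 10, 8, 5 and 3 for the four scanlines. The truncated `step` accumulates a small error on long edges, which is exactly the kind of precision decision that floats let you postpone.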

This being said, I am not going to optimize the current renderer, I’m going with a new one. This is a very major goal since it’s closely coupled to performance of most games right now.

Another major task, closely related to optimizing the 3D engine, is optimizing the geometry engine. This could benefit from a huge overhaul. Again, this would happen in a better C version first then a NEON version, but I have much less to gain from the C one; it’d probably be more to help organize data and algorithms for later.

Similarly, there are areas where I’d like to dedicate a couple weeks straight to new optimizations for CPU and 2D. There’s a lot left to do for both. In particular, I need to better optimize various memory I/O operations because these are chewing up a lot of time in a lot of games. There’s also a lot of optimization to be done for standard memory operations (both loads and stores, and block memory ops). Sometimes the game is pushing real work here like loading up geometry commands because it’s using I/O for 3D display lists instead of using DMA, for some reason. Other times the game is wasting time doing something dumb like constantly blindly setting and retrieving the results from the division unit.
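
As a sketch of what cheaper handling of such I/O could look like, the C code below models a memory-mapped divider that defers the actual division until a result is read, so repeated writes to the inputs cost almost nothing. This lazy approach is an assumption made for illustration; the interview does not say how DraStic handles the division unit. The `div_unit` structure and function names are hypothetical, the divide-by-zero and overflow results are placeholders rather than the real hardware's values, and the timing or busy flags of the real unit are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a memory-mapped divider (illustrative names). */
typedef struct {
    int32_t numer, denom;   /* last values written by the game */
    int32_t quot, rem;      /* cached results */
    bool dirty;             /* inputs changed since results were computed */
    unsigned divisions;     /* counts real divisions, for demonstration only */
} div_unit;

/* Writes only record the value and mark the cached results stale. */
void div_write_numer(div_unit *d, int32_t v) { d->numer = v; d->dirty = true; }
void div_write_denom(div_unit *d, int32_t v) { d->denom = v; d->dirty = true; }

/* Recompute the results only if an input changed since the last read. */
static void div_update(div_unit *d)
{
    assert(d);
    if (!d->dirty)
        return;
    d->dirty = false;
    d->divisions++;
    if (d->denom == 0) {
        /* Placeholder: real hardware defines its own divide-by-zero results. */
        d->quot = 0;
        d->rem  = d->numer;
    } else if (d->numer == INT32_MIN && d->denom == -1) {
        /* This case overflows in C, so handle it explicitly (placeholder). */
        d->quot = INT32_MIN;
        d->rem  = 0;
    } else {
        d->quot = d->numer / d->denom;
        d->rem  = d->numer % d->denom;
    }
}

/* Reads trigger the (possibly cached) computation. */
int32_t div_read_quot(div_unit *d) { div_update(d); return d->quot; }
int32_t div_read_rem(div_unit *d)  { div_update(d); return d->rem; }
```

With this shape, writing the numerator, the denominator, and then the numerator again before reading both results performs a single division instead of three.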

I’ve seen games where this stuff can take a huge amount of time that’s being recorded as “CPU” time by the benchmarking tool built into the emulator. For instance I’ve seen cases where around 8ms is marked as used by the CPU core but profiling only shows about 2ms spent in recompiled code, and memory I/O functions show up all over the profile. This 6ms wouldn’t be entirely memory mapped I/O because stuff like DMA, context switching, and some per-scanline state machine code takes part of the “CPU” time too but it’s probably a huge chunk of it. So it’s very significant.

Compatibility is not left behind altogether, though. Work tends to alternate between speed and compatibility.

I’ll be working on compatibility where I can. Possibly in between other major tasks I’ll dedicate a few weeks to compatibility. For 2D, there are still some key functions I haven’t converted to ASM, and some of the ASM ones I need to investigate as they’re taking more time than I expected.

There are some other things that need to be implemented for compatibility’s sake. For instance, wifi needs some level of fake support to make some games run. DLDI support for homebrew is an eventual goal.

Development time is an important factor, as well as motivation. Since DraStic is a performance based emulator, you can guess that there will be a time where either the hardware available or the optimizations will bring the development close to a stopping point: 

I’m sure I’ll stop working on it eventually, but I doubt I’ll ever consider it done. There’s just way too much that can be done for it, and since it’s a performance-focused emulator eventually it just won’t be as relevant, and it’ll make more sense to work on other things (although I have no idea what). Still, I hope and intend to work on it longer than gpSP (not that it wasn’t already in development far longer than gpSP was). I knew from the start that this emulator would need a very long term commitment.

Both could be factors (good enough for most games, or move on to better platforms where performance will be less of an issue), but ultimately I doubt I’ll want to work on it forever regardless of what shape it’s in.

Well, no matter how much progress is made on DraStic, the current build already offers a great deal and has opened another door for the Open Pandora. It will be very interesting to see how far this emulator can go on the current hardware base.

Even these girls are excited about DS emulation. Aren't you excited too?


More on this in a couple of weeks/months, hopefully!

Thanks to Exophase for taking the time to answer our questions.

10 thoughts on “The Deep Seeds of DraStic, DS Emulator”

  1. sehs33

    1 question i have to Exophase which might be off topic is whether is it possible to have gpsp ported to the gp32? 🙂

    great interview btw thank you both!

    1. Exophase

      It may be technically possible but the small amount of RAM could heavily compromise it. You’d have to make the ROM buffer and translation caches pretty small to fit, and at least some games will thrash it pretty badly. It probably wouldn’t perform that well at 133 or even 166MHz anyway.

  2. Pingback: From DraStic to Hunger | PandoraLive

    1. ekianjo Post author

      Potentially, since it runs on ARM too, and was developed on a Chromebook as far as I can remember… but you can’t run it directly with the Pandora version, so you’d need to ask Exophase for a specific Chromebook version (if he ever intends to release it).
