Superb toy!

This toy is incredible. It is reasonably easy to construct things, while still providing a challenge once you get the basics. It is suitable for nearly all ages. It says 3+ but if you remove the smaller parts it is suitable for much younger children even though they might just enjoy tearing down your constructs. The absurd thing is that when they do, you wont hesitate to make another one, since building these are way too much fun.

WEDGiTS
WEDGiTS

 

Tips for more robust GPU programs using C++ and OpenCL

When I started programming for the GPU, I struggled to keep my GPU programs robust. This was partly because I was very new to C++ and C. OpenCL is very low level and this in combination with my inexperience led to a lot of not so robust prototypes. While my code worked for prototypes, it was very cumbersome to code and not very reusable.

In this post I will share some of the solutions I found to some of the issues I had. These deal with allocating and releasing memory on the GPU, as well as converting relatively cleanly between STL vector and GPU memory representation.

Simplifying release and allocation of memory

Basic memory allocating on the GPU using OpenCl is done by using the code below. Where memory is a reference to the memory on the GPU.

cl_mem memory = clCreateBuffer(context, CL_MEM_READ_WRITE, 
                               numberOfBytes,NULL, NULL);

To release the memory allocated, the below function has to be called with the memory object as argument.

clReleaseMemObject(memory);

This leads to several ways to leak GPU memory. Especially if your code throws exceptions or does something else which “looses” the reference to the memory object and you forgot to release the memory explicitly.

A way to mitigate this in C++ is to create an object which on initialisation allocates memory, and which when its destructor is called release the memory. This ensures that when the object goes out of scope or is deleted the memory on the GPU is also released. This is basically RAII (Resource Acquisition Is Initialization)

A very basic (leaving out code for copy constructors etc) object for GPU memory would look like this.

class GPUMemory {
private:
    cl_mem memory;
public:
    GPUMemory(size_t e,const cl_context &ctx) {
        memory = clCreateBuffer(ctx, CL_MEM_READ_WRITE, e,
                                NULL, NULL);
    }
    ~GPUMemory() {
        //release if out of scoped or deleted
        clReleaseMemObject(memory);
    }

    /*
     * Copy constructor, copy assignment operator etc goes here 
     * implementation should be according to your need 
     */
}

// Possible usage
void doAwesomeCalculation (size_t size,cl_context &context) {
       GPUMemory map(size,context);
       // Do calculation, throw exceptions, etc
} // Out of scope so GPU memory is released

This means that you do not have to explicitly release memory for every weird way your application could behave, which avoids a lot of code repetition and leak opportunities. As long as the memory in allocated in the correct scope you mostly do not have to think more about it. It should be noted that you should follow rule of three and implement copy constructor and assignment operator as well. The implementation may wary depending on how you want your object to behave. I’ll leave that for another blog post.

From vector to GPU memory

Vector is a very useful data structure when you want something sent to the GPU and back, since everything is stored adjacently in memory, no fetching or disbursing on the CPU side is needed. To convert a vector to the kind of memory object I designed above, this is the code i use:

template  GPUMemory createAndFillMemory(const vector data,
                                                    cl_context context,
                                                    cl_command_queue queue) {
   cl_int err;
   if(data.size() > 0) {
        GPUMemory mem = GPUMemory(data.size(),context);
        err = clEnqueueWriteBuffer(queue, mem.getMemory(), CL_TRUE, 0, mem.getBytes(),
                                       (void*)&data[0], 0, NULL, NULL);

        if(err != CL_SUCCESS) {
            throw runtime_error("Loading buffer failed");
        }
        return mem;
    }
    throw runtime_error("no content in vector");
}

As you can see the GPUMemory object has been extended to use templates, since the use I want in this case is a generic container for some struct. This saves a lot of code since you would need to do this conversion manually for each different struct you are planning to put on the GPU if you do not have a similar method. It also ensures that you do not make offset or type errors when converting to and from the different representations.

The GPUMemory class has also been extended to contain information about the amount of objects added, as well as type size.

Conversion back from GPUMemory object to a vector is done using this function:

template  vector readBackMemory(GPUMemory &data,
                                           cl_context context,
                                           cl_command_queue queue) {
   vector vec(data.getElements(),T());
   cl_int err;  
   if(data.getElements() > 0) {
        err = clEnqueueReadBuffer(queue, data.getMemory(), CL_TRUE, 0, data.getBytes(),
                                  &vec[0], 0, NULL, NULL);
        if(err != CL_SUCCESS) {
            checkError(err);
            throw runtime_error("Content not read back from GPU");
        }
        return vec;
    }
    throw runtime_error("No content in vector");
}

General conversion from a vector containing any object to GPU memory is not something I would recommend. While this is possible using the code above, conversion using these methods should be restricted to primitives and structs of fixed size. You should also avoid pointers, as it makes little sense to send a pointer pointing to objects in CPU memory to the GPU.

There is still room for mistakes as you will see by the following example. When you use memory on the GPU, no metadata carry over from the C++ code. So on the GPU side you are responsible for using the memory you transferred correctly, the methods above only ensure that the copying of memory back and forth from the GPU results in correctly typed and sized arrays.

An example

A simple example can be to copy some particles to the GPU, apply some calculation and then copy the particles back. On the CPU side I then define a particle struct:

struct GPUParticle{
    cl_float3 pos;
    cl_int identifier;
    cl_float3 direction;
};

On the GPU (OpenCL kernel) side, a simple kernel to be applied could look like this:

struct GPUParticle{
    float3 pos;
    int identifier;
    float3 direction;
};

__kernel void copyParticles(
                       global struct GPUParticle * data,
                       global struct GPUParticle * dataOut
                       )
{
    int x = get_global_id(0);
    //do calculation here
    dataOut[x] = data[x];
}

It is important to keep the structs on both sides similar in structure and byte size, so that there are no offset issues (you should run tests for this). This means that data is not type safe when sent and retrieved from the GPU, but I still think that partial type safety is much better then none at all, since this leave much fewer avenues for such errors to occur.

Finally to apply the kernel to the data I create this method:

vector Stuff::testCopy(vector &data) {
    GPUMemory mem = utils.createAndFillMemory(data, context, cmd_queue[0]);

    GPUMemory out = GPUMemory(mem.getElements(),context);

    size_t work_size[1];
    work_size[0] = mem.getElements();

    err  = clSetKernelArg(copyParticles,  0,
                          sizeof(cl_mem), mem.getMemoryRef());
    err |= clSetKernelArg(copyParticles,  1,
                          sizeof(cl_mem), out.getMemoryRef());

    if(err != CL_SUCCESS) {
        utils.checkError(err);
        runtime_error("Kernel setup failed");
    }

    // Apply kernel to data
    err = clEnqueueNDRangeKernel(cmd_queue[0], copyParticles,
                                 1, NULL, work_size, NULL, 0,
                                 NULL, NULL);
    if(err != CL_SUCCESS) {
        utils.checkError(err);
        runtime_error("Calculation failed");
    }
    return utils.readBackMemory(out,context, cmd_queue[0]);
}

There are of course problems where these techniques are not suitable or possible, but I use them a lot and they have made my GPU code much more robust. If you have suggestions for improvement or your own small snippets, please add a comment.

Some of my inspiration for this:
Bjarne Stroustrup Going native 2012

The “The question is dumb” checkbox is missing

 

√ The question is dumb
What a poll should look like

Normally I do not answer telephone based marketing or marketing research, I try to answer polls though, since I think that spreading my opinion even if it ends up as one hundredth of one percent in a final statistic is slightly useful. What is critical for my participation in such polls though, is that I feel that my opinion is actually represented. Lately I have participated in several polls where the poll is fundamentally flawed and would lead to misleading statistics.

Last night I was asked if I wanted to participate in a poll on politics. Since there was no marketing questions mixed in I said ok. The initial questions were quite normal ones, like “which party did you vote for …”, “which party would you vote for …” and so on.

After a while the questions deteriorated, I do not remember all of them, but one that had me quire frustrated was this:

“Do you think the norwegian government should hire private companies for public tasks?”

To me this question is a a case by case question, there is no single right answer for the different activities a government might undertake. Now the poll does not allow this answer though, it only allows “I do not know” or strong or weak versions of “agree” or “disagree”.

In another poll on an earlier occasion, I was asked:


“Do you think syntethic estrogen is more damaging for aquatic life compared to natural estrogen?”

This question was extremely confusing to me. First of all I do not know if there is a difference between syntethic and natural estrogen (or what is meant by syntethic or natural in the question), secondly the only honest and respectable answer I could give is: “Whatever is the current scientific consensus”. The poll instead has me rating how much I agree or disagree.

I really wonder what the statistics derived from these questions were used for. Which dimwit made a decision based on the “knowlege” gained from it.

I could rant about this all day, but In my opinion rants alone are not good, it is better to propose a solution. What I think would make these polls much better is if there always was a “The question is dumb” checkbox, which had to be included by law, and which also had to be included in the final statistic by law. This way no ones voice is completely misrepresented as mine probably was in these cases.

My experience scraping Disqus

To acquire data for my project mentioned here, I wanted to scrape the comments section of a norwegian newspaper.

Disqus comments example
Comments at a norwegian newspaper based on Disqus

Initially I thought it would be as simple as getting the source and applying whatever regular expressions that would give me what I wanted. Sadly it was not that simple.

My chosen norwegian newspaper use Disqus as comment system. Disqus uses javascript quite heavily to generate HTML, this means that the comments are not part of the original source but just exist in the browser DOM. This problem can be solved by using “view generated source” in firefox developer plugin, or by making a plugin for a browser and manipulate the DOM directly. For my problem viewing the generated source was the simpler answer. If I wanted to scrape Disqus often I would make a plugin to dump the generated source.

Firefox generated source
Showing generated source using firefox developer plugin.

Sadly this was not the only problem with scraping Disqus. Disqus is configured to only shows a small amount of comments at a time, and to show more a javascript routine must be invoked several times. I just did it a lot of times manually, though again if I really wanted to scrape Disqus based systems a lot, I would automate this step.

Once the I got the data I load it to a java routine which remove unwanted data based on regular expressions (it is faster to access the DOM, I would do that if I had lots of data). Then I generate descriptive parameters (word count, punctuation amount , etc) based on post content. The resulting parameters are stored as JSON. Then the resulting JSON is used as basis for the scatterplots here.

Adding javascript functionality to WordPress posts

Adding your own javascript implementation to posts in your wordpress blog is slightly more difficult then one might think. The difficulties stem from the way WordPress editors format html and javascript in posts. If you just type refer to .js files and then add it by adding some inline script tags, WP will mangle the inline js.

The easiest way to do it is to add the files normally:

<script scr="/path-to-script/cool.js"></script>

Then refer to it using inline script like this:

<script><!--
//Use script here
--></script>

This can of course also be done in several other ways, using WP functionality, but those are all not very satisfactory in my opinion, and they do not fit JS that will only be used in a single post.

Adding interactive scatterplots to your website

To use the scatterplots presented in my previous post, this is what you need.

  • My scatterplot javascript code (LGPL) available for download here.
  • A dataset stored in JSON, with the structure explained below
  • Add .js files to HTML header
  • HTML canvas elements in your html
  • Assignment of data to axes and plots to canvas in javascript

Dataset structure

Datasets for use in my scatterplot code is structured as a JSON array named “dataArray” containing n maps. Each map contains n “name”:value pairs which each is considered a single datapoint. The values to be used in a scatterplot of course need to be numerical. An example of a valid dataset structure is shown below. This is all put in a .js file, and returned by a function named datasetForVisualization() as shown below.

//of course more data in a real dataset
datasetForVisualization() {
   return {"dataArray":
      [{"punctuationprword":2.0,"ekstrem":0.0"},
      {"punctuationprword":6.0,"ekstrem":0.0"},
      {"punctuationprword":0.1111111111111111,"ekstrem":0.0}]
}

Add .js files

To add the .js files to your page, add these tags inside your website:

<script src="data.js"></script>
<script src="geometry.js"></script>
<script src="guielements.js"></script>
<script src="selection.js"></script>
<script src="plots.js"></script>
<script src="dataset.js"></script>

HTML canvas elements

HTML5 canvas elements are created in the html parts of you website, and are created like this:

<canvas id="plot" width="640" height="480"><canvas>

The id parameter is needed since this is what you need to refer to when drawing to the canvas using javascript.

Assignment to canvas

The following javascript code will assign datasets parts to axes in the scatterplots, as well as tie the plot to a specific canvas and link with another scatterplot

<script type="text/javascript">
   //Select data for plot
   var data = create2DDataFromJson("punctuationprword","ekstrem");

   //Reference to canvas element created above
   var canvas = document.getElementById("plot");

   //Create plot
   var plot = new Scatterplot(canvas,data,540,400,
   "Punctuation pr word","Instances of ekstrem",false);

   //Link plot to other scatterplots
   plot.addLinked(plot2);
   plot2.addLinked(plot);
</script>

Hopefully this explains how to use the scatterplot as part of a webpage. To successfully use it within a WordPress blog is slightly more difficult. One of my next posts will go more into detail about that specifically. If you have any questions about usage please post a comment.

Comments and HTML5 scatterplots

To try and learn some Javascript and use of the HTML5 canvas, I decided to make a visualization tool. After some googling I decided on scatterplots that allow linking and brushing. My idea was to allow such plots to easily be added to any webpage or blog. At the moment they only work well in Webkit based browsers (Chrome, Safari) though.

If you are not familiar with visualization lingo, brushing basically means that you can select datapoints in some fashion. Linking means selections are propagated across two linked views. So if I select some datapoints in one plot the same datapoints are selected in a linked plot. Below is an example of the canvas based scatterplot implementation I ended up with. To try the linking and brushing scroll down to here.


canvas

As you might have understood from the axis names, instead of generating a dataset, I wanted to apply this to some real world data. The idea was inspired by “kommentarfeltdugnad”, which aim to not let extremism run unchallengde in public newspaper discussions. I have a hard time finding anything I want to do less then argue in a newspaper comment section, instead i figured studying the people doing this would be more interesting. Using visualization for this was inspired by Correll et al’s excellent paper on text visualization. If possible I would like to implement several more of their approaches later on.

Kommentarfeltdugnad also got me thinking about alternative approaches to discussion and forum moderation, especially wondering if it is possible to profile writers over time based on their writing, and ban or even hellban users which pass a certain threshold of badly argued or impossible to decipher posts.

The dataset I’m using in this post was derived from this article due to the amount of comments. All HTML is stripped away and descriptive parameters derived from the remaining comment data.

The aim of the initial exploration of this dataset is to see if it is possible to connect some derived parameters, and possibly tie these to certain usage of some words. You are of course compelled to explore the datasets yourself using the interactive scatterplots below.

I quickly figured out that a large percentage of comments were to short to make out any meaningful statistic. I decided that comments below 50 words were not to be included. Even though 50 is quite arbitrary and decimal oriented.

From the remaining posts I derived several parameters describing each post:

  • Percent of capitals.
  • Percent of punctuation.
  • Amount of words.
  • Percentage of occurrences of specific words (islam, muslim, sosialism, …).
  • Letters in post.

Initially I thought I would start with something obvious as a test of my data data. I think it reasonable to assume a very strong correlation between the amount of letters and words in a post. This can be seen in this plot.

Less obvious is whether there is a correlation between the percent of capitals and an excessive amount of punctuation in a post. Is it so that excessive capitals typers are more likely to either forgo or misuse punctuation?

Even more hypothetical is whether there is a correlation between mentions of islam or muslims and excessive use of capitals?

Below three scatterplots are linked together, with the first showing percentage of capitals and punctuation, the second showing word count and letters in post, the third shows percentage of capitals and percentage of words that are (islam/muslim). This allows for exploration of some of the questions above, as well as some other questions.


canvas2

canvas3

canvas4

There seems to be a small correlation between the amount of capitals and punctuation in a post. It would be very interesting to compare this dataset to a dataset with properly typed text. I also think that punctuation is very broad, and just focusing on exclamation and question marks might also be interesting.

The percentage of capitals and occurrences of islam seems not really related, though curiously, very long posts with a high amount of capitals do not seem to mention islam for some reason.

I guess that concludes my very superficial analysis of this dataset. Feel free to do whatever with the dataset, and once cleaned up I’ll link a zip with the JS source for convenience. My next posts will detail how to scrape the data, and how to add scatterplots to your own webpage or wordpress blog.







Finding matrix position from vector address

I have a symmetric matrix, which is stored as a vector containing only the needed elements. Is there an easy way to go from vector address to positions in the matrix?

The same operation on a square matrix stored in this way is done by:

(1)   \begin{equation*} x = v \mod{n} \end{equation*}

and

(2)   \begin{equation*} y = \frac{v}{m} \end{equation*}

Where v is the position in the vector, and m and n matrix columns and rows. This also conveniently reduces to bit shifts and masking if m and n are powers of 2. Operations are of course Integer operations.

My question is whether there exists a similar simple operation on a vector containing just one of the parts of a symmetric matrix.

 

Example of problem
Symmetric matrix stored in vector, not necessarily the best way to store it but one of many possibilities.

One opportunity is to use that the amount of elements needed to store a symmetric matrix is given by:

(3)   \begin{equation*} \frac{n(n+1)}{2} \end{equation*}

Using this one can iteratively search over n until the vector address is contained in the current block of inspected data. Then the remaining elements along with the amount of removed elements give the position. The problem is that this is n operations in a linear search, or log(n) operations if binary search is used.

so I’m wondering if there is an easier way to do this. Basically what I want is a very quick algorithm from vector position to matrix position.

Possibly useful observations are that the result from. n(n+1)/2 can always be divided by the odd component part of n and n+1, as well as the even component divided by two.

Also the resulting triangle can be divided into a structured amount s of square areas of values w:

(4)   \begin{equation*} s = 2^i \end{equation*}

(5)   \begin{equation*} w = \lfloor\frac{(n+2^i)}{2^{(i+1)}}^2\rfloor \end{equation*}

Where i is the squares where 0 is the largest square, and larger i gives the next smaller one. n is from the original matrix m and n. When the result is zero no more squares are needed to describe the area.

How this can be turned into a storage and addressing scheme, I do not know, and I wonder if there is no faster way (the binary search) to recover a position (without storing them beforehand) when something is stored in this manner?

Note that I’m not discussing packing and unpacking the entire matrix here (which would be O(n) using the linear search method from above), but finding the position of a single element from its position in the vector.

The never ending feature and bug fixing phase

At the moment I’m facing what might be the hardest part of game development. I have to make the decision: “The game is finished”. That is with the exception of game breaking bugs that might need fixing later.

This decision is hard, since I at the moment have a quite good and fun game ready, but there is a continuous stream of small changes I could do to make functionality better. None which are really needed for gameplay, but that make the game slightly more enjoyable. It is often really hard to evaluate which of these are important for player, or if it is a strive for perfection that no one will ever notice. Weeding out what is important with a small beta tester staff is hard.

On another front we have decided that the game will be harsh and hard. To use a term often used by Sirlin, it is quite slippery slope. I decided along with my main gameplay tester that it actually does not take that much away from the fun. This was also part of the decision to have an easier difficulty available, which really makes it easy to use the powerful tools you are given. What makes me very happy is that the tools really feel powerful, and can be used for nice tricks that are not all that apparent in the beginning. It is probably the part I’m the most happy with along with orb design.

That’ll be all the game design/development ranting for today

Game and application progress

The last weeks I have been very busy trying to finish my game.

The experience so far has not been great. I’m using a Java library called Slick for rendering and basic container for my game. While I was developing my game I have figured out that a lot of the stuff I have made is already present, but often not supporting the exact function I need.

Some of my code is also quite old and naive, and it does have some bad quirks that I have now hopefully fixed.

Anyway I have now added sound effects for many of the mechanics in the game, and it became even more fun. Sound is actually very hard to get perfect, and I have in some areas settled for not perfect since it would take me way to long to find something better.

As far as my other summer programming is coming along, I have got QT and OpenCL cooperating nicely and my prototype program seems to send the stipples to their correct position, what is left of the main tasks is rendering the stipples. Once that is done I will focus on making a GUI for specifying options for the stippling. This includes.

– Scaling up image to have more weights for the stipples.
– Selecting stipple size.
– Specifying threshold for weight value in initial random distribution of stipples.
– Specifying iteration number.