Comments and HTML5 scatterplots

To try and learn some Javascript and use of the HTML5 canvas, I decided to make a visualization tool. After some googling I decided on scatterplots that allow linking and brushing. My idea was to allow such plots to easily be added to any webpage or blog. At the moment they only work well in Webkit based browsers (Chrome, Safari) though.

If you are not familiar with visualization lingo, brushing basically means that you can select datapoints in some fashion. Linking means selections are propagated across two linked views. So if I select some datapoints in one plot the same datapoints are selected in a linked plot. Below is an example of the canvas based scatterplot implementation I ended up with. To try the linking and brushing scroll down to here.


canvas

As you might have understood from the axis names, instead of generating a dataset, I wanted to apply this to some real world data. The idea was inspired by “kommentarfeltdugnad”, which aim to not let extremism run unchallengde in public newspaper discussions. I have a hard time finding anything I want to do less then argue in a newspaper comment section, instead i figured studying the people doing this would be more interesting. Using visualization for this was inspired by Correll et al’s excellent paper on text visualization. If possible I would like to implement several more of their approaches later on.

Kommentarfeltdugnad also got me thinking about alternative approaches to discussion and forum moderation, especially wondering if it is possible to profile writers over time based on their writing, and ban or even hellban users which pass a certain threshold of badly argued or impossible to decipher posts.

The dataset I’m using in this post was derived from this article due to the amount of comments. All HTML is stripped away and descriptive parameters derived from the remaining comment data.

The aim of the initial exploration of this dataset is to see if it is possible to connect some derived parameters, and possibly tie these to certain usage of some words. You are of course compelled to explore the datasets yourself using the interactive scatterplots below.

I quickly figured out that a large percentage of comments were to short to make out any meaningful statistic. I decided that comments below 50 words were not to be included. Even though 50 is quite arbitrary and decimal oriented.

From the remaining posts I derived several parameters describing each post:

  • Percent of capitals.
  • Percent of punctuation.
  • Amount of words.
  • Percentage of occurrences of specific words (islam, muslim, sosialism, …).
  • Letters in post.

Initially I thought I would start with something obvious as a test of my data data. I think it reasonable to assume a very strong correlation between the amount of letters and words in a post. This can be seen in this plot.

Less obvious is whether there is a correlation between the percent of capitals and an excessive amount of punctuation in a post. Is it so that excessive capitals typers are more likely to either forgo or misuse punctuation?

Even more hypothetical is whether there is a correlation between mentions of islam or muslims and excessive use of capitals?

Below three scatterplots are linked together, with the first showing percentage of capitals and punctuation, the second showing word count and letters in post, the third shows percentage of capitals and percentage of words that are (islam/muslim). This allows for exploration of some of the questions above, as well as some other questions.


canvas2

canvas3

canvas4

There seems to be a small correlation between the amount of capitals and punctuation in a post. It would be very interesting to compare this dataset to a dataset with properly typed text. I also think that punctuation is very broad, and just focusing on exclamation and question marks might also be interesting.

The percentage of capitals and occurrences of islam seems not really related, though curiously, very long posts with a high amount of capitals do not seem to mention islam for some reason.

I guess that concludes my very superficial analysis of this dataset. Feel free to do whatever with the dataset, and once cleaned up I’ll link a zip with the JS source for convenience. My next posts will detail how to scrape the data, and how to add scatterplots to your own webpage or wordpress blog.







Finding matrix position from vector address

I have a symmetric matrix, which is stored as a vector containing only the needed elements. Is there an easy way to go from vector address to positions in the matrix?

The same operation on a square matrix stored in this way is done by:

(1)   \begin{equation*} x = v \mod{n} \end{equation*}

and

(2)   \begin{equation*} y = \frac{v}{m} \end{equation*}

Where v is the position in the vector, and m and n matrix columns and rows. This also conveniently reduces to bit shifts and masking if m and n are powers of 2. Operations are of course Integer operations.

My question is whether there exists a similar simple operation on a vector containing just one of the parts of a symmetric matrix.

 

Example of problem
Symmetric matrix stored in vector, not necessarily the best way to store it but one of many possibilities.

One opportunity is to use that the amount of elements needed to store a symmetric matrix is given by:

(3)   \begin{equation*} \frac{n(n+1)}{2} \end{equation*}

Using this one can iteratively search over n until the vector address is contained in the current block of inspected data. Then the remaining elements along with the amount of removed elements give the position. The problem is that this is n operations in a linear search, or log(n) operations if binary search is used.

so I’m wondering if there is an easier way to do this. Basically what I want is a very quick algorithm from vector position to matrix position.

Possibly useful observations are that the result from. n(n+1)/2 can always be divided by the odd component part of n and n+1, as well as the even component divided by two.

Also the resulting triangle can be divided into a structured amount s of square areas of values w:

(4)   \begin{equation*} s = 2^i \end{equation*}

(5)   \begin{equation*} w = \lfloor\frac{(n+2^i)}{2^{(i+1)}}^2\rfloor \end{equation*}

Where i is the squares where 0 is the largest square, and larger i gives the next smaller one. n is from the original matrix m and n. When the result is zero no more squares are needed to describe the area.

How this can be turned into a storage and addressing scheme, I do not know, and I wonder if there is no faster way (the binary search) to recover a position (without storing them beforehand) when something is stored in this manner?

Note that I’m not discussing packing and unpacking the entire matrix here (which would be O(n) using the linear search method from above), but finding the position of a single element from its position in the vector.

The never ending feature and bug fixing phase

At the moment I’m facing what might be the hardest part of game development. I have to make the decision: “The game is finished”. That is with the exception of game breaking bugs that might need fixing later.

This decision is hard, since I at the moment have a quite good and fun game ready, but there is a continuous stream of small changes I could do to make functionality better. None which are really needed for gameplay, but that make the game slightly more enjoyable. It is often really hard to evaluate which of these are important for player, or if it is a strive for perfection that no one will ever notice. Weeding out what is important with a small beta tester staff is hard.

On another front we have decided that the game will be harsh and hard. To use a term often used by Sirlin, it is quite slippery slope. I decided along with my main gameplay tester that it actually does not take that much away from the fun. This was also part of the decision to have an easier difficulty available, which really makes it easy to use the powerful tools you are given. What makes me very happy is that the tools really feel powerful, and can be used for nice tricks that are not all that apparent in the beginning. It is probably the part I’m the most happy with along with orb design.

That’ll be all the game design/development ranting for today

Game and application progress

The last weeks I have been very busy trying to finish my game.

The experience so far has not been great. I’m using a Java library called Slick for rendering and basic container for my game. While I was developing my game I have figured out that a lot of the stuff I have made is already present, but often not supporting the exact function I need.

Some of my code is also quite old and naive, and it does have some bad quirks that I have now hopefully fixed.

Anyway I have now added sound effects for many of the mechanics in the game, and it became even more fun. Sound is actually very hard to get perfect, and I have in some areas settled for not perfect since it would take me way to long to find something better.

As far as my other summer programming is coming along, I have got QT and OpenCL cooperating nicely and my prototype program seems to send the stipples to their correct position, what is left of the main tasks is rendering the stipples. Once that is done I will focus on making a GUI for specifying options for the stippling. This includes.

– Scaling up image to have more weights for the stipples.
– Selecting stipple size.
– Specifying threshold for weight value in initial random distribution of stipples.
– Specifying iteration number.

Game update

Since the semester at school ended I have been focusing on finishing my game, today I got rid of at least 5 issues which has been irritating me:

– Figured out what was causing staircase artifacts in scaled down game windows.

– Added some graphics to be shown while loading.

– Realized that a a screen in the game might need a complete makeover, it currently does not fit the rest of the interface.

– Added text which shows when the game is paused

First Post!

It is alive!

My blog has its first post, and all is well.

As a first post I’d like to share the current status of my game as well as some interesting CG topics I would like to look into during summer.

Artifact

My game is called Artifact, and it is an arcade style game. For those who have played Ambrosia Software’s Barrack, my game is very similar, and it was major inspiration for creating this game. The game itself is pretty similar, but introduces some different challenges and a system for better resource management, making the player more in control of the resources he or she has access to.

At this moment the game is fully playable and quite fun. What is left is balancing, and adding sound. The big issue is sound since this is a topic where I have no experience. Hopefully there will be some progress in that domain during summer, so that I can release the game itself this fall or winter.

Summer CG

During this summer I also have some projects that I’d like to finish. During the last semester I implemented a weighted k-means algorithm for creating stipple images. The algorithm was made using GPU through OpenCL for computation. The algorithm worked nicely in Mac OS X, but when transfered to Windows(the assignment used a windows only application) it stopped working properly and instead started crashing when I used above 200 points, though I had tested it on the same graphics card in Mac OS X with 5000 points. Since I had no time to fix it during the semester and I really want to make it work, this summer I will try and create an application to turn any image or animation into a stippled video or animation. I have decided to make it in C++ and QT since I feel I need to be more comfortable with C++.

I also want to test the image space ambient occlusion on some models, and see how well it would function for different types of games.