Mysql GROUP BY context

November 23, 2016

Encountered this MYSQL challenge at work today.

Given a table called test_table like:

+—-+——-+
| id | value |
+—-+——-+
|  1 | 1     |
|  2 | 1     |
|  3 | 2     |
|  4 | 2     |
|  5 | 1     |
|  6 | 1     |
|  7 | 2     |
|  8 | 3     |
|  9 | 3     |
|  10 | 3     |
+—-+——-+

We wanted a result like:

+——-+——–+
| value | amount |
+——-+——–+
| 1     |      2 |
| 2     |      2 |
| 1     |      2 |
| 2     |      1 |
| 3     |      3 |
+——-+——–+

So basically, we wanted to keep the order, but GROUP BY when there appeared to be values that could be grouped together. So if the same value occured more than once they should be group together as one group without grouping every occurance of the value.

The usecase of this is typically a feed of events, like this:

  • The event with of type 1 occured 2 times.
  • The event with of type 2 occured 2 times.
  • The event with of type 1 occured 2 times.
  • The event with of type 2 occured 1 time.
  • The event with of type 3 occured 3 times.

Instead of:

  • The event with of type 1 occured 1 time.
  • The event with of type 1 occured 1 time.
  • The event with of type 2 occured 1 time.
  • The event with of type 2 occured 1 time.
  • The event with of type 1 occured 1 time.
  • etc.

Took some work, but ended on this query:

SET @next := 0, @last := 0;

SELECT test_table.value,
COUNT(test_table.id) AS amount
FROM test_table
LEFT JOIN (
SELECT test_table.*,
IF(@last != value, @next := @next+100, 0) AS xx,
IF(@last != value, @last := value, 0) AS xy,
IF(value, value+@next, 0) as group_value
FROM test_table
) AS test_table_b ON test_table_b.id = test_table.id
GROUP BY test_table_b.group_value

Now, this is not all fool proof. The value that is added to @next may be a constant that has to change depending on the value you want to group by. The last issue to consider is that working with variables like this is not an exact sience, but it work in my example.

Hopefully this was helpfull to some. It was to me.

 


Reducing database queries

September 12, 2010

I found a real badass design flaw on my newly created ORM system the other day. On the register form for a movie, the page had 119 queries taking about 0.085 seconds of running time. Since the site is not finished and the movie I was editing was a simple test, this would have been maybe 250+ queries for one movie, and the data would have been much more on in the tables so the time of each query probably would have been much higher in a real live situation. Have this in mind. What I had done, was to fetch the relations in the many-to-many table with one query each(so that I could use the same get method for every object, including relations). Really dumb, but I did not think it would affect that much in the beginning as long as each of them were fast.

Well, it took time, and I rewrote the system to get every relation in one query per table – reducing the amount of queries to 51 and the time went down to about 0.040 seconds for the page load. That made me think, 51 still sounded a lot to me, so I decided to analyze more. I noticed then that I made the same query many times to some of the tables. For instance, the country table, where I fetched all the objects every time I needed the list. And that list I needed in many places, more precise, in many of the relations. The solution to this began as a crazy simple idea that worked.

What did I do?

In the object that fetched the countries ( SELECT * FROM country ) and made the country object list of it (as mentioned, unfinished and short list), I added a private static $table = array(); at the top. Then I simply used logic from the singleton pattern. I checked if the self::$table['country'] was set with the list, if yes, I used the old list – if no, I created the list.

The amount queries was reduced to 33, and time to 0.035 seconds.

I still need to see if this is a good solution in the long run, as the time to execute the page varied a bit more than before. Can’t wait to see how it performs when the tables gets filled with data.


Database connection speed in php

September 2, 2010

So, on this current project of mine, I have one unfinished page that have currently 31 fast mysql queries(to a database that contain a very little data) – and I wanted to see how different settings would affect the speed. I have heard that connections are costly, but how costly are they really? These are the average results of my little test:

One connection for each query: 0,045 sec
One persistent connection for each query: 0,025 sec
One connection used in a Singleton pattern: 0,0175 sec
One persistent connection used in a Singleton pattern: 0,0165 sec

Summary:
The old method is the slowest by far! If you have many concurrent users on the webpage persistent connection will help a bit. But the best effect is from using a Singleton connection – or another method where you use one instance of the connection on all the queries. Persistent connection uses some time fetching the connection from mysql on the server, so it’s faster to do it in code.


Storing Images with PHP & Mysql – The right way

April 22, 2010

I’ve in that stage of my latest project that I have to decide how to store images to the server, and how to use the database the best way to do this. I know that I’ve done this wrong a couple of times, but hopefully I’ve find a good practice of this now. Not quite sure if this is the best way, but this is how I think after looking into the matter:

There are several things I consider to be important:
1. SEO.
The filenames need to be descriptive and searchable.
2. Database
Images should not be stored in the database, but the filename and id should. You need to know what image goes with what movie. It is possible to store images into database, but it makes the database big and I can image working with the images becomes little more complicated. There are advantages to this too, but it is not the way I’m going to do it.
3. Thumbnails
It’s a good practice not to resize in the <img> tag, so a good system for thumbnails must be at hand.
4. URI
Where to put the files is also something you have to consider carefully. Is it best to put everything in one folder? Probably not. What about “movie-id/images/image.jpg” or “images/movie-id/image.jpg”? Much better.

Let’s take an example database with four tables: Movie, ImageType, MovieImage and Image

Movie:
Important part of movie is that it must have an id and a title.
ImageType:
Must have an id and a type, ex: cover, poster and screenshot
MovieImage:
Realtionship between the movie, image and type. Contains the foreign key to image, movie and imagetype, where one of them has to be unique.
Image:
The image should have an id to the image so that it can be connected to a movie and made unique. In addition it should contain a filename, that is generated in php from the id, title of movie and type of image.

Cooking it all together:
The filename must be SEO optimized, so the syntax to use should be like: id-type-movie-name.jpg. You could remove the type from the name, but it’s a good way to distinguish between the different image types, making the SEO better too. Movie name should be separated by a hyphen to be most SEO friendly. By storing filename once with id instead of generating it every time and out from id, type and title is good practice because if the title is changed (from “Lord of the rings” to the more accurate “The Lord of the Rings: Fellowship of the Ring”) the filename points to the right file – id-type-lord-of-the-rings.jpg. Generating thumbs could then follow a standard of adding a -thumb-width-height or something to the name – or simply create it on the fly with a php method, even thou I think the first option is more correct. When getting all images to a movie you can easily sort by type, and get the filename from database – and then show it on your webpage by creating the correct url, ex: images/movie-id/1923-cover-lord-of-the-rings.jpg.

By thinking like this, the image uploading and storing become much more simplified for me than how I’ve done it before.


Persistent MySQL connection

February 23, 2010

It was not that long ago I figured out what this ment, even thou persistent is kind of self-explanatory. I read that this was a must on a sql connection, but I never learnt it at school or did find out how to do it. For me it was a fancy word for a more complex connection type, but is it really more complex?

What I usually do, is that I use the singleton pattern on my database connection so that on each load the connection is created only once and then reused for every new use. A database connection is expensive, so it is a good thing not to create it a lot, and this is what singleton does. A persistent connection is like a singleton to the server – and does the thing that I hoped singleton would do when I first heard about it.

A persistent connection does not close, but it reused by the different connections to the server. This means that the same connection is reused by every user that connects to the server, and there is not created a new one for each time. Sounds simple? It is simple.

The real use of it is probably only when you have a bigger site with many users and traffic is big, so it is not really necessary for small websites that barely use a connections and have under 50 visitors a day. The singleton helps a lot on each user page load, when the connection is preserved and reuses for the page load, but still, using a persistent connection could slightly improve speed even thou it is probably not noticeably. The real problem solved with persistence is that you can have many more simultaneous connections to the database – getting rid of the connection overhead problem.

How to do it?
I use the mysqli extension for PHP, and the only thing needed is to add a ‘p:’ to the host address, like this:

$db = new mysqli('p:localhost', 'username', 'password', 'databasename');

This only works with PHP 5.3.0 or higher, so be sure you have it both locally and on the server. I you don’t have it or don’t use mysqli, then there is an alternative, to switch form mysql_connect to mysql_pconnect.

That simple, really – and no other change to code.