PostgreSQL, AWS, and Musical Bottlenecks

I have had the misfortune of working with PostgreSQL for the last 8 months. “Working” is a relative term; for me, little work has been done. Mostly I’ve been kicking off queries, waiting forever for the returns, and then trying to run down the bottleneck.

I am not a Linux professional and have to rely on those professionals to diagnose what’s going on with the AWS instance that runs PostgreSQL 9.3. Everyone who looked at the situation has had a different opinion. One person looked at one set of performance data and said the system isn’t being utilized at all, someone else said it’s IO bound, and still someone else said it’s the network card… So we went through all these suppositions: we added more RAM, then more processors, then we used the SSD drives more. Finally, switching from non-provisioned IOPS to Provisioned IOPS got the system roughly as far as we could push it, to the point where the complex queries would drive one CPU core to 100%.

Now those of you who work with real enterprise RDBMSs might say, “Wait… One CPU core reached 100%?” Well yes, of course, because you see, PostgreSQL does not have parallel processing. Yeah…

No matter how many CTEs or subqueries are present in a query statement sent to PostgreSQL, the processing of said query will happen in a synchronous, single-threaded fashion on one CPU core. I think SQL Server had parallel processing in the late 90’s or early 2000’s. It’s 2014, for crying out loud.
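Since each connection to PostgreSQL is served by its own backend process, one workaround is to slice a big query by key range yourself and send each slice over a separate connection. Here is a minimal Python sketch of just the slicing step. The table name `orders`, the key `order_id`, and both function names are hypothetical, made up for illustration, and the sketch assumes the key is a reasonably dense integer. It only builds the SQL strings; actually running them would need one connection per slice through whatever driver you use.

```python
def split_range(lo, hi, parts):
    """Split the inclusive integer range [lo, hi] into up to `parts` contiguous slices."""
    total = hi - lo + 1
    if total <= 0 or parts <= 0:
        return []
    parts = min(parts, total)          # never create empty slices
    base, extra = divmod(total, parts)
    slices = []
    start = lo
    for i in range(parts):
        # The first `extra` slices absorb the remainder, one row each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        slices.append((start, end))
        start = end + 1
    return slices


def build_slice_queries(table, key, lo, hi, parts):
    """Build one query per slice; each is meant for its own connection."""
    # Hypothetical query shape: a count per key range.
    template = "SELECT count(*) FROM {t} WHERE {k} BETWEEN {a} AND {b}"
    return [
        template.format(t=table, k=key, a=a, b=b)
        for a, b in split_range(lo, hi, parts)
    ]


if __name__ == "__main__":
    for query in build_slice_queries("orders", "order_id", 1, 1000, 4):
        print(query)
```

The catch is that you now have to stitch the partial results back together in the client, which is exactly the kind of work I would expect the database engine to do for me.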

And it gets better! According to my observations, the Postgres process is also single-threaded. This process is responsible for writing to the transaction logs, so there isn’t any benefit to creating multiple log files for software striping and efficient log writing. In fact, one big insert seemed to back up all the smaller transactions while that first insert wrote to the transaction log.

This is one of the joys of Open Source offerings. If the development community doesn’t think a feature is important, you have to fork the code and write the feature yourself. What blows me away is that companies are willing to gamble the success of their products and implementations on something so hokey.

Open Suck… I mean Open Source

If you’re reading this from a socialist country, I’m sorry, but you’re going to struggle to understand the basic premise of this discussion. The common cliché in capitalist societies, “You get what you pay for,” is, I believe, universally appropriate. From my father-in-law, who bought the cheapest satellite service and complains incessantly about how much he wishes he had the same cable service I have but is unwilling to pay the higher service charges, to outsourcing call centers to regions of the world that speak a different language than the users of the service, to booking a cheaper hotel near the Orlando amusements with a free shuttle service that’s just a glorified, overcrowded city bus without the graffiti, going cheap is almost always going to disappoint. But this is a technical blog and my focus is Business Intelligence.

I’m working on a favor for a friend and I wanted to take this opportunity to explore some new technology. This friend of mine doesn’t have any budget for this project, so I’m looking for cost-effective components for an application that’s simply a client front end to an RDBMS. My friend runs a small collection of Windows 7 desktops, I love Entity Framework, I’m proficient in Visual Studio, and I don’t need a “Big Data” solution. So I started thinking Open Source. Alright, hurdle 1: I’m not a Java guy. Some of you might start harping about how Ruby, Rails, PHP running on Apache, Beans, and Java are all vastly different things… I’m not into any of them; they’re all Java to me. A lifetime ago I played with Swing and it sucked on Windows. Most Java apps I see running on Windows are crap.

I don’t want to go into an in-depth discussion of all the options, but I decided to investigate PostgreSQL based on a recommendation from someone in my network who swears by it. One of the things I liked is the multi-OS support. Just in case the world turns upside down and I want to install the database on something other than a Microsoft OS, I thought I’d work with an RDBMS that would work the same no matter where it was installed, with one common client. The installation was smooth enough. I installed everything and clicked next, next, next… no errors. Good. Then I started researching ADO.NET clients to support Entity Framework, and that’s where the wheels fell off.

In the realm of free providers to go with the free RDBMS, there is an OLEDB provider, pgnpoledb, multiple JDBC drivers, and one .NET data provider, npgsql. Now, I’m a skeptical man, and before I went down the path of actually trying to connect Entity Framework to the PostgreSQL database I decided to read the npgsql wiki. Pages were devoted to all the different issues and bugs, and to what was or wasn’t being submitted for acceptance on GitHub. From the headache mounting in my cranium, I could tell this option was going to require a bit more effort than I was willing to invest in a favor for a friend. A lot of posters were pointing to the .NET provider for PostgreSQL from DevArt. Long story short, $199 for what I wanted… Wait a second, I thought this crap was all Open Source and free!

Let’s just explore this concept, which has long been my complaint with the Open Source stack. If your goal is to create a mission-critical, high-availability enterprise application with the Open Source offerings, you must be prepared either to code not only your application but also the platform on which it runs, or to abandon the “Potentially Free” benefits of Open Source by purchasing licensed products to augment and stabilize the Open Source platforms. Option 1 means roughly doubling your workforce or your time to market. You need resources to code the platform and resources to code the application, or resources that do both, but really only one at a time. Option 2 cuts into your equipment and tools budget, and you need to verify what the vendor’s royalty and redistribution requirements are. No one wants to depend on a component that requires a $1000 royalty for every user on a 40,000-seat client-server application, right?
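To put a number on that royalty scenario, here is the back-of-the-envelope math as a tiny Python sketch. The figures are the hypothetical ones from the paragraph above, not any real vendor’s price list, and the function name is just for illustration.

```python
def total_royalty(per_seat_usd, seats):
    """Total royalty bill when every seat owes a flat per-user fee."""
    return per_seat_usd * seats


if __name__ == "__main__":
    # Hypothetical figures from the scenario above: $1000 per user, 40,000 seats.
    bill = total_royalty(1000, 40_000)
    print(f"${bill:,}")  # prints $40,000,000
```

Forty million dollars for one component is the kind of surprise you want to find in the license terms before you ship, not after.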

There are other Open Source challenges I love to joke about with the diehard apologists I know. Like the fact that your favorite platform was written by one talented foreigner who doesn’t speak your language and only responds to email questions once a week, when the internet service satellite flies over his bunker. I like a challenge as much as the next person, and I sympathize with the desire to revolt against the powerful software companies that are so slow to accommodate user needs. But I’m just not willing to chance providing a service, where contractually I have to pay a refund for every minute of downtime, that depends on a platform developed by hobbyists and amateurs.

Look at the example I stated above where the free provider has lots of challenges and the paid one is stable and supports all features of the toolset it’s meant to service. Developers whose livelihood (paycheck) is dependent on the successful execution of a project are naturally going to be more motivated to generate a better product than those who are working merely to support a community. Likewise, those tasks that facilitate the collection of said paycheck will take priority over the needs of a community, which leads you to have more down time as you wait for someone to get off from work (or high school marching band practice and homework) to fix a bug in the platform your product depends on and publish it to GitHub.