Building Designing the Data Model


In the last post, I talked about how Richie and I sketched out the goals of, the site where you can post questions for other folks to answer during their live streams.

I challenged you to decide:

  1. What data needs to be stored to make all this work?
  2. What database or persistence layer should we use?

So let’s talk through each of ’em.

1. What data needs to be stored?

The 3 major components are:

  • Rooms – with attributes like the room name, URL, room owner email
  • Messages – including the message, person’s name, person’s email address. Initially, we didn’t want email address, but we added it and made it optional just so we could show Gravatar icons next to peoples’ names.
  • Votes – and this is where your opinions will play out in the comments. The data professional in you wants to capture the details of every vote, and you might want to add a separate normalized Votes table for it. However, for the sake of this app, we don’t need granular tracking of votes. We just need to increment the number of votes on a message – and for that, it makes sense to just have a VoteCount column on the Messages object. I know. You’re not happy about that. But remember, we have a goal to track as little data as possible.

Here’s one of the early data model sketches from Richie’s notebook:

Richie’s sketch adds a fourth data object, connection, that’s required for tracking each of the browsers connected at the moment. We don’t need that for evil purposes – we just need to push updates back to them so their screen can refresh in real time.

Just for clarity’s sake, this isn’t the exact structure of PollGab’s current data model – but I like showing this early draft because it shows the thought process. Data models change over time, especially when you’re building a brand new app like this one where we can iterate quickly, and we don’t need to deal with queries coming from other systems, like reporting tools.

There are additional things to think about as well, like banning users and making payments, but I’ll keep it simple for the sake of this discussion. (And really, in a modern app, you don’t want to hold the payments yourself – that’s what services like Stripe are for.)

In terms of the way we load and query this data, here are a few sample use cases:

  • View the list of rooms here
  • View the list of messages for one room (like Office Hours)
  • Insert a new message
  • Upvote a message
  • Flag a message
  • Delete all messages in a room

You’ll notice that the views are fairly tightly scoped: we’re not doing things like analytical reports.

2. What database or persistence layer should we use?

I know, most of us here use the Microsoft data platform to make a living. However, dating back several years ago when Richie built, then when he built SQL ConstantCare®, we’ve long believed that where we can avoid using a relational database, we should.

NoSQL systems are generally cheaper and easier to manage than relational databases. Yeah, I said it. Come at me. Try hosting production-quality, auto-scaling relational databases for $3.43 per month. It’s damn hard.

So like PasteThePlan (but unlike SQL ConstantCare®), we ended up using Amazon DynamoDB. I know it’s marketing, but the main boxes on the DynamoDB product page do a pretty good job of summing up its advantages:

DynamoDB does have drawbacks for most of us in the audience: we can’t just whip open SSMS or Azure Data Studio, write a query with joins and grouping, and get reports out. However, for the sake of this web site, we just don’t need reporting across multiple rooms or questions.

During the design phase, I specifically said that any kind of marketing analytics were completely out of scope. I recognize that ad agencies and podcasting companies might want to do centralized purchases for lots of podcasts or rooms, and then get analytical data to know which users are asking the most questions, or what kinds of questions get the most upvotes. They’re just not our target market for the initial product release.

This is a cloud-native, locked-in app.

We designed this from the ground up to work exclusively with Amazon DynamoDB. This means we’re stuck there, for better or worse. If Amazon DynamoDB goes down, PollGab’s going down with it. As a small business owner, that’s a risk I’m totally willing to take. DynamoDB has much higher uptime than any server I’ve personally been involved with building and maintaining.

I’m sharing this with you because I want you, dear reader, to understand why startups and small businesses might choose to go cloud-native and build their applications without traditional relational database systems. I love databases! I still absolutely love ’em. But I’d rather not be on call for fixing them when (not if) they break.

I’m also being fine locked into AWS because we chose to keep building with Amazon Lambda, the same hosting and processing architecture that we chose for PasteThePlan and SQL ConstantCare®. Lambda lets you write short-lived functions, and you don’t have to know anything about the operating system or servers that run your code. When someone hits a URL, the code runs, period. When lots of people come in, AWS will gradually spin up more… somethings… that run your code. That’s just not really our problem.

Could we have used Azure or Google services instead? Absolutely! But at the time we started building this, Richie had already built up 5 solid years of experience with DynamoDB and Lambda, so we were able to start building more quickly without learning new tooling.

Could we have used a conventional server-based application, like a web site in IIS or a .NET app? Absolutely! But we wanted to be able to completely outsource uptime management and troubleshooting to Amazon, and that’s where Lambda makes things easier. The best evidence for that is that as Richie continues to build stuff, he’s able to spend an absolute minimum of time managing our already existing apps like PasteThePlan and SQL ConstantCare.

What questions do you have about PasteThePlan’s data model or persistence layer? Let us know in the comments.

Previous Post
Building Design a Database for Live Session Questions
Next Post
Who’s Hiring in the Database Community? March 2022 Edition

9 Comments. Leave new

  • Interesting peek behind the curtain. I am curious though what are you using to host your website? Is that something that can be done from Lambda? Pardon my ignorance, I haven’t made it down the cloud rabbit hole very far.

    • The www site is WordPress, and the site is AWS Lambda:

    • I’m on roughly a similar backend for db and api for a personal project.

      For front end I have cloudfront serving s3 hosted react html/css/js coming out of npm build

      You can run express on a lambda function but it requires some work around a to enable express to route the requests. App Runner or fargate starts looking more appropriate if you don’t want to use aws api gateway for linking http path to lambda function.

    • Richie Rump
      March 3, 2022 4:38 pm

      S3 holds all of the “webby” files like (HTML/JavaScript/Images). You can configure S3 to be the “host”/web server. Lambda does all of the server-side processing. There are a ton of different services in AWS so you’ll need to do some homework on what they’ll do and how they’ll help your app.

  • What kind of Lamba service are you using? The Stream process or the Web applications or?

  • 1. Consider implementing a mechanism to block vote-fraud (voting same item multiple times). Perhaps using Local Storage.

    2. provide host options to:
    – post “comment instructions” (for example, indicate what type of questions should NOT be asked).
    – limit max length of a comment.
    – limit comment participation to only guests with a VERIFIED email address.
    – post multi-choice questions and display answer distributions (reuse vote logic)

    3. Provide guests options to:
    – view/edit/delete their own comments
    – view top-voted comments (to avoid posting similar comments)
    – mark their comment as “Private”
    – vote on presentation pace (faster/slower)

    • Hi! It looks like you’re asking for feature requests. This isn’t really the place for that – we’re discussing the data model here. Cheers though!

  • Nick McDermaid
    March 2, 2022 12:59 am

    I completely agree that you do not need a vote transaction table. YAGNI. KISS all that stuff. Don’t over complicate. It’s a good idea for all solution designers to be on their guard against over-engineering.


Leave a Reply

Your email address will not be published. Required fields are marked *

Fill out this field
Fill out this field
Please enter a valid email address.