It actually could have a reason for it though.
For example each ignored user adds a record into the database.
So each user of a large forum may have 1,000 database rows just for the ignore list and another 1,000 just for the follow list, in addition to anything else.
The forum has 261,851 members.
If all of them had 1000 ignored users, that is 261,851,000 database records.
First off, if I'm wrong, I want to know I'm wrong.
OK. You have a point. Generally, I'd expect a database table to have a unique identifier for each user, and the structure for ignored users could be as simple as a table of integer records where each integer is of a known size, like an 8-bit, 16-bit, or 32-bit int. Each one takes a known amount of space to store and a predictable amount of time to fetch in standard SQL. For example, if we used a 32-bit int, that would allow
4,294,967,296 unique values, so I assume we would not create that many users in the lifespan of the site. Or we could just use a string of a known length as the data type for each record, with a text encoding like UTF-8 (1 byte per character for plain ASCII such as digits) or UTF-16 (2 bytes per character, which also covers things like Russian Cyrillic text). Since I assumed we are just storing numbers, UTF-8 is enough. If you're still with me, I'm just trying to show that I am aware of what we need to be cautious about. That said, 1,000 records is nothing.
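To make that concrete, here is a minimal sketch of what I mean. The table and column names are just my guesses, not XenForo's actual schema.

CREATE TABLE ignore_list (
    owner_user_id   INT NOT NULL,  -- the user who owns the ignore list
    ignored_user_id INT NOT NULL,  -- the user being ignored
    PRIMARY KEY (owner_user_id, ignored_user_id)
);

Each row is one (owner, ignored user) pair, roughly 8 bytes of key data. Even the worst case above, 261,851,000 rows, works out to something on the order of 2 GB of key data before indexes and overhead, which a commodity database server handles without drama.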
I've done tests like the ones we are talking about 10 or 20 years ago, but let's get current.
As I thought about this, I remembered that I was a lead on a database migration for a large healthcare company out of California, covering the entire state of Iowa. The system is really several databases with a total of 2 million+ users. You saw above the detail, rigor and care I took when considering how to create the fastest, most space-efficient data structure for the table's records. You and I are speaking the same language.
That said, the table records in this uber database of multiple databases (9, I think), each containing many tables, simply used flat-out plain old text as their internal data type for almost everything. No attention at all to optimization for size and speed. None. Now, get this. We had a mirror of the production uber database that was close enough to the current one, with all identifying customer data replaced. A crapload of records. We could all run it locally on the computer we were working on, a consumer laptop. Our Mac or PC laptops with an SQL server app. Or on another server elsewhere. No special equipment needed. And we just pointed our app at it. And it was fine with regard to performance.
Back to your point, yes! And thank you for the detail. I needed to know that before making assumptions for now and for what the user count will be in the future. The next step is to know how many users are online at once and how many queries are coming to the load balancer at once. My assumption here is that the exclusion of values is done as a filtering database operation at query time, like so.
SELECT columns FROM table WHERE test_column [NOT] IN (value1, value2,...);
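Filled in with hypothetical names for the ignore-list case (my guesses, not the forum's real schema), it might look something like this, with the application building the list of IDs from the viewer's ignore list before running the query.

SELECT post_id, author_id, body
FROM posts
WHERE author_id NOT IN (1017, 2284, 9931);  -- the viewer's ignored user IDs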
That is the rough SQL. This is a common operation in databases. I use it every day when my Netflix show list loads, because of all the shows I have added to the parental control filter. But there are other ways to look at it. One of us could code a rough test up ourselves (naah), but ZooVille's website runs on XenForo, and XenForo offers a developer install!! One that lets a customer install their own SQL server and web server. Either way, we could try out either option, auto-generate test users, and increase the limits to see if we can make it fail. We could.
Or we could ask Google and ChatGPT.
what is a reasonable limit to the size of an SQL filter
or
creating large sql filters
The links returned give us insight into related issues.
If I remove one value from the IN clause the query succeeds after 24 seconds. For this incredibly simple query, 32767 is the maximum number of values that still allows the query to execute.
I see a hard limit at 32,767 values, but this suggests a way.
Every SQL batch has to fit in the Batch Size Limit: 65,536 * Network Packet Size.
You already found Erland's article on the topic of passing lists/arrays to SQL Server. With SQL 2008 you also have Table Valued Parameters which allow you to pass an entire DataTable as a single table type parameter and join on it.
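For what it's worth, here is a minimal sketch of that Table Valued Parameter idea in T-SQL. It assumes SQL Server (which is what that answer is about) and names I invented, so treat it as an illustration only.

CREATE TYPE dbo.UserIdList AS TABLE (user_id INT PRIMARY KEY);
GO
CREATE PROCEDURE dbo.GetVisiblePosts
    @IgnoredIds dbo.UserIdList READONLY  -- the whole ignore list arrives as one parameter
AS
BEGIN
    SELECT p.post_id, p.author_id, p.body
    FROM posts AS p
    WHERE NOT EXISTS (
        SELECT 1 FROM @IgnoredIds AS i WHERE i.user_id = p.author_id
    );  -- anti-join against the parameter instead of a giant IN list
END
GO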
Without assuming a join, we could have this.
SELECT columns
FROM table_xxx
WHERE desired_guid NOT IN (SELECT exclusion_id FROM ignore_table);
And voilà!
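Scoped to a single viewer, with hypothetical names again, that would be roughly:

SELECT post_id, author_id, body
FROM posts
WHERE author_id NOT IN (
    SELECT ignored_user_id
    FROM ignore_list
    WHERE owner_user_id = 12345  -- the viewer's own user ID
);

The database does the exclusion itself, so there is no literal value list to bump into the 32,767 limit.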
There are options out there and room for improvement. IMO, 1,000 values seems small. I may need to try it in SQL for real.
Fun. Let's keep talking.
If I'm wrong, I want to know I'm wrong.
Cheers