Massive content archive in the works

WildDog

Tourist
Hey all~

I'm currently working on a massive project to store, sort, and condense all the photo and video content I have regarding zoo & beast topics, using all the siterips and compilation torrents I've come across in my years. I intend on making it accessible as possible once I eventually finish making it.
I say eventually bc there are thousands upon thousands of photos and videos I'll be manually sorting through, but I haven't even gotten to that stage yet. Currently, I'm using several pieces of software to detect and choose between sets of duplicates in the collection I have. There's 500+ gigabytes of media here so this part of the process is one of the most tedious, given that most sets of duplicates are of the same resolution with either minor quality differences or small watermarks that differentiate between them. I'm not settling for automatically processing them though, because I want this collection to only include the highest quality available for anything in it. I'd crowdsource this process if I could but I don't have the resources available to host and distribute this much data, plus it may get me in some form of trouble I'd imagine.

Regardless, I've been working on this on and off for some months now and I'm at the point where I feel like it's worth mentioning. It contains pieces from almost any species (that has media of it to begin with at least) and is sorted by type of content. Plain zoo (only animals) is separate from beast content because I've found that some have preferences for one or the other oftentimes. Anyways, within each, content is sorted based on visual content, sex, actions, and other things.

Feel free to ask if you have any questions or suggestions, because I have a lot of time to go before this is complete and I'll probably adjusting things here and there for a while. Also, if there's a desire for anything within it for the time being, I may make a request thread on the "Other" board since there's more than just horses and dogs in the archive~

I also have a telegram channel where I'll post updates as well, if people want to just have that for info. It's private so nobody can see who's joined it as well.
 
Last edited by a moderator:
Interesting. How is it going to be accessible when completed?
Also I think you will piss off a large amount of content creators who generally do not like seeing their content stolen and posted all over the internet. But at the same time it is absolutely inevitably going to happen sooner or later deliberately or by accident if you post anything valuable. It is just how the internet works.
By distributing porn in large quantities, you are probably breaking the law severely. So be careful.
 
Interesting. How is it going to be accessible when completed?

This is the one thing I'm /really/ not sure about, to be completely honest. I may use it as a personal archive to help satisfy request threads, or simply allow trusted ones to copy it when meeting up physically, because beyond shipping a hard drive with everything on it or spending money to host everything (and yes likely get in trouble), there probably isn't a good way to go about it.
I'm certainly open to suggestions either way.

Also I think you will piss off a large amount of content creators who generally do not like seeing their content stolen and posted all over the internet. But at the same time it is absolutely inevitably going to happen sooner or later deliberately or by accident if you post anything valuable. It is just how the internet works.

Yeah I wouldn't doubt it. Though I will say I won't ever be trying to make any money off of the project. It started as a small thing to help share content to friends with, but as the collection grew, so too did the ambition to make it as good as it could be.
I won't be actively finding paid content in order to freely distribute it either. I'm only putting in what I have readily accessible to me. So if something's online already, it wasn't me who did the leaking, and I wouldn't know about it if it was leaked content anyways.
 
One thing in particular I've come across in this process is the presence of some people who will blatantly take content that isn't theirs, and throw their own watermark onto it. Thankfully for most of these I've managed to have the original image on hand, and one notable case is someone that went by "Cete", who focused mainly on big cats. They would have a large set of images and occasionally videos that were often not even the highest resolution available, all with some form of a fairly large multicolor watermark smacked in the corner.

It's not that big a deal in the big picture, but it is mildly annoying that someone would do so much just to make it look like they were the source. As much as I dislike the old BF automatic watermarks placed on images, at least it make things clearly credible, and without decently noticeable editing it couldn't be removed. The center circle was always shit though.
 
Almost all of the photo dupes have been removed as of tonight. A very small fraction of the dupe sets remain due to software limitations, but it's such a manageable amount that I'll just handle them as the sorting process happens. For now though, I move onto checking video files for duplicates. Per-video this takes more time to check for obvious reasons, but there are nowhere near as many as there were photos, so overall this part is going to be much less tedious. I went through the remaining 9k photo dupes just today so I'm gonna give myself a break for a day or two, maybe. Certainly done for tonight at the very least~
 
I've gone through all the duplicate videos my software has been able to detect, and now I'm properly planning the sorting process so I can move forward smoothly. I've made a flow chart for the non-human content half of the collection, and I'll be working on one for the beast content half later on. I'm still quite unsure of how deeply I want to sort beast content, as there are often photo sets or videos that span multiple categories of content. It's something I intend to ask about and change per others' feelings and suggestions once it's initially made anyways. Better to get a good organizational structure set before sorting everything.

Anyways, here's the basic structure for the animal-only portion of the collection. Some extra content folders may appear where appropriate.88B0A6AF-641F-4692-92D4-65F191F1102A.jpeg
 
I've gone through all the duplicate videos my software has been able to detect, and now I'm properly planning the sorting process so I can move forward smoothly. I've made a flow chart for the non-human content half of the collection, and I'll be working on one for the beast content half later on. I'm still quite unsure of how deeply I want to sort beast content, as there are often photo sets or videos that span multiple categories of content. It's something I intend to ask about and change per others' feelings and suggestions once it's initially made anyways. Better to get a good organizational structure set before sorting everything.

Anyways, here's the basic structure for the animal-only portion of the collection. Some extra content folders may appear where appropriate.View attachment 130291
Looks about right
 
This is the one thing I'm /really/ not sure about, to be completely honest. I may use it as a personal archive to help satisfy request threads, or simply allow trusted ones to copy it when meeting up physically, because beyond shipping a hard drive with everything on it or spending money to host everything (and yes likely get in trouble), there probably isn't a good way to go about it.
I'm certainly open to suggestions either way.
To be honest I do not exactly understand why you go through the whole process of presenting your efforts here when it is basically going to be useful mostly only for yourself. :D I obviously do not by any means want to force you to share it, but to me it looks like mostly nobody else will get to use the collection.
 
To be honest I do not exactly understand why you go through the whole process of presenting your efforts here when it is basically going to be useful mostly only for yourself. :D I obviously do not by any means want to force you to share it, but to me it looks like mostly nobody else will get to use the collection.

Yeah I feel the same a good amount of the time, but at the same rate, it's a fairly large project for myself, and I feel like at the very least posting progress helps motivate me to keep working at it. Aside from that, since it's regarding this content I feel like others here may have good ideas or general input that could help me~

In a perfect world though, I would be sharing this as much as possible once it's done. But due to both technological and legal restrictions, it'll probably only get copied directly from person to person, if it ever gets copied in full. Eventually I do intend to meet others I trust in this community, so that's not too much of a stretch imho. Worst case scenario, I have a resource at the ready to satisfy for a good amount of request threads
/shrug
 
Doesnt matter....post my stories and Ill find a way to make you regret it, friend. I dont care where you got the content, Im pretty sure it wasnt yours to begin with. Your "collection " is of other peoples work....dont you think they have a right to be consulted?

Absolutely. And if anyone wanted me to remove content from it I'd gladly oblige. As it stands, the content present is all old and already has been out in the wilds of the internet for some time, so I don't see any moral objection to holding onto it. And distribution aside, it's convenient for myself to have a system like this for my own usage, so I could easily have content in here that is sorted and stored nicely but isn't to be shared. Adding a note to any set of items would be very easy to do.
 
I'd like to add that I would never post content that isn't my own and try to pass it as such, that's low and never something I encourage. Where present, I'll always be more than happy to give credit to the original source.
Also, no need for unwarranted aggression fam!
 
If you were to use a system like this, would you prefer less fine sorting so image sets could be kept together, or finer sorting where some image sets may be broken up?
I'm on the fence with this one myself, I'm curious if anyone else may have thoughts on it
 
I am going to be interested mainly in the solo animal content and specific human-animal content.
 
So what exactly is this project? Is it a file that holds a lot of content, an app, website? And when do you think it will be finished or when did you start and where are you now with it? Very curious and I as everyone else appreciate the effort you are going through to provide this to the community. ❤️
 
And when do you think it will be finished or when did you start and where are you now with it?

I started it in some form a couple years back when I started downloading and sorting stuff I found and liked. But this iteration of the project with actual intention and proper planning was only started a few months back, and started with the multi-day process of downloading the old bf siterip (which only really contained canine content anyways). I also started consolidating allll the miscellaneous thingsI had saved over the years on the different backup drives I've had, including all of the content from unicorn(dot)wereanimal(dot)net if i recall the site properly. I haven't tried to find it any time recently since it was down for a period, and completely stagnant for years before that anyways.

Tangent aside, that's where it started, and as for when it'll be finished, I can't realistically give a good estimate of everything yet. I'll try and timelog some sorting sessions to get a general feel of the speed, but there are hundreds of thousands of files in this initial collection, so it's going to be months at the shortest. I'm only able to take some free hours during nights to make progress because I have other work I have to tend to. Also, presently I'm actually working on getting all of my original content edited and marked up first. That should only be a week or so more to finish that up however, there's a lot to get through and filter to get the best without too much repetition of content.


So what exactly is this project? Is it a file that holds a lot of content, an app, website?

It's a big fat hard drive that I already need to upgrade because there's not enough room. It's on a 500g but it's juuuust over capacity by about 60, so I'm gonna shove it all onto a 1tb once I get the chance. The drive I'll be using is temporarily being used to backup a separate system of mine while I clean wipe the OS on the computer, it had gotten viruses a couple years back and yadda yadda factory reset.

As for what I actually want it to be, I'd love for it to be an accessible virtual drive of sorts, but as others have brought up and I had also considered myself, there are big legal worries with that route. So, of your three, its a big ol' file with a bunch of content. If I ever meet up with my close zoo friends, I'd have it with me to clone in part or in full.


Very curious and I as everyone else appreciate the effort you are going through to provide this to the community. ❤
Thanks much!
It'll be a fairly slow haul from here in terms of progress updates, seeing as the rest is just a process of moving files into the right places. But I'll always be active here if any interesting developments come up, or if questions arise of course~
 
Lol I'm doing the same thing with a collection that is broadly the same size XD I've came up with a slightly different content pattern - species of animal, sex of human, human on animal/animal on human - but bady sorted yet. On the other hand I don't have it on a single hard drive but built a scale-out storage cluster so single hard drive failures are no problem and know how to share it anonymously over tor - though it takes about 2 month to download mine. Would like to proceed to a tag database over time and already made a database of checksums and filenames. Maybe we could exchange a list of checksums to exchange the content each other is missing ? XD
 
Lol I'm doing the same thing with a collection that is broadly the same size XD I've came up with a slightly different content pattern - species of animal, sex of human, human on animal/animal on human - but bady sorted yet. On the other hand I don't have it on a single hard drive but built a scale-out storage cluster so single hard drive failures are no problem and know how to share it anonymously over tor - though it takes about 2 month to download mine. Would like to proceed to a tag database over time and already made a database of checksums and filenames. Maybe we could exchange a list of checksums to exchange the content each other is missing ? XD

That's not a bad idea. I haven't used tor myself but it is something I could look into for more safety. That said if it does somehow still get found out the trouble is still there.
Either way once I do have it sorted and such I'd be happy to generate checksums for what I have and compare~
 
I gotta say I hadn't thought of using an actual database to aid with this project but that's a great idea. I'll have to get a local sql server running for it eventually. Having the ability to have hidden metadata like specific species and keywords would be neat to do too
 
Thanks! I'm glad I've finally put myself on the path to making it rather than just thinking about it~
All it takes is that first step!!! Im rooting for ya, hopefully your compiled videos become as popular as gaybeast once was.
 
I gotta say I hadn't thought of using an actual database to aid with this project but that's a great idea. I'll have to get a local sql server running for it eventually. Having the ability to have hidden metadata like specific species and keywords would be neat to do too

Actually I use a simple sqlite database right now and just have a script that checksums a folder and puts the checksum and filepath/name in that database - also a tag and hash/tag table though those are not used much. When I have sorted some more I could generate tags out of the paths initially. And setting up a real database wouldn't be much trouble either- I mean apt install postgresql and some slight changes XD
also my collection is on a luks encrypted filesystem on my storage cluster so it can't easily be used as evidence if I get to break electricity during a potential raid. at the moment I just share by using an apache server with a vhost for a hidden service that points to the encrypted device - but that would be better as a vm or container that opens a connection to the storage and could also be encrypted. It would be really interesting to have that storage replicated somewhere else and use a tor load balancer for access and maybe nodes to cache some stuff that gets accessed often... but I'd have to get into the mood to build something - I'm kinda lazy XD Maybe would also be worth looking at how the permanent booru works and distribute the files over i2p - but have no experience with that yet. Might also be a security consideration to have any vm that actually answers to a hidden service address in a virtual network that simply has no access to the internet but over a tor relay so it simply can't ever leak your ip address even if a server gets hacked. But such stuff is always such a hazzle to setup and hacking a server that uses just apache and indexes like I do at the moment seems pretty off.
 
Actually I use a simple sqlite database right now and just have a script that checksums a folder and puts the checksum and filepath/name in that database - also a tag and hash/tag table though those are not used much. When I have sorted some more I could generate tags out of the paths initially. And setting up a real database wouldn't be much trouble either- I mean apt install postgresql and some slight changes XD
also my collection is on a luks encrypted filesystem on my storage cluster so it can't easily be used as evidence if I get to break electricity during a potential raid. at the moment I just share by using an apache server with a vhost for a hidden service that points to the encrypted device - but that would be better as a vm or container that opens a connection to the storage and could also be encrypted. It would be really interesting to have that storage replicated somewhere else and use a tor load balancer for access and maybe nodes to cache some stuff that gets accessed often... but I'd have to get into the mood to build something - I'm kinda lazy XD Maybe would also be worth looking at how the permanent booru works and distribute the files over i2p - but have no experience with that yet. Might also be a security consideration to have any vm that actually answers to a hidden service address in a virtual network that simply has no access to the internet but over a tor relay so it simply can't ever leak your ip address even if a server gets hacked. But such stuff is always such a hazzle to setup and hacking a server that uses just apache and indexes like I do at the moment seems pretty off.

I'll admit I work more with software than database and network dev shit, but I believe I followed most of that haha. These are things way down the line for me anyways so I have time to brush up~
 
Back
Top