This proposal requests funding for the work done so far, and planned for the near future, on the image server cluster that provides images (pictures) to hive.blog, peakd.com, and other Hive frontends.
Explanation of costs
In addition to the development work, this proposal includes the charges for servers to host the images for three months (until this proposal expires).
After this proposal expires, we’ll make a separate proposal at a much lower rate that just covers the costs of the servers and server maintenance over a longer period of time.
Most of the cost of this proposal is for the initial development work and one-time computer purchases (for backend fileservers). Roughly speaking, the costs are $50K in labor, $10.5K in new computer equipment, and $2.5K in hosting charges (much of that is for network bandwidth over a three-month period: the previous month plus the next two months), for a total of $63K.
For simplicity, we’re making a 63-day proposal that pays out 1K HBD per day.
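As a quick check of the figures above, the arithmetic can be sketched as follows. The variable names are illustrative only, and the comparison between the dollar total and the HBD payout assumes 1 HBD is worth roughly 1 USD, which is an assumption of this sketch rather than a guarantee.

```python
from decimal import Decimal

# Rough cost figures quoted above, in thousands of US dollars.
costs_k_usd = {
    "labor": Decimal("50"),
    "equipment": Decimal("10.5"),
    "hosting": Decimal("2.5"),
}

total_k_usd = sum(costs_k_usd.values(), Decimal("0"))

# Proposal terms: 63 days at 1K HBD per day.
# Assumption for illustration: 1 HBD is treated as roughly 1 USD.
days = 63
payout_k_hbd_per_day = Decimal("1")
total_payout_k_hbd = days * payout_k_hbd_per_day

print(f"Total cost: ${total_k_usd}K")
print(f"Total payout: {total_payout_k_hbd}K HBD")
```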
Goals of Image Server cluster design
Our immediate goal was to get a replacement for the Steemit image server as soon as possible, as we anticipated that Steemit would likely block access from Hive-related services (which they ultimately did, although they took longer than expected to do it).
Our other near-term goals were to reduce the long-term cost of operating the image server cluster and to create infrastructure that is resistant to single points of failure.
Image Server cluster components
In its simplest form, we could have used one computer as our image server, but such a design doesn’t scale as the number of images that need to be stored increases and the number of times those images have to be fetched grows with more users. The final solution we came up with is quite a bit more complicated, but it allows us to easily scale as needed.
The picture below shows the various servers that compose the Image Server Cluster:
HAProxy (High Availability Proxy) is used to distribute load and detect server problems
We use HAProxy at various points in the image server cluster to distribute requests to downstream servers. HAProxy automatically detects when one of the downstream servers malfunctions and distributes load to the remaining servers. It also allows us to easily take a server offline or add an experimental server for testing purposes.
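The behavior described above can be illustrated with a toy Python model. This is not HAProxy's implementation or configuration, just a minimal sketch of rotating requests across servers and skipping any that are marked down. The class, method names, and server names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Toy balancer: rotate over backends, skipping ones marked down."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._rotation = cycle(self.backends)

    def mark_down(self, backend):
        # Called when a health check fails, or to take a server offline by hand.
        self.healthy.discard(backend)

    def mark_up(self, backend):
        # Called when a known server recovers or is put back into service.
        if backend in self.backends:
            self.healthy.add(backend)

    def pick(self):
        # Try each backend at most once per call.
        for _ in range(len(self.backends)):
            candidate = next(self._rotation)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


# Hypothetical server names, for illustration only.
balancer = RoundRobinBalancer(["imagehoster-1", "imagehoster-2"])
first = [balancer.pick() for _ in range(4)]
balancer.mark_down("imagehoster-1")
after_failure = [balancer.pick() for _ in range(3)]
```

With both servers healthy, requests alternate between them. Once one is marked down, every request goes to the remaining server until the failed one is marked up again.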
Frontend image cache server
Our frontend system is a cloud-based server using Varnish that caches the most recently accessed images in memory. If an image isn’t in the cache or if a user is posting a new image, then this server passes the request on to one of the imagehosters.
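The caching behavior described above can be sketched as a small least-recently-used cache in Python. This is only an illustration of the idea, not how Varnish is implemented or configured. The `ImageCache` class and the `fake_imagehoster` function are hypothetical stand-ins.

```python
from collections import OrderedDict


class ImageCache:
    """Toy in-memory cache keeping the most recently accessed images."""

    def __init__(self, fetch_from_imagehoster, max_items):
        self._fetch = fetch_from_imagehoster
        self._max_items = max_items
        self._items = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._items:
            self.hits += 1
            self._items.move_to_end(url)  # mark as most recently used
            return self._items[url]
        self.misses += 1
        data = self._fetch(url)  # cache miss: ask an imagehoster
        self._items[url] = data
        if len(self._items) > self._max_items:
            self._items.popitem(last=False)  # evict least recently used
        return data


def fake_imagehoster(url):
    # Hypothetical stand-in for a real imagehoster request.
    return f"bytes-for-{url}".encode()


cache = ImageCache(fake_imagehoster, max_items=2)
for url in ["a.png", "b.png", "a.png", "c.png", "b.png"]:
    cache.get(url)

print(f"hits={cache.hits} misses={cache.misses}")
```

The hit and miss counters hint at the kind of analysis that helps when tuning a cache: a low hit rate suggests the cache is too small for the working set of images.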
Imagehost servers
An imagehoster handles fetching images from the web and from our fileservers, adding new images to the fileservers, and resizing images for thumbnails. Currently we have two imagehost servers, just for redundancy, as a single imagehoster could easily handle the current load.
Fileservers
A fileserver stores the image files themselves (including thumbnails). The imagehoster was written to store images in S3 (Amazon’s cloud-based file storage system). To save on costs, we didn’t want to use S3 to store the images, so we’re currently storing them on FreeNAS file servers, using MinIO to support the S3 API. We currently have two fileservers for redundancy, and @gtg is working on adding a third fileserver as part of his infrastructure. Currently all requests are served from one fileserver, as the load on the fileserver side isn’t very high yet, and the other servers are just there to avoid issues if the main fileserver fails.
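The primary-plus-standby arrangement described above can be sketched in Python as follows. The in-memory store is a hypothetical stand-in for an S3-compatible fileserver, and the failover logic is an assumption about how such a setup might behave, not a description of the actual imagehoster code.

```python
class FileserverDown(Exception):
    """Raised by a store that cannot serve requests."""


class InMemoryStore:
    """Hypothetical stand-in for an S3-compatible fileserver."""

    def __init__(self, name):
        self.name = name
        self.objects = {}
        self.online = True

    def get(self, key):
        if not self.online:
            raise FileserverDown(self.name)
        return self.objects[key]


def read_with_failover(stores, key):
    # Try the primary first, then each standby in order.
    for store in stores:
        try:
            return store.name, store.get(key)
        except FileserverDown:
            continue
    raise FileserverDown("all fileservers are down")


primary, standby = InMemoryStore("fs-1"), InMemoryStore("fs-2")
for store in (primary, standby):  # assume both already hold a mirrored copy
    store.objects["thumb/abc.jpg"] = b"jpeg-bytes"

served_by, _ = read_with_failover([primary, standby], "thumb/abc.jpg")
primary.online = False
served_by_after_failure, _ = read_with_failover([primary, standby], "thumb/abc.jpg")
```

Note that the standby is only useful if it holds the same files as the primary, which is why keeping the fileservers mirrored matters.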
Remaining work: improve the fileserver architecture and tune frontend image cache
We’ve had some problems using MinIO’s syncing facility to keep files mirrored between our pair of fileservers, so we plan to explore other options for storing the images. Two likely options are Ceph and GlusterFS.
We also plan to do more analysis of how the frontend is caching images and work to improve its performance. This may involve replacing the current caching software, increasing the size of the memory cache, or paying for a CDN (content distribution network) service.
How to vote for this proposal (#105)
Below are several interfaces that can be used to vote for this proposal