-
TikTok architecture
-
Functional requirements
- Upload videos
- Store videos
- Feed of videos
-
Non-functional requirements
-
- High availability
- Low latency
  - for uploads (tight SLA)
- Fault tolerant
-
- 10M DAU (viewers)
- 100k uploaders
- Write-to-read ratio: 1:100
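A quick back-of-envelope check of these numbers. The ratio is approximated as uploaders vs. daily viewers, and the "one upload per uploader per day" figure is an assumption taken from the ingest notes further down, not a measured value.

```python
# Figures from the requirements above.
DAU_VIEWERS = 10_000_000
UPLOADERS = 100_000
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

# Write-to-read ratio, approximated as uploaders vs. daily viewers.
reads_per_write = DAU_VIEWERS // UPLOADERS  # 100

# Assumption: each uploader posts about one video per day.
UPLOADS_PER_UPLOADER_PER_DAY = 1
uploads_per_day = UPLOADERS * UPLOADS_PER_UPLOADER_PER_DAY
avg_uploads_per_sec = uploads_per_day / SECONDS_PER_DAY

print(f"write:read = 1:{reads_per_write}")
print(f"average uploads/sec = {avg_uploads_per_sec:.2f}")
```

The average (about 1.2 uploads per second) is modest. Real traffic is bursty, so the peak rate will be several times higher; the peak factor is not estimated in these notes.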
- A few ideas:
- Most requests are for videos that are trending, and trends die out in about a month. So instead of storing every transcoded file, we can run a live transcoder and store its output in a cache (or CDN) with a TTL of about a month (the exact TTL can be decided by data analysis). Twitter did this and was able to save millions on storage costs.
- We can keep live WebSockets open with online users, so that whenever a video finishes processing we can notify them, and perhaps also the users who were tagged or who are highly engaged with the account.
- Instead of splitting videos into chunks after receiving the whole file, let the client do the chunking and upload chunks only. This results in far fewer failed uploads: if an upload fails after 95% of the video has been sent, the client doesn't need to re-upload the entire file.
- Maybe add caches on top of the databases.
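The "transcode live, keep the result for about a month" idea can be sketched as a TTL cache that transcodes on a miss. This is a minimal in-process sketch: the `transcode` callback, the rendition names and the 30-day TTL are hypothetical, and in the design above the cache would be a CDN or shared cache rather than a Python dict.

```python
import time
from typing import Callable, Dict, Tuple


class TranscodeCache:
    """Caches transcoded renditions with a TTL and transcodes on a miss."""

    def __init__(self, transcode: Callable[[str, str], bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._transcode = transcode  # hypothetical: original video -> rendition bytes
        self._ttl = ttl_seconds
        self._clock = clock
        # (video_id, rendition) -> (expires_at, data)
        self._entries: Dict[Tuple[str, str], Tuple[float, bytes]] = {}

    def get(self, video_id: str, rendition: str) -> bytes:
        key = (video_id, rendition)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: still within its TTL
        # Miss or expired: transcode from the stored original and cache it.
        data = self._transcode(video_id, rendition)
        self._entries[key] = (now + self._ttl, data)
        return data


# Usage with a fake transcoder and a controllable clock.
calls = []


def fake_transcode(video_id: str, rendition: str) -> bytes:
    calls.append((video_id, rendition))
    return f"{video_id}@{rendition}".encode()


now = [0.0]
MONTH = 30 * 24 * 3600
cache = TranscodeCache(fake_transcode, ttl_seconds=MONTH, clock=lambda: now[0])
cache.get("v1", "720p")  # miss: transcoded
cache.get("v1", "720p")  # hit: served from the cache
now[0] = MONTH + 1
cache.get("v1", "720p")  # expired: transcoded again
```

Trending videos stay hot and keep being served from the cache; once a trend dies, its renditions simply expire and only the original needs to stay in long-term storage.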
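The client-side chunking idea boils down to a resumable upload: the client cuts the file into chunks, the server remembers which chunks arrived, and after a failure the client re-sends only the missing ones. A minimal sketch; the chunk size, the SHA-256 per-chunk checksum and the `UploadSession` class are illustrative assumptions, not a specific upload protocol.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4 * 1024 * 1024  # hypothetical 4 MiB chunks


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Client side: cut the file into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


class UploadSession:
    """Server side: remembers which chunks of one upload have arrived."""

    def __init__(self, total_chunks: int):
        self.total_chunks = total_chunks
        self.received: Dict[int, bytes] = {}

    def put_chunk(self, index: int, chunk: bytes, sha256_hex: str) -> None:
        # Reject a corrupted chunk so the client can retry just that one.
        if hashlib.sha256(chunk).hexdigest() != sha256_hex:
            raise ValueError(f"checksum mismatch for chunk {index}")
        self.received[index] = chunk

    def missing(self) -> List[int]:
        return [i for i in range(self.total_chunks) if i not in self.received]

    def assemble(self) -> bytes:
        if self.missing():
            raise RuntimeError("upload incomplete")
        return b"".join(self.received[i] for i in range(self.total_chunks))


# Usage: the first attempt dies after 24 of 25 chunks (~95%).
video = bytes(range(256)) * 100  # 25,600 bytes of fake video
chunks = split_into_chunks(video, chunk_size=1024)  # 25 chunks
session = UploadSession(total_chunks=len(chunks))
for i, c in enumerate(chunks[:24]):
    session.put_chunk(i, c, hashlib.sha256(c).hexdigest())
# Resume: only the missing chunk is sent again.
for i in session.missing():
    session.put_chunk(i, chunks[i], hashlib.sha256(chunks[i]).hexdigest())
```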
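The notification idea is a fan-out: when processing finishes, push a message to the uploader, the tagged users and highly engaged users, but only to those who currently have a live connection. In this minimal sketch a dict of send callbacks stands in for real WebSocket connections, and the `engaged` set is a hypothetical engagement signal.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, Dict, Iterable, List, Set


class Notifier:
    """Tracks online users' connections and fans out 'video ready' events."""

    def __init__(self) -> None:
        # user_id -> function that pushes a message down that user's socket
        self.connections: Dict[str, Callable[[str], None]] = {}
        # account -> users highly engaged with it (hypothetical signal)
        self.engaged: DefaultDict[str, Set[str]] = defaultdict(set)

    def connect(self, user_id: str, send: Callable[[str], None]) -> None:
        self.connections[user_id] = send

    def disconnect(self, user_id: str) -> None:
        self.connections.pop(user_id, None)

    def video_ready(self, uploader: str, video_id: str, tagged: Iterable[str]) -> List[str]:
        audience = {uploader} | set(tagged) | self.engaged[uploader]
        notified = []
        for user in sorted(audience):
            send = self.connections.get(user)
            if send is not None:  # offline users get nothing pushed here
                send(f"video {video_id} is ready")
                notified.append(user)
        return notified


# Usage: three users online; one engaged user (dave) is offline.
inbox: DefaultDict[str, List[str]] = defaultdict(list)
notifier = Notifier()
for user in ["akash", "bob", "carol"]:
    notifier.connect(user, inbox[user].append)
notifier.engaged["akash"].update({"carol", "dave"})
notified = notifier.video_ready("akash", "v42", tagged=["bob"])
```

Offline users would need a different channel (for example a push notification or an in-app inbox); that part is outside these notes.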
Vikash Sharma
4 months ago
S3 also has multiple storage tiers. You can set rules to move files to a lower (cheaper) tier after a set time, and further down after that.
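Such tiering rules are expressed as an S3 lifecycle configuration. The sketch below only builds the configuration as a plain dict; the prefix, rule ID and the 30/90-day thresholds are hypothetical, and the only real constraint encoded is that S3 requires objects to stay at least 30 days in Standard before transitioning to `STANDARD_IA`.

```python
from typing import Any, Dict


def lifecycle_config(prefix: str, ia_after_days: int, archive_after_days: int) -> Dict[str, Any]:
    """Build an S3 lifecycle configuration that moves objects to cheaper tiers over time."""
    # S3 only allows a transition to STANDARD_IA once objects are at least 30 days old.
    if not 30 <= ia_after_days < archive_after_days:
        raise ValueError("need 30 <= ia_after_days < archive_after_days")
    return {
        "Rules": [
            {
                "ID": "tier-down-old-videos",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = lifecycle_config("videos/", ia_after_days=30, archive_after_days=90)
```

With boto3 installed and credentials configured, this dict would be passed as `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=..., LifecycleConfiguration=config)`.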
Kevin
1 month ago
Agree with chunking the video on the client side!
-
I want to see all of Akash's videos
-
- The request goes to the video serving service
- Check which user is being requested
- Look up the user in the user database, then query the video metadata database
- Return the list of titles and thumbnails to the user on the first load
- As soon as the user clicks a video, the video ID is sent to the CDN, which delivers the content
- The service takes care of hot videos by pulling them directly from the cache
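The read path above, sketched with in-memory stand-ins: the user "database" resolves the username, the metadata "database" supplies titles and thumbnails (cache-aside, so hot creators are served from the cache after the first read), and clicking a video only produces a CDN URL. `CDN_BASE_URL`, the URL layout and the `VideoServingService` class are hypothetical.

```python
from typing import Dict, List

CDN_BASE_URL = "https://cdn.example.com"  # hypothetical CDN hostname


class VideoServingService:
    def __init__(self, users: Dict[str, int], metadata: Dict[int, List[dict]]):
        self.users = users          # username -> user_id (user DB stand-in)
        self.metadata = metadata    # user_id -> video docs (metadata DB stand-in)
        self.cache: Dict[int, List[dict]] = {}  # cache-aside layer for listings
        self.db_reads = 0

    def list_videos(self, username: str) -> List[dict]:
        user_id = self.users.get(username)
        if user_id is None:
            return []
        cached = self.cache.get(user_id)
        if cached is not None:
            return cached  # hot creator: no metadata DB read
        self.db_reads += 1
        # First load returns only lightweight fields, never the video bytes.
        listing = [
            {"video_id": v["video_id"], "title": v["title"], "thumbnail": v["thumbnail"]}
            for v in self.metadata.get(user_id, [])
        ]
        self.cache[user_id] = listing
        return listing

    def play_url(self, video_id: str) -> str:
        # On click, the client streams the video from the CDN, not from this service.
        return f"{CDN_BASE_URL}/videos/{video_id}"


service = VideoServingService(
    users={"akash": 7},
    metadata={7: [
        {"video_id": "v1", "title": "Dance", "thumbnail": "v1.jpg", "duration_s": 15},
        {"video_id": "v2", "title": "Cooking", "thumbnail": "v2.jpg", "duration_s": 42},
    ]},
)
first = service.list_videos("akash")
second = service.list_videos("akash")  # served from the cache
url = service.play_url(first[0]["video_id"])
```

This cache never invalidates; a real one would need a TTL or an invalidation step when the creator uploads a new video.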
-
Service 1 - ingest (TikTok monolith)
-
Upload API
- Metadata (user data)
- Actual video
- Which protocol?
  - HTTP
  - HTTPS (over TCP; also supports auditing)
  - FTP
-
This service will store data in the database, but since there will be a lot of requests (1 upload per day per uploader, i.e. 100k requests a day), we will put a queue in front of the processing.
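A minimal sketch of queue-based ingest: the upload handler only enqueues an event and immediately acknowledges with HTTP 202 Accepted, while a pool of worker threads consumes the events. Python's in-process `queue.Queue` stands in for a real message broker, and `handle_upload` and the event fields are hypothetical names.

```python
import queue
import threading
from typing import Dict, List, Tuple

upload_events: "queue.Queue[dict]" = queue.Queue()  # stand-in for a message broker
processed: List[str] = []
processed_lock = threading.Lock()


def handle_upload(video_id: str, uploader: str) -> Tuple[int, Dict[str, str]]:
    """Upload API: enqueue the work and acknowledge right away."""
    upload_events.put({"video_id": video_id, "uploader": uploader})
    return 202, {"status": "accepted", "video_id": video_id}


def worker() -> None:
    while True:
        event = upload_events.get()
        try:
            # Placeholder for the real work (checks, transcoding, S3 upload).
            with processed_lock:
                processed.append(event["video_id"])
        finally:
            upload_events.task_done()


for _ in range(4):
    threading.Thread(target=worker, daemon=True).start()

responses = [handle_upload(f"v{i}", "akash") for i in range(10)]
upload_events.join()  # wait until the workers have drained the queue
```

The queue decouples upload bursts from processing capacity: the API stays fast, and more workers can be added if the backlog grows.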
-
Checks
- Less than 60 seconds long
- Appropriate or not (adult content)
- We send the user an ACK (HTTP 202 Accepted) and queue the request as an event that the workers subscribe to
- Convert to different formats:
1. Divide the file into chunks by duration (1 chunk = 10 s)
2. Convert the chunks into different formats using parallel processing (push the chunks to workers)
3. Process these formats again to produce different resolutions
4. Combine the results into 16 different files (starting from 1 upload)
5. Upload these files to S3 in different regions
   - Replicate the files to regions according to the demographics of the users
   - S3 + CDN
   - Fault tolerance: if one availability zone or node is down, replication lets us redirect the request to another region
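The chunk-and-transcode steps can be sketched as a fan-out: split the timeline into 10-second chunks, run one job per (chunk, format, resolution), then stitch the parts back together per rendition. With 4 formats × 4 resolutions this yields the 16 files mentioned above. The format and resolution names are hypothetical, `transcode_chunk` is a stand-in for a real encoder, steps 2 and 3 are merged into one job per (format, resolution), and the adult-content check is omitted.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from typing import Dict, List, Tuple

CHUNK_SECONDS = 10
MAX_DURATION_SECONDS = 60
FORMATS = ["h264-mp4", "hevc-mp4", "vp9-webm", "av1-mp4"]  # hypothetical
RESOLUTIONS = ["360p", "480p", "720p", "1080p"]           # hypothetical

Chunk = Tuple[float, float]  # (start_s, end_s)


def chunk_boundaries(duration_s: float) -> List[Chunk]:
    """Validate the duration, then split the timeline into 10-second chunks."""
    if not 0 < duration_s < MAX_DURATION_SECONDS:
        raise ValueError("video must be shorter than 60 seconds")
    count = math.ceil(duration_s / CHUNK_SECONDS)
    return [(i * CHUNK_SECONDS, min((i + 1) * CHUNK_SECONDS, duration_s)) for i in range(count)]


def transcode_chunk(chunk: Chunk, fmt: str, resolution: str) -> str:
    # Hypothetical stand-in for a real encoder call; returns a part name.
    start, end = chunk
    return f"{fmt}/{resolution}/{start:g}-{end:g}s"


def process_video(video_id: str, duration_s: float) -> Dict[str, List[str]]:
    chunks = chunk_boundaries(duration_s)
    jobs = [(chunk, fmt, res) for fmt, res in product(FORMATS, RESOLUTIONS) for chunk in chunks]
    # Fan the jobs out to a worker pool; map() keeps results in job order.
    with ThreadPoolExecutor(max_workers=8) as pool:
        parts = list(pool.map(lambda job: transcode_chunk(*job), jobs))
    # Stitch the chunks back together: one output file per (format, resolution).
    renditions: Dict[str, List[str]] = {}
    for (_, fmt, res), part in zip(jobs, parts):
        renditions.setdefault(f"{video_id}/{fmt}/{res}", []).append(part)
    return renditions


renditions = process_video("v1", 45)  # 5 chunks, 16 renditions
```

Real encoding is CPU-bound, so in practice the jobs would go to separate worker processes or machines; threads here only illustrate the fan-out and the ordered fan-in.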
-
Service 2 (TikTok monolith)
- Serve the video
- Locate the user
- Redirect the request to the nearest availability zone
- Take into account the user's network bandwidth, device, and supported formats
- Plus recommendations
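Serving decisions, sketched: route to the lowest-latency region that is currently healthy (which also gives failover when a region is down), then pick the best rendition the device can decode, its screen can show and its bandwidth can sustain. The rendition ladder, the codec names, the 80% bandwidth headroom and the example region latencies are all assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

# (codec, height, bitrate_kbps): hypothetical rendition ladder
RENDITIONS: List[Tuple[str, int, int]] = [
    ("h264", 240, 400), ("h264", 480, 1000), ("h264", 720, 2500), ("h264", 1080, 5000),
    ("vp9", 480, 700), ("vp9", 720, 1800), ("vp9", 1080, 3500),
]
HEADROOM = 0.8  # use at most 80% of measured bandwidth (assumption)


def pick_region(latency_ms: Dict[str, float], healthy: Set[str]) -> str:
    """Nearest healthy region; unhealthy regions are skipped (failover)."""
    candidates = {r: ms for r, ms in latency_ms.items() if r in healthy}
    if not candidates:
        raise RuntimeError("no healthy region available")
    return min(candidates, key=lambda r: candidates[r])


def pick_rendition(bandwidth_kbps: float, screen_height: int, codecs: Set[str]) -> Tuple[str, int, int]:
    playable = [r for r in RENDITIONS if r[0] in codecs]
    if not playable:
        raise ValueError("device supports none of the encoded codecs")
    fits = [r for r in playable if r[1] <= screen_height and r[2] <= bandwidth_kbps * HEADROOM]
    if not fits:
        return min(playable, key=lambda r: r[2])  # degrade to the cheapest stream
    # Highest resolution first; among equal heights, the lower bitrate.
    return max(fits, key=lambda r: (r[1], -r[2]))


region = pick_region({"ap-south-1": 40, "eu-west-1": 120, "us-east-1": 180},
                     healthy={"eu-west-1", "us-east-1"})
```

For example, a device with 3,000 kbps that only decodes h264 gets 480p, while the same connection on a device that also decodes vp9 gets 720p, because the more efficient codec fits in the same bandwidth.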
Caching also happens at the CDN.
CDN examples: Akamai, Open Connect (Netflix's own CDN).
We want 99.99% availability, but the provider offers 99.9%. The tradeoff: is the extra availability worth the cost and resources?
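To put the 99.9% vs. 99.99% gap in concrete terms, the sketch converts availability to yearly downtime and shows what redundancy buys. The `combined_parallel` formula assumes the redundant copies (for example two CDNs or two regions) fail independently, which overstates the gain when outages are correlated.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    return (1 - availability) * MINUTES_PER_YEAR


def combined_parallel(*availabilities: float) -> float:
    """Availability of N redundant copies, assuming failures are independent."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1 - a
    return 1 - p_all_down


print(downtime_minutes_per_year(0.999))   # ~525.6 min, about 8.8 hours
print(downtime_minutes_per_year(0.9999))  # ~52.6 min
print(combined_parallel(0.999, 0.999))    # ~0.999999
```

So a single 99.9% provider allows roughly ten times the downtime of the 99.99% target, and paying for a second independent provider is one way to close that gap.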
-
Different formats?
- Resolution; iOS, Android, desktop
- 1 video: 200 devices × 4 formats -> 800 files
- × 3 additional formats -> 2400 files for 1 video
- Storage?
- 100k uploaders uploading sporadically -> about 200k videos per day
- 1 MB per video
- 200k × 1 MB = 200 GB; × 4 formats = 800 GB
- Different resolutions (2×): 800 GB × 2 = 1.6 TB per day
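The file-count and storage estimates, recomputed step by step with the factors stated in these notes (200 devices, 4 formats, 3 additional variants; 200k videos/day, 1 MB each, 4 formats, 2× for resolutions). Decimal units are used (1 GB = 1000 MB), and the yearly figure is a plain extrapolation that ignores cross-region replication, which would multiply it further.

```python
# File count per video.
DEVICES = 200
FORMATS = 4
ADDITIONAL_VARIANTS = 3
files_per_video = DEVICES * FORMATS * ADDITIONAL_VARIANTS  # 2400

# Daily storage (decimal units: 1 GB = 1000 MB, 1 TB = 1000 GB).
VIDEOS_PER_DAY = 200_000
MB_PER_VIDEO = 1
STORED_FORMATS = 4
RESOLUTION_FACTOR = 2

raw_gb = VIDEOS_PER_DAY * MB_PER_VIDEO / 1000                # 200 GB
with_formats_gb = raw_gb * STORED_FORMATS                    # 800 GB
per_day_tb = with_formats_gb * RESOLUTION_FACTOR / 1000      # 1.6 TB
per_year_tb = per_day_tb * 365                               # ~584 TB, before replication

print(files_per_video, raw_gb, with_formats_gb, per_day_tb, round(per_year_tb))
```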
-
Video storage Database
-
Video Data
- Amazon S3
  - Handles high load
  - Easily accessible
  - Reliable
  - Part of the AWS infrastructure
- Tie S3 to different CDNs to serve users in different regions and reduce network latency
- Duplicate S3 buckets in different regions
- File storage is immutable: if a use case arises for letting users edit their videos, this part of the design will need to change
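If editing is ever needed, one common way to live with immutable object storage is to never overwrite: write each edit as a new object under a new versioned key and move the metadata pointer to it. A minimal sketch; the key layout `videos/<id>/v<n>` and the `VersionedVideoStore` class are hypothetical, and a dict stands in for the S3 bucket.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class VideoRecord:
    video_id: str
    version: int = 0
    object_key: str = ""


class VersionedVideoStore:
    """Each edit is written under a new key; the metadata points at the latest one."""

    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}        # stand-in for an S3 bucket
        self.records: Dict[str, VideoRecord] = {}  # stand-in for the metadata DB

    def put(self, video_id: str, data: bytes) -> str:
        record = self.records.get(video_id) or VideoRecord(video_id)
        record.version += 1
        key = f"videos/{video_id}/v{record.version}"
        self.objects[key] = data  # new object; older versions are left untouched
        record.object_key = key
        self.records[video_id] = record
        return key

    def current(self, video_id: str) -> bytes:
        return self.objects[self.records[video_id].object_key]


store = VersionedVideoStore()
first_key = store.put("v1", b"original cut")
second_key = store.put("v1", b"edited cut")
```

Old versions stay in the bucket until something deletes them, for example a lifecycle expiration rule on superseded keys.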
-
User Data
- MySQL
- Used by the recommendation system
- Asset properties need joins
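The kind of join that motivates a relational store for user data, sketched with Python's built-in `sqlite3` standing in for MySQL. The `users`/`assets` schema and the rows are hypothetical; the query aggregates each user's assets, as a recommendation or profile feature might.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE users (
    user_id  INTEGER PRIMARY KEY,
    username TEXT UNIQUE NOT NULL,
    country  TEXT
);
CREATE TABLE assets (
    asset_id   INTEGER PRIMARY KEY,
    user_id    INTEGER NOT NULL REFERENCES users(user_id),
    video_id   TEXT NOT NULL,
    duration_s INTEGER
);
INSERT INTO users VALUES (1, 'akash', 'IN'), (2, 'meera', 'US');
INSERT INTO assets VALUES (10, 1, 'v1', 15), (11, 1, 'v2', 42), (12, 2, 'v3', 30);
""")

# Join users to their assets and aggregate per user.
rows = conn.execute("""
    SELECT u.username, COUNT(a.asset_id) AS videos, SUM(a.duration_s) AS total_seconds
    FROM users u
    JOIN assets a ON a.user_id = u.user_id
    GROUP BY u.user_id, u.username
    ORDER BY u.username
""").fetchall()
```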
-
Video Metadata
-
- Any document/key-value store, such as MongoDB
- Redis would not be very useful here
- Less relational and less organized than the user data
- Just attached to the user and the video
- Flexible: you can edit video metadata (delete the video, change the thumbnail, change the caption) and store fields such as when it was uploaded
- Faster to access
- User history and recommendations will access it frequently
- Horizontal scaling is easy
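Two of the properties above, flexible documents and easy horizontal scaling, sketched by hand: each video's metadata is a schemaless document keyed by `video_id`, and a stable hash of that key picks the shard. The shard count, field names and the in-memory shard lists are hypothetical; document stores such as MongoDB can do the sharding for you (e.g. with hashed shard keys).

```python
import hashlib
from typing import Any, Dict, List

NUM_SHARDS = 4  # hypothetical

# Each shard maps video_id -> metadata document.
shards: List[Dict[str, Dict[str, Any]]] = [{} for _ in range(NUM_SHARDS)]


def shard_for(video_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Stable hash of the key, so every node routes the same ID to the same shard."""
    digest = hashlib.sha256(video_id.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def put_metadata(doc: Dict[str, Any]) -> None:
    shards[shard_for(doc["video_id"])][doc["video_id"]] = doc


def update_metadata(video_id: str, **changes: Any) -> Dict[str, Any]:
    doc = shards[shard_for(video_id)][video_id]
    doc.update(changes)  # new fields can appear without a schema migration
    return doc


put_metadata({"video_id": "v1", "owner": "akash", "caption": "first dance",
              "thumbnail": "v1.jpg", "uploaded_at": "2024-01-01T10:00:00Z"})
updated = update_metadata("v1", caption="first dance (edited)",
                          thumbnail="v1_alt.jpg", tags=["dance"])
```

Plain modulo sharding reshuffles most keys when the shard count changes; consistent hashing or range-based sharding avoids that.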