1. TikTok architecture
    1. Functional requirements
      1. Upload videos
      2. Store videos
      3. Serve a feed of videos
    2. Non-functional requirements
      1. High availability
      2. Low latency, especially for uploads (tight SLA)
      3. Fault tolerance
      4. Scale: 10M daily active viewers, 100k uploaders
        1. Write-to-read ratio: 1:100
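A quick back-of-envelope check of the scale numbers above. This is a minimal sketch that reads the 1:100 ratio as uploaders to viewers and assumes 1 upload per uploader per day (the figure used in the ingestion notes); `estimate` and its parameters are hypothetical names, and the result is a daily average, not a peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate(dau_viewers: int, uploaders: int, uploads_per_uploader: float = 1.0) -> dict:
    """Rough daily traffic numbers; uploads_per_uploader is an assumption."""
    uploads_per_day = uploaders * uploads_per_uploader
    return {
        # 100k uploaders / 10M viewers = 0.01, i.e. the 1:100 ratio
        "write_to_read": uploaders / dau_viewers,
        "uploads_per_day": uploads_per_day,
        # average over the whole day; peak traffic will be higher
        "avg_upload_qps": uploads_per_day / SECONDS_PER_DAY,
    }


if __name__ == "__main__":
    est = estimate(dau_viewers=10_000_000, uploaders=100_000)
    print(f"write:read = {est['write_to_read']:.2f}")
    print(f"uploads/day = {est['uploads_per_day']:,.0f}")
    print(f"average uploads/s = {est['avg_upload_qps']:.2f}")
```

Roughly one upload per second on average is why the notes can afford a queue in front of the ingestion service rather than synchronous processing.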
    3. A few ideas
      1. Most requests are for videos that are trending, and trends die out in about a month. Instead of storing every transcoded file, use a live (on-demand) transcoder and store the result in a cache (or CDN) with a TTL of about a month (the exact TTL can be decided through data analysis). Twitter did this and saved millions in storage costs.
      2. Keep live WebSocket connections with online users so they can be notified as soon as a video finishes processing; possibly also notify users who were tagged or who are highly engaged with the account.
      3. Instead of dividing the video into chunks after receiving the whole file, let the client do the chunking and upload chunks only. This causes far fewer failures: if an upload fails after 95% of the video has been sent, the client does not need to re-upload the entire file.
      4. Maybe add caches on top of the databases.
      5. S3 also has multiple storage tiers: a rule can move files to a lower tier after a set time, and further down later.
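The client-side chunking idea can be sketched as a resumable upload: the client cuts the file into fixed-size chunks, and the server remembers which chunks arrived so a retry sends only the missing ones. This is a minimal in-memory sketch; `UploadSession`, `upload`, the 1 MiB chunk size, and the SHA-256 per-chunk check are all assumptions for illustration, not a real upload API.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 1024 * 1024  # hypothetical 1 MiB chunk size


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Client side: cut the video into fixed-size chunks before uploading."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


@dataclass
class UploadSession:
    """Hypothetical server-side record of one upload (not a real API)."""
    total_chunks: int
    received: dict[int, bytes] = field(default_factory=dict)

    def put_chunk(self, index: int, chunk: bytes, sha256_hex: str) -> None:
        # A corrupted chunk is rejected, so only that chunk is retried.
        if hashlib.sha256(chunk).hexdigest() != sha256_hex:
            raise ValueError(f"checksum mismatch for chunk {index}")
        self.received[index] = chunk

    def missing(self) -> list[int]:
        return [i for i in range(self.total_chunks) if i not in self.received]

    def assemble(self) -> bytes:
        if self.missing():
            raise RuntimeError("upload incomplete")
        return b"".join(self.received[i] for i in range(self.total_chunks))


def upload(session: UploadSession, chunks: list[bytes], fail_at: int | None = None) -> None:
    """Send only the chunks the server does not have yet.
    fail_at simulates the connection dropping just before that chunk."""
    for i in session.missing():
        if i == fail_at:
            raise ConnectionError(f"connection lost at chunk {i}")
        session.put_chunk(i, chunks[i], hashlib.sha256(chunks[i]).hexdigest())


if __name__ == "__main__":
    video = bytes(range(256)) * 40_000  # ~10 MB of dummy data -> 10 chunks
    chunks = split_into_chunks(video)
    session = UploadSession(total_chunks=len(chunks))
    try:
        upload(session, chunks, fail_at=9)  # dies after 9 of 10 chunks
    except ConnectionError as err:
        print(err)
    print("to resend:", session.missing())  # only the last chunk
    upload(session, chunks)                 # resume
    assert session.assemble() == video
```

After a failure at 90%, the retry transfers one chunk instead of the whole file, which is the failure-cost argument made in the idea above.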
    4. Example read flow: "I want to see all of Akash's videos"
      1. The request goes to the video serving service.
        1. Check which user it is in the user database.
        2. Redirect to the video metadata database.
        3. Return the list of titles and thumbnails to the user on the first request.
        4. As soon as the user clicks a video, the video ID is sent to the CDN, which delivers the content.
        5. The service handles hot videos by pulling them directly from the cache.
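Serving hot videos from a cache combines naturally with the live-transcoding idea: on a cache miss the video is transcoded on demand, and the result expires after roughly a month. This is a minimal in-process sketch; `TTLCache`, `serve_video`, the 30-day TTL, and the injected `transcode` function are hypothetical stand-ins for a real cache/CDN and transcoder.

```python
from __future__ import annotations

import time
from typing import Callable

MONTH_SECONDS = 30 * 24 * 60 * 60  # ~1 month; tune from data, as the notes suggest


class TTLCache:
    """In-process stand-in for a cache/CDN layer whose entries expire after a TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: dict[str, tuple[float, bytes]] = {}

    def get(self, key: str) -> bytes | None:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # trend is over: drop the transcoded copy
            return None
        return value

    def put(self, key: str, value: bytes) -> None:
        self._store[key] = (self.clock() + self.ttl, value)


def serve_video(video_id: str, rendition: str, cache: TTLCache,
                transcode: Callable[[str, str], bytes]) -> bytes:
    """Hot videos come straight from the cache; cold ones are transcoded on demand."""
    key = f"{video_id}:{rendition}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    data = transcode(video_id, rendition)  # hypothetical live transcoder
    cache.put(key, data)
    return data


if __name__ == "__main__":
    now = [0.0]  # fake clock so the example does not wait a month
    cache = TTLCache(MONTH_SECONDS, clock=lambda: now[0])
    calls = []

    def fake_transcode(video_id: str, rendition: str) -> bytes:
        calls.append((video_id, rendition))
        return f"{video_id}@{rendition}".encode()

    serve_video("v1", "720p", cache, fake_transcode)  # miss: transcode
    serve_video("v1", "720p", cache, fake_transcode)  # hit: served from cache
    now[0] += MONTH_SECONDS                           # a month later
    serve_video("v1", "720p", cache, fake_transcode)  # expired: transcode again
    print("transcode calls:", len(calls))
```

Here the TTL is fixed at insertion time; refreshing it on every hit would keep still-popular videos cached longer, at the cost of more storage.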
    5. Service 1: ingestion (TikTok monolith)
      1. Upload API: metadata (user data) and the actual video
        1. Which protocol? HTTP/HTTPS over TCP; also audit FTS.
      2. This service stores data in the database, but there will be a lot of requests (1 upload per user per day, so 100k requests a day), so we will put a queue in front of it.
        1. Checks: the video is shorter than 60 seconds, and its content is appropriate (no adult content).
          1. Send the user an HTTP 202 (Accepted) acknowledgement and queue the request as an event that the workers subscribe to.
          2. Convert to different formats:
            1. Divide the file into chunks by duration (1 chunk = 10 seconds).
            2. Convert the chunks into different formats in parallel by pushing them to workers.
            3. Process these formats again to produce different resolutions.
            4. Combine the results into 16 different files (starting from 1).
            5. Upload these files to different S3 regions.
          3. Replicate the files to regions based on the demographics of the user (S3 + CDN). This is fault tolerant: if one availability zone or node is down, replication lets us redirect the request to another region.
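The ingestion steps above can be sketched end to end: the upload API runs the checks, enqueues an event and answers 202, and a worker splits the video into 10-second chunks and transcodes every chunk, format and resolution in parallel. This is a minimal sketch under stated assumptions: an in-process `queue.Queue` stands in for the real message queue, `transcode_chunk` is a placeholder, and the format/resolution lists, the 422 rejection code, and the `is_appropriate` moderation flag are all hypothetical.

```python
from __future__ import annotations

import math
import queue
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

MAX_DURATION_S = 60             # "less than 60 seconds"
CHUNK_S = 10                    # 1 chunk = 10 seconds
FORMATS = ["h264", "vp9"]       # hypothetical output formats
RESOLUTIONS = ["480p", "720p"]  # hypothetical resolutions


@dataclass
class UploadEvent:
    video_id: str
    duration_s: float


# In-process stand-in for the message queue the workers subscribe to.
upload_queue: queue.Queue[UploadEvent] = queue.Queue()


def handle_upload(video_id: str, duration_s: float, is_appropriate: bool) -> int:
    """Upload API: run the checks, enqueue an event, return an HTTP status code.
    is_appropriate stands in for a content-moderation result (hypothetical)."""
    if duration_s >= MAX_DURATION_S or not is_appropriate:
        return 422  # assumption: reject with 422 Unprocessable Content
    upload_queue.put(UploadEvent(video_id, duration_s))
    return 202      # Accepted: processing continues asynchronously


def chunk_ranges(duration_s: float) -> list[tuple[float, float]]:
    """Split by duration into CHUNK_S-second (start, end) ranges."""
    n = math.ceil(duration_s / CHUNK_S)
    return [(i * CHUNK_S, min((i + 1) * CHUNK_S, duration_s)) for i in range(n)]


def transcode_chunk(video_id: str, start: float, end: float, fmt: str, res: str) -> str:
    # Placeholder for a real transcoder call; returns a hypothetical object key.
    return f"{video_id}/{fmt}/{res}/{int(start):03d}-{int(end):03d}"


def worker_step(pool: ThreadPoolExecutor) -> list[str]:
    """Take one event off the queue and transcode every chunk x format x
    resolution in parallel. Blocks if the queue is empty."""
    event = upload_queue.get()
    jobs = [pool.submit(transcode_chunk, event.video_id, start, end, fmt, res)
            for start, end in chunk_ranges(event.duration_s)
            for fmt in FORMATS
            for res in RESOLUTIONS]
    keys = [job.result() for job in jobs]
    upload_queue.task_done()
    return keys


if __name__ == "__main__":
    print(handle_upload("too-long", duration_s=75, is_appropriate=True))
    print(handle_upload("v42", duration_s=35, is_appropriate=True))
    with ThreadPoolExecutor(max_workers=4) as pool:
        keys = worker_step(pool)
    print(len(keys), "objects, e.g.", keys[0])
```

A 35-second video yields 4 chunks, and with 2 formats and 2 resolutions that is 16 transcode jobs running on the pool; in a real deployment each job would be a separate worker pulling from the queue, and the resulting keys would be uploaded to S3.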
    6. Service 2: serving (TikTok monolith)
      1. Serve the video
        1. Locate the user and redirect the request to the nearest availability zone.
        2. Take into account the user's network bandwidth, device, and supported formats.
        3. Combine with recommendations.
        4. Caching is also available at the CDN, e.g. Akamai, or Open Connect (Netflix).
        5. We want 99.99% availability, but the CDN provider offers 99.9%; the tradeoff is cost and resources.
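To make the 99.9% versus 99.99% gap concrete, the sketch below converts availability into expected downtime per year, and shows one way (an assumption, not something the notes commit to) to buy the extra nine: run two providers in parallel so traffic fails over if one is down. The parallel formula assumes independent failures, which real outages often violate, so treat it as an optimistic bound.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    """Expected yearly downtime for a given availability (e.g. 0.999)."""
    return (1 - availability) * MINUTES_PER_YEAR


def parallel_availability(*availabilities: float) -> float:
    """Availability of redundant providers where any one can serve traffic.
    Assumes failures are independent, which real outages often violate."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a
    return 1 - all_down


if __name__ == "__main__":
    for a in (0.999, 0.9999):
        print(f"{a:.2%} -> {downtime_minutes_per_year(a):.1f} min/year down")
    combined = parallel_availability(0.999, 0.999)
    print(f"two 99.9% CDNs -> {combined:.6f}, meets 99.99%: {combined >= 0.9999}")
```

99.9% allows about 8.8 hours of downtime a year versus about 53 minutes at 99.99%; closing that gap with a second provider roughly doubles the CDN bill, which is the cost/resources tradeoff noted above.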
      2. Different formats?
        1. Variants: resolution, iOS, Android, desktop. For 1 video: 200 devices × 4 formats = 800 variants; 3 additional formats bring it to 800 × 3 = 2,400 files per video.
        2. Storage: 100k uploaders posting sporadically, about 200k videos per day at 1 MB per video = 200 GB per day.
          1. Multiple formats (4): 200 GB × 4 = ~800 GB.
          2. Different resolutions (2×): 800 GB × 2 = ~1.6 TB per day.
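The storage arithmetic can be kept honest with a tiny calculator. This sketch uses decimal units (1 GB = 1,000 MB) and the figures from the notes: 200k videos/day, 1 MB each, 4 formats, and a 2× resolution factor. The `replicas` parameter is a hypothetical knob for per-region copies; the notes give no number for it, so it defaults to 1.

```python
def variants_per_video(devices: int, formats: int, extra_formats: int) -> int:
    """200 devices x 4 formats = 800 variants, x 3 additional formats = 2,400 files."""
    return devices * formats * extra_formats


def daily_storage_gb(videos_per_day: int, mb_per_video: float, formats: int,
                     resolution_multiplier: float, replicas: int = 1) -> float:
    """Decimal units (1 GB = 1,000 MB). replicas is a hypothetical knob for
    per-region copies; the notes give no number for it."""
    raw_gb = videos_per_day * mb_per_video / 1_000
    return raw_gb * formats * resolution_multiplier * replicas


if __name__ == "__main__":
    print(variants_per_video(200, 4, 3), "files per video")
    per_day_gb = daily_storage_gb(200_000, 1.0, formats=4, resolution_multiplier=2)
    print(f"{per_day_gb / 1_000:.1f} TB/day, {per_day_gb * 365 / 1_000_000:.2f} PB/year")
```

200 GB × 4 × 2 gives 1.6 TB per day, or roughly 0.58 PB per year before any regional replication.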
    7. Video storage and databases
      1. Video Data
        1. Amazon S3
          1. Handles high load, is easily accessible, and is reliable (AWS infrastructure).
          2. Tie S3 to different CDNs to serve users in different regions and reduce network latency; duplicate S3 buckets in different regions.
          3. File storage is immutable; if a use case arises that lets users edit their videos, modification support will be required.
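Moving older videos to cheaper S3 tiers, as suggested in the ideas section, is normally done with an S3 lifecycle rule. The sketch below builds a rule in the shape that boto3's `put_bucket_lifecycle_configuration` accepts; the rule ID, the `videos/` prefix, the bucket name, and the 30/90-day thresholds are all hypothetical values to be tuned from access data, and the actual AWS call is left commented out.

```python
def lifecycle_rules(prefix: str, ia_after_days: int = 30, archive_after_days: int = 90) -> dict:
    """Lifecycle configuration moving objects under `prefix` to colder tiers."""
    return {
        "Rules": [
            {
                "ID": "age-out-transcoded-videos",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Infrequent Access once the trend has likely faded
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    # archive tier for videos that are rarely watched
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


if __name__ == "__main__":
    config = lifecycle_rules("videos/")
    print(config)
    # With AWS credentials configured, this could be applied with:
    # import boto3
    # boto3.client("s3").put_bucket_lifecycle_configuration(
    #     Bucket="example-video-bucket", LifecycleConfiguration=config)
```

Because S3 objects are immutable, aging them into colder tiers never conflicts with edits; the trade-off is slower and costlier retrieval if an old video suddenly trends again.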
      2. User Data
        1. MySQL: the recommendation system and asset properties need joins.
      3. Video Metadata
        1. Any key-value or document storage such as MongoDB or Redis; relational features would not be very useful here.
          1. The data won't be as relational or organised as the user data; it is simply attached to the user and the video.
          2. Flexible: you can edit the video metadata (delete the video, record when it was uploaded, change the thumbnail, change the caption).
          3. Faster access for user history and the recommendation system, which will access it frequently.
          4. Horizontal scaling is easy.
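A schemaless metadata store keyed by video ID, with a per-user index, covers both the flexible edits listed above and the read flow that returns a user's titles and thumbnails. This is a minimal in-memory sketch standing in for MongoDB or Redis; `VideoMetadataStore` and its methods are hypothetical, and horizontal scaling (for example, sharding documents by video ID) is not shown.

```python
from __future__ import annotations

import time


class VideoMetadataStore:
    """In-memory stand-in for a key-value/document store such as MongoDB or Redis.
    Each video is a schemaless dict keyed by video_id, plus a per-user index."""

    def __init__(self) -> None:
        self._docs: dict[str, dict] = {}
        self._by_user: dict[str, list[str]] = {}

    def put(self, video_id: str, user_id: str, title: str, thumbnail_url: str, **extra) -> None:
        self._docs[video_id] = {"user_id": user_id, "title": title,
                                "thumbnail_url": thumbnail_url,
                                "uploaded_at": time.time(), **extra}
        self._by_user.setdefault(user_id, []).append(video_id)

    def update(self, video_id: str, **fields) -> None:
        # Flexible edits (caption, thumbnail, ...) need no schema change.
        self._docs[video_id].update(fields)

    def delete(self, video_id: str) -> None:
        doc = self._docs.pop(video_id)
        self._by_user[doc["user_id"]].remove(video_id)

    def list_for_user(self, user_id: str) -> list[tuple[str, str]]:
        """Titles and thumbnails only, as returned on the first request."""
        return [(self._docs[v]["title"], self._docs[v]["thumbnail_url"])
                for v in self._by_user.get(user_id, [])]


if __name__ == "__main__":
    store = VideoMetadataStore()
    store.put("v1", "akash", "Dance", "https://cdn.example.com/v1.jpg", caption="first")
    store.put("v2", "akash", "Cooking", "https://cdn.example.com/v2.jpg")
    store.update("v1", thumbnail_url="https://cdn.example.com/v1-new.jpg", caption="edited")
    store.delete("v2")
    print(store.list_for_user("akash"))
```

Optional fields such as `caption` live only on the documents that have them, which is the flexibility argument for a key-value store over a fixed relational schema.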