
Data Science Garage's video: Data Versioning Control with Real ML Project Hands-On Lesson 1

Data Versioning Control with Real ML Project | Hands-On Lesson #1
This hands-on tutorial demonstrates how to use Data Version Control (DVC) commands in a real Machine Learning (ML) project. DVC is an important part of MLOps (Machine Learning Operations). You will learn how to initialize a new DVC session, how to prepare your data for tracking with DVC, how to read .dvc files and understand what information is stored in them, and more. Once you have watched this tutorial, you will also know how to pull data from remote storage into your project. Overall, this video shows best practices for working with DVC workflows, for beginners and advanced users alike (data scientists, data analysts, MLOps engineers).

DOWNLOAD THE FILES TO START THE TUTORIAL:
- You can follow all of the explained steps yourself by cloning this GitHub repository to your local machine: https://github.com/vb100/dvc_project
- Training and validation data: https://s3.amazonaws.com/fast-ai-imageclas/imagenette2-160.tgz

To complete this lesson, you will create a new branch in your GitHub repository where all data versioning actions will be made. Keep in mind that when combining Git and DVC, small files go to Git and large files go to DVC. Each system has its own components, such as the Git staging area, the DVC cache, the DVC remote, and more. The remote storage can be on the same computer you are working on (the tutorial use case), or it can be in the cloud:
- AWS S3 Bucket.
- Google Cloud Bucket.
- Azure Blob storage, etc.

The content of the tutorial:
0:00 - Intro
1:17 - P1. Set up your Python environment
3:50 - P2. Hands-on with the basic DVC workflow
6:50 - Tracking data files with DVC
9:27 - Uploading files to remote storage and pushing to DVC
11:59 - Real-life situation: retrieve data from remote

Important moments:
4:44 - Create a remote storage folder (dvc_remote) and connect it to the DVC system for the data science project.
6:00 - Check the config file in the .dvc folder.
7:37 - What are .dvc files? (Explanation).
8:00 - What is the MD5 hash in DVC? (Explanation).
9:08 - Git Control vs. DVC Control (Schemes).
10:55 - Check the remote storage folder.
11:36 - Check the .dvc folder and config file in the GitHub repository.
12:20 - Use the dvc checkout command to pull data from remote storage.

Official DVC documentation: https://dvc.org/

Thank you for watching! Subscribe to the channel for more content like this in the future! See you there!
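
For readers who want a feel for what the MD5 value recorded in a .dvc file represents (the topics at 7:37 and 8:00), here is a minimal Python sketch. It is not DVC's own code and is not taken from the video: the helper names file_md5 and cache_relpath are hypothetical. The split into a two-character folder plus the remaining characters assumes the cache layout used by DVC 2.x, and other versions may organize the cache differently. DVC's own hashing may also treat some files differently, so the digest computed here is an illustration rather than a guaranteed match.

```python
import hashlib
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        # Read fixed-size chunks so large data files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def cache_relpath(md5_hex: str) -> str:
    """Split a digest into '<first two chars>/<remaining chars>'.

    Assumption: this mirrors a DVC 2.x-style cache layout; it is only
    an illustration of content-addressed storage.
    """
    return f"{md5_hex[:2]}/{md5_hex[2:]}"
```

Calling file_md5(Path("some_file")) on a tracked file and passing the result to cache_relpath gives a relative path of the kind you can look for when inspecting the cache or the remote storage folder. The point is that the stored name is derived from the file's content, not from its original file name.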

Data Science Garage: 17.3K subscribers, 207 total posts, 217.5K total views, 2.2K average views per video.
This video was published on 2022-08-01 03:53:00 GMT by @Data-Science-Garage on YouTube. Data Science Garage has 17.3K subscribers on YouTube and a total of 207 videos. This video has received 23 likes, which is lower than the average number of likes that Data Science Garage gets. @Data-Science-Garage receives an average of 2.2K views per video on YouTube. This video has received 3 comments, which is lower than the average number of comments that Data Science Garage gets. Overall, the views for this video were lower than the average for the profile. The hashtags #mlops, #datascience, and #github have been used frequently in this post.
