\<< [[§ DevOps for BI]]

---

# TL;DR

Remote git repositories can be very beneficial for BI projects, tracking changes that are otherwise untraceable in binary files such as .twbx or .pbix. This becomes an issue, however, when those files grow large, for example when using extracts or imports. A single-branch git repository can help by tracking the metadata of the file in git while keeping the data portion synced through either online drives such as OneDrive or network shared drives.

---

# Description

Using a version control system is usually beneficial for BI reports, even though binary formats like .pbix or .twbx don't produce readable diffs. When working with local data sources, however (imports or extracts), those files contain not only report information but also data, which should not be included in your version control repository.

For Tableau, one alternative is to work with published data sources and .twb files, excluding .hyper extracts from version control via .gitignore. For Power BI, something similar can be done with the new PBIP format, which separates the data (.abf) from the other properties of the semantic model, usually saved as variations of .json files. (A sample .gitignore is sketched after Alternative 2.)

Excluding data from version control and pushing changes as usual to a remote repository is part of the solution, but what about large datasets? Every refresh can potentially take hours, and if every developer accessing the model needs to run the refresh on their local machine to reflect changes, the wasted time multiplies. Just using the server and keeping version control out of it could be a solution, but then you lose track of changes and still need to download and upload the model every time (making changes directly on the server can overload it for other users).

Here are some alternatives for dealing with this situation.

# Alternative 1 - OneDrive Sync with SharePoint

This is the standard solution from Microsoft. SharePoint sites can have document libraries containing .pbix files, and those can be synced to OneDrive folders on your computer. Syncing may still take a while depending on the size of your .pbix files, but at least it avoids downloading models from the server all the time.

One bonus perk of this solution is the ability to check in and check out files. If you work in a team where more than one developer can touch the same reports, locking files helps avoid version conflicts.

One drawback is that it doesn't allow for proper CI/CD and lacks feature branches and release versions like a standard git workflow.

# Alternative 2 - Shared Drive with Single Branch Repo and LFS

When working with shared drives, initializing git directly can lead to issues when not using bare repos (multiple people can switch branches in the shared folder at any given time). Apart from that, using an online repo and cloning it to a local drive on your computer creates redundancy and defeats the purpose of the shared folder: changes to a semantic model would still need to be refreshed locally. If a data refresh takes a long time, that work is repeated for each person who clones the repo instead of happening once in a shared folder.

An alternative is a minimal architecture for your online repo: a single-branch repository. This avoids the multiple refreshes, since each semantic model is kept on the shared drive. The only difference is that changes can be committed to a branch and reverted when necessary, giving more transparency into how the model changes over time. The sketches below illustrate the idea.
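First, a minimal .gitignore sketch that keeps the data files out of git, as described above. The patterns are assumptions based on the extensions mentioned in this note, so adjust them to your actual project layout; the `**/.pbi/` entries follow the PBIP folder convention.

```bash
# Sketch: keep report/model metadata in git, keep the data files out.
# Patterns are assumptions; adjust to your actual project layout.
cat > .gitignore <<'EOF'
# Tableau: version .twb workbooks, ignore extracts and packaged workbooks
*.hyper
*.twbx

# Power BI (PBIP format): ignore imported data and local caches
*.abf
**/.pbi/cache.abf
**/.pbi/localSettings.json
EOF
```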
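Then a minimal sketch of initializing the single-branch repository directly on the shared drive. The folder path and remote URL are hypothetical, and Git LFS is shown only as an option for binaries you still want versioned (the heading mentions LFS, but data-heavy files should stay ignored instead):

```bash
# Sketch: single-branch repo living directly in the shared folder
# (folder path and remote URL are hypothetical).
cd /mnt/bi_share/sales_model       # shared folder containing the PBIP project

git init -b main                   # a single branch; nobody ever switches branches here
git lfs install                    # optional: version remaining binaries through LFS
git lfs track "*.pbix"             # e.g. thin .pbix files kept for convenience

git add .gitattributes .gitignore
git add .
git commit -m "Track semantic model metadata"

git remote add origin https://git.example.com/bi/sales_model.git
git push -u origin main
```

Since the working tree on the share is never switched to another branch, everyone always sees the same copy of the model, while `git log` and `git revert` still provide history and rollback.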
# Alternative 3 - Shared Drive with Bare Repos

Bare repos are repositories that contain only the version control information. They can be set up in a shared folder, removing the need for an online repository, but they share an issue with online repositories: datasets need to be refreshed locally by each person who clones the repo.

This approach could provide a gitflow experience for a team of developers that keeps main, dev, and feature branches, but it does little to reduce the time spent on refreshes for large datasets. (A setup sketch is included at the end of this note.)

---

# References

- Demo for OneDrive & SharePoint sync - https://www.youtube.com/watch?v=QsWab_Kr9yI
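For completeness, a minimal sketch of the bare repo setup from Alternative 3 (share and local paths are hypothetical):

```bash
# Sketch: bare repo on the shared drive acting as the team's origin
# (share and local paths are hypothetical).
git init --bare /mnt/bi_share/repos/sales_model.git

# Each developer clones locally and works with normal gitflow branches;
# every clone still has to refresh large datasets on its own machine.
git clone /mnt/bi_share/repos/sales_model.git ~/work/sales_model
cd ~/work/sales_model
git switch -c feature/new-measure
```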