From the course: Cloud-Based AI Solution Design Patterns

Distributed AI-model training

- Developing larger and more complex models often requires a massive amount of training data and a massive amount of compute power to process that data. If we're limited to carrying out that type of model training effort on a single server or in an environment with limited infrastructure, then it can take a very long time, not to mention that we could be introducing various risks associated with overloading that infrastructure and compromising its stability during the training process. The distributed AI model training pattern extends the standard model training process by utilizing a distributed training framework capable of sharing the training workload across multiple cloud-based servers or GPUs. There are two common approaches that can be taken: model parallelism and data parallelism. The model parallelism approach splits the model into model parts. Each model part is a subset of the whole model. Each model part is then deployed on a different server or processor where…
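As a rough illustration of the model parallelism idea described above, the following minimal sketch splits a toy "model" (an ordered list of layer functions) into contiguous model parts and runs an input through the parts in sequence. It is not tied to any real distributed training framework: the names `split_model`, `run_part`, and `forward` are hypothetical, the layers are plain Python functions, and the comment marking where each part would live on its own server or GPU is an assumption about how such a setup is commonly arranged.

```python
from typing import Callable, List

# A "layer" here is just a function from a number to a number (toy assumption).
Layer = Callable[[float], float]


def split_model(layers: List[Layer], num_parts: int) -> List[List[Layer]]:
    """Split an ordered list of layers into num_parts contiguous model parts."""
    if not 1 <= num_parts <= len(layers):
        raise ValueError("num_parts must be between 1 and the number of layers")
    base, extra = divmod(len(layers), num_parts)
    parts: List[List[Layer]] = []
    start = 0
    for i in range(num_parts):
        # The first `extra` parts take one additional layer each.
        size = base + (1 if i < extra else 0)
        parts.append(layers[start:start + size])
        start += size
    return parts


def run_part(part: List[Layer], x: float) -> float:
    """Run one model part; in a real setup this would execute on its own device."""
    for layer in part:
        x = layer(x)
    return x


def forward(parts: List[List[Layer]], x: float) -> float:
    """Pass the output of each part to the next, as if hopping between servers."""
    for part in parts:
        x = run_part(part, x)
    return x


# Toy four-layer model split into two parts of two layers each.
layers: List[Layer] = [
    lambda x: x + 1,
    lambda x: x * 2,
    lambda x: x - 3,
    lambda x: x * x,
]
parts = split_model(layers, 2)
result = forward(parts, 1.0)
```

Each part holds only a subset of the layers, so no single worker needs the whole model, while the combined result matches what the unsplit model would produce.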
