public:: true

  • LlamaCpp is an open-source project whose goal is to make LLMs run on consumer machines
    • One way to do this is quantization of the models
      • The idea is to replace the data type of the model's weights with another type that uses less space.
      • Models are normally stored as 32-bit floats; we can quantize them to 16, 8, or even 4 bits (see the sketch after this list)
      • Of course there is some loss of precision, but the model becomes small enough to load into RAM: a 7B-parameter model takes about 28 GB at 32 bits but only about 3.5 GB at 4 bits
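    • A minimal sketch of the idea, assuming simple symmetric 8-bit quantization with one scale per tensor; llama.cpp's real formats (e.g. Q4_0, Q8_0) quantize in small blocks with per-block scales, and all names here are illustrative:
      ```cpp
      // Toy symmetric 8-bit quantization: store one float scale plus int8 weights,
      // so the tensor takes roughly 1/4 of its float32 size.
      #include <algorithm>
      #include <cmath>
      #include <cstdint>
      #include <cstdio>
      #include <vector>

      struct QuantizedTensor {
          float scale;                 // single scale for the whole tensor (real formats use per-block scales)
          std::vector<int8_t> values;  // 8-bit weights instead of 32-bit floats
      };

      QuantizedTensor quantize(const std::vector<float>& weights) {
          float max_abs = 0.0f;
          for (float w : weights) max_abs = std::max(max_abs, std::fabs(w));
          QuantizedTensor q;
          // Map [-max_abs, max_abs] onto the int8 range [-127, 127]
          q.scale = (max_abs > 0.0f) ? max_abs / 127.0f : 1.0f;
          q.values.reserve(weights.size());
          for (float w : weights)
              q.values.push_back(static_cast<int8_t>(std::lround(w / q.scale)));
          return q;
      }

      float dequantize(const QuantizedTensor& q, size_t i) {
          return q.values[i] * q.scale;  // approximate reconstruction; the gap is the precision loss
      }

      int main() {
          std::vector<float> weights = {0.12f, -0.53f, 0.91f, -0.08f};
          QuantizedTensor q = quantize(weights);
          for (size_t i = 0; i < weights.size(); ++i)
              std::printf("original % .4f -> restored % .4f\n", weights[i], dequantize(q, i));
          return 0;
      }
      ```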