Commit Graph

70 Commits

Author SHA1 Message Date
Justine Tunney
1a5ee11377 Restore old -std= flags
Getting rid of them fixed GA Ubuntu, but broke GA MacOS. Let's try a
different strategy.
2023-03-28 10:36:25 -07:00
Justine Tunney
1631298475 Remove -std=foo compiler flags
These flags are only really useful for linting. They put GCC and other
compilers into `__STRICT_ANSI__` mode. That can make systems stuff
slower, in favor of standards conformance, since it may cause headers to
remove platform specific goodness. It also makes builds more painful on
old distros that have the functions we need, but track an older version
of the standards where those functions weren't strictly available. One
such example is mkstemp(). It's available everywhere in practice, but GA
Ubuntu in strict ansi mode complains about it. If we don't use mkstemp()
then that'll put us on the security radar with other platforms.
2023-03-28 10:23:34 -07:00
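The visibility problem described in this message can be illustrated with a feature test macro. The following is a minimal sketch, not code from this repository: it assumes a POSIX system, and the helper name `make_temp_file` and the `/tmp/example` prefix are hypothetical. Defining `_POSIX_C_SOURCE` before any header is included asks the C library to keep declaring `mkstemp()` even when the compiler is put into strict mode with a flag such as `-std=c11`.

```c
/* Request POSIX.1-2008 declarations before any header is included, so that
 * mkstemp() stays declared even under a strict -std= mode. */
#define _POSIX_C_SOURCE 200809L

#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: creates a unique temporary file under /tmp, copies its
 * path into `path_out`, and returns the open file descriptor, or -1 on error. */
static int make_temp_file(char *path_out, size_t path_cap) {
    char tmpl[] = "/tmp/exampleXXXXXX";
    if (path_cap < sizeof(tmpl)) return -1;
    int fd = mkstemp(tmpl);  /* rewrites XXXXXX in place and opens the file */
    if (fd < 0) return -1;
    memcpy(path_out, tmpl, sizeof(tmpl));
    return fd;
}
```

The caller is responsible for calling `close()` on the descriptor and `unlink()` on the path once the file is no longer needed.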
Justine Tunney
cbddf4661b Get mmap() working with WIN32 MSVC
- We have pretty high quality POSIX polyfills now
- We no longer need to override malloc()

Tracked by issue #91
Improves upon #341
2023-03-28 10:10:02 -07:00
oKatanaaa
e4881686b4 Make WIN32 mmap() improvements (#341)
Still not fully working yet.

Closes #341
2023-03-28 09:19:03 -07:00
Justine Tunney
0b5448a3a4 Implement system polyfill for win32 / posix.1
I don't have access to Microsoft Visual Studio right now (aside from the
GitHub Actions CI system) but I think this code should come close to
what we want in terms of polyfilling UNIX functionality.
2023-03-17 21:22:40 -07:00
Justine Tunney
5b8023d935 Implement prototype for instant mmap() loading
This change uses a custom malloc() implementation to transactionally
capture to a file dynamic memory created during the loading process.
That includes (1) the malloc() allocation for mem_buffer and (2) all
the C++ STL objects. On my $1000 personal computer, this change lets
me run ./main to generate a single token (-n 1) using the float16 7B
model (~12gb size) in one second. In order to do that, there's a one
time cost where a 13gb file needs to be generated. This change rocks
but it shouldn't be necessary to do something this heroic. We should
instead change the file format, so that tensors don't need reshaping
and realignment in order to be loaded.
2023-03-16 22:16:33 -07:00
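The read side of this idea can be sketched as follows. This is only an illustration under stated assumptions, not the prototype itself: it shows how a previously generated file might be mapped read-only with POSIX `mmap()` so that its pages are loaded on demand, and it leaves out the custom `malloc()` that captures the heap. The helper names `map_file_readonly` and `unmap_file` are hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: maps the whole file at `path` read-only and stores its
 * size in `*size_out`. Returns NULL on failure. */
static const void *map_file_readonly(const char *path, size_t *size_out) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);  /* the mapping stays valid after the descriptor is closed */
    if (addr == MAP_FAILED) return NULL;

    *size_out = (size_t)st.st_size;
    return addr;
}

/* Hypothetical helper: releases a mapping made by map_file_readonly(). */
static void unmap_file(const void *addr, size_t size) {
    munmap((void *)addr, size);
}
```

Because the kernel pages data in lazily, the call returns quickly regardless of file size; the cost is paid as the mapped memory is first touched.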
Justine Tunney
2788f373be Get the build working 2023-03-15 03:14:20 -07:00
Ronsor
47857e564c Don't use vdotq_s32 if it's not available (#139)
* Don't use vdotq_s32 if it's not available

`dotprod` extensions aren't available on some ARM CPUs (e.g. Raspberry Pi 4), so check for them and only use them if they're available.

Reintroduces the code removed in 84d9015 if `__ARM_FEATURE_DOTPROD` isn't defined.

* Update ggml.c

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-14 21:34:37 +02:00
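The compile-time check described above might look like the sketch below. It is not the code from `ggml.c`: the function `dot_i8` is a hypothetical helper that computes a plain int8 dot product, using `vdotq_s32` only when the compiler defines `__ARM_FEATURE_DOTPROD` and falling back to a scalar loop otherwise (for example on a Raspberry Pi 4 or on x86).

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) && defined(__ARM_FEATURE_DOTPROD)
#include <arm_neon.h>
#endif

/* Hypothetical helper: dot product of two int8 vectors of length n. */
static int32_t dot_i8(const int8_t *a, const int8_t *b, size_t n) {
    int32_t sum = 0;
    size_t i = 0;
#if defined(__ARM_NEON) && defined(__ARM_FEATURE_DOTPROD)
    /* Fast path: each vdotq_s32 accumulates 16 int8 products into 4 lanes. */
    int32x4_t acc = vdupq_n_s32(0);
    for (; i + 16 <= n; i += 16) {
        acc = vdotq_s32(acc, vld1q_s8(a + i), vld1q_s8(b + i));
    }
    sum = vgetq_lane_s32(acc, 0) + vgetq_lane_s32(acc, 1) +
          vgetq_lane_s32(acc, 2) + vgetq_lane_s32(acc, 3);
#endif
    /* Portable path: handles everything, or the tail left by the fast path. */
    for (; i < n; i++) {
        sum += (int32_t)a[i] * (int32_t)b[i];
    }
    return sum;
}
```

Note that this guard is resolved at compile time, so a binary built with dot-product support enabled can still fail with an illegal instruction on a CPU that lacks the extension.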
Radoslav Gerganov
60f819a2b1 Add section to README on how to run the project on Android (#130) 2023-03-14 15:30:08 +02:00
Georgi Gerganov
97ab2b2578 Add Misc section + update hot topics + minor fixes 2023-03-14 09:43:52 +02:00
Sebastián A
2f700a2738 Add windows to the CI (#98) 2023-03-13 22:29:10 +02:00
Georgi Gerganov
c09a9cfb06 CMake build in Release by default (#75) 2023-03-13 21:22:15 +02:00
Georgi Gerganov
7ec903d3c1 Update contribution section, hot topics, limitations, etc. 2023-03-13 19:21:51 +02:00
Georgi Gerganov
4497ad819c Print system information 2023-03-13 19:15:08 +02:00
Sebastián A
ed6849cc07 Initial support for CMake (#75) 2023-03-13 19:12:33 +02:00
Thomas Klausner
41be0a3b3d Add NetBSD support. (#90) 2023-03-13 18:40:54 +02:00
Pavol Rusnak
671d5cac15 Use fprintf for diagnostic output (#48)
keep printf only for printing model output

one can now use ./main ... 2>/dev/null to suppress any diagnostic output
2023-03-13 18:39:56 +02:00
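The split between the two streams can be sketched as below. The helper names and the `main:` prefix are hypothetical and chosen for illustration; in normal use `diag` would be `stderr` and `out` would be `stdout`, so redirecting standard error (for example with `2>/dev/null`) hides diagnostics while the model output still reaches the terminal or a pipe.

```c
#include <stdio.h>

/* Hypothetical helper: writes a diagnostic line; pass stderr in normal use.
 * Returns the number of characters written, as fprintf() does. */
static int print_diagnostic(FILE *diag, const char *msg) {
    return fprintf(diag, "%s: %s\n", "main", msg);
}

/* Hypothetical helper: writes generated text; pass stdout in normal use. */
static int print_model_output(FILE *out, const char *token) {
    return fprintf(out, "%s", token);
}
```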
Georgi Gerganov
84d9015c4a Use vdotq_s32 to improve performance (#67)
* 10% performance boost on ARM

* Back to original change
2023-03-13 18:36:44 +02:00
uint256_t
63fd76fbb0 Reduce model loading time (#43)
* Use buffering

* Use vector

* Minor

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-13 18:33:43 +02:00
Val Kharitonov
2a20f48efa Fix UTF-8 handling (including colors) (#79) 2023-03-13 18:24:18 +02:00
Pavol Rusnak
d1f224712d Add quantize script for batch quantization (#92)
* Add quantize script for batch quantization

* Indentation

* README for new quantize.sh

* Fix script name

* Fix file list on Mac OS

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-13 18:15:20 +02:00
Georgi Gerganov
1808ee0500 Add initial contribution guidelines 2023-03-13 09:42:26 +02:00
Matvey Soloviev
a169bb889c Gate signal support on being on a unixoid system. (#74) 2023-03-13 04:08:01 +01:00
Matvey Soloviev
460c482540 Fix token count accounting 2023-03-13 01:04:41 +01:00
Georgi Gerganov
c80e2a8f2a Revert "10% performance boost on ARM"
This reverts commit 113a9e83eb.

There are some reports of illegal instruction errors.
Moved this stuff to the vdotq_s32 branch until resolved.
2023-03-13 01:28:08 +02:00
Georgi Gerganov
54a0e66ea0 Check for vdotq_s32 availability 2023-03-13 01:21:03 +02:00
Georgi Gerganov
543c57e991 Amend to previous commit - forgot to update non-QRDMX branch 2023-03-13 01:05:24 +02:00
Georgi Gerganov
113a9e83eb 10% performance boost on ARM 2023-03-13 00:56:10 +02:00
Matvey Soloviev
404fac0d62 Fix color getting reset before prompt output done (#65)
(cherry picked from commit 7eb2987619feee04c40eff69b604017d09919cb6)
2023-03-13 00:07:34 +02:00
Georgi Gerganov
1a0a74300f Update README.md 2023-03-12 23:39:01 +02:00
Matvey Soloviev
96ea727f47 Add interactive mode (#61)
* Initial work on interactive mode.

* Improve interactive mode. Make rev. prompt optional.

* Update README to explain interactive mode.

* Fix OS X build
2023-03-12 23:13:28 +02:00
Marc Köhlbrugge
9661954835 Fix typo in README (#45) 2023-03-12 22:30:08 +02:00
Ben Garney
f385f8dee8 Allow using prompt files (#59) 2023-03-12 22:28:36 +02:00
beiller
02f0c6fe7f Add back top_k (#56)
* Add back top_k

* Update utils.cpp

* Update utils.h

---------

Co-authored-by: Bill Hamilton <bill.hamilton@shopify.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 22:23:15 +02:00
Sebastián A
eb062bb012 Windows fixes (#31)
* Apply fixes suggested to build on windows

Issue: https://github.com/ggerganov/llama.cpp/issues/22

* Remove unsupported VLAs

* MSVC: Remove features that are only available on MSVC C++20.

* Fix zero initialization of the other fields.

* Change the use of vector for stack allocations.
2023-03-12 22:15:00 +02:00
Georgi Gerganov
7027a97837 Update README.md 2023-03-12 22:09:26 +02:00
Georgi Gerganov
2d555e5b42 Add CI (#60) 2023-03-12 22:08:24 +02:00
Georgi Gerganov
7c9e54e55e Revert "weights_only" arg - this is causing more trouble than help 2023-03-12 20:59:01 +02:00
Oleksandr Nikitin
b9bd1d0141 python/pytorch compat notes (#44) 2023-03-12 14:16:33 +02:00
beiller
129c7d1ea8 Add repetition penalty (#20)
* Adding repeat penalization

* Update utils.h

* Update utils.cpp

* Numeric fix

Should probably still scale by temp even if penalized

* Update comments, more proper application

I see that numbers can go negative so a fix from a referenced commit

* Minor formatting

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
2023-03-12 11:27:42 +02:00
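One way to read the notes above is sketched below. It is an assumption-laden illustration rather than the code in `utils.cpp`: the function names, parameter names, and the linear scan over recent tokens are all hypothetical. Every logit is scaled by the temperature, and a recently seen token is additionally penalized. Since dividing a negative logit by a penalty greater than 1 would make the token more likely, negative logits are multiplied by the penalty instead.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if `id` appears among the last `n_last` tokens. */
static bool seen_recently(const int *last, size_t n_last, int id) {
    for (size_t i = 0; i < n_last; i++) {
        if (last[i] == id) return true;
    }
    return false;
}

/* Hypothetical helper: scales all logits by 1/temp and pushes down the logits
 * of recently seen tokens. `penalty` is expected to be greater than 1. */
static void apply_repeat_penalty(float *logits, size_t n_vocab,
                                 const int *last, size_t n_last,
                                 float penalty, float temp) {
    const float scale = 1.0f / temp;
    for (size_t id = 0; id < n_vocab; id++) {
        float v = logits[id] * scale;  /* temperature applies to every token */
        if (seen_recently(last, n_last, (int)id)) {
            /* Dividing a negative logit would raise it, so multiply instead. */
            v = (v < 0.0f) ? v * penalty : v / penalty;
        }
        logits[id] = v;
    }
}
```

With a penalty of 2 and a temperature of 1, a repeated token with logit 2 drops to 1 and a repeated token with logit -2 drops to -4, while tokens that have not appeared recently are left unchanged.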
Georgi Gerganov
702fddf5c5 Clarify meaning of hacking 2023-03-12 09:03:25 +02:00
Georgi Gerganov
7d86e25bf6 README: add "Supported platforms" + update hot topics 2023-03-12 08:41:54 +02:00
deepdiffuser
a93120236f use weights_only in conversion script (#32)
this restricts malicious weights from executing arbitrary code by restricting the unpickler to only loading tensors, primitive types, and dictionaries
2023-03-12 08:36:35 +02:00
Pavol Rusnak
6a9a67f0be Add LICENSE (#21) 2023-03-12 08:36:03 +02:00
Georgi Gerganov
da1a4ff01f Update README.md 2023-03-12 01:26:32 +02:00
Juraj Bednar
6b2cb6302f Fix a typo in model name (#16) 2023-03-11 19:32:20 +02:00
Georgi Gerganov
4235e3d5b3 Update README.md 2023-03-11 18:10:18 +02:00
Georgi Gerganov
f1eaff4721 Add AVX2 support for x86 architectures thanks to @Const-me ! 2023-03-11 18:04:25 +02:00
Georgi Gerganov
a9e58529ea Fix un-initialized FP16 tables on x86 (#15, #2) 2023-03-11 17:40:14 +02:00
Georgi Gerganov
7d9ed7b25f Bump memory buffer 2023-03-11 12:45:01 +02:00