Milestone 2 - Gap Buffers
Text Buffers
At the conclusion of Milestone 1, I could type text in a line and have it show up in the window, and could even backspace from the end of the string. Next I want to add a cursor that can move left and right with the ability to insert and delete text anywhere in the string.
Something that came up in my research of text editors was the concept of a gap buffer. A gap buffer is a segment of memory, representing a string, that is subdivided into three regions: the string that appears before cursor, a buffer of unused space, and the remainder of the string that comes after the cursor. Partitioning the memory this way allows a user to type, inserting characters at the cursor (within the gap region) without having to move memory on every keystroke.
Without any references, this is the gap buffer scheme I started with to give me a predictable, fixed-size buffer:
// constants
#define GAP_BUFFER_DATA_LENGTH MB(2)
// types
typedef struct {
i8 data[GAP_BUFFER_DATA_LENGTH];
u64 gap_start;
u64 gap_end;
} gap_buffer;
The obvious constraint here is that I am enforcing a limit of 2MB on each buffer, which might be a problem down the road. However, I'm taking a very iterative approach so I will handle dynamic/block allocations at a later time. I am doing this to try to avoid over-designing the system before I've even used it.
"Always write the usage code first." - Unknown wise man
Something I picked up along the way was the idea that the usage code should be written before the code itself. This was counterintuitive to me when I first heard it since many IDEs, and certainly the compiler, will complain if you are using code that doesn't exist yet. The red squiggles in Visual Studio certainly guide a developer to write a function first and use it second, but this can lead to bad design due to incorrect assumptions. If the usage code is written first, the API generally becomes much more clear. Instead of trying to imagine all of the different ways a function could be possibly be used in a system that doesn't exist, it makes much more sense to write the usages of the function in the system where they are needed, and then design the API based on what data is present and needed at the time of usage. This generally makes API design simple and concise.
Here are the functions I added to get the gap buffer working:
// functions
static void
gb_reset(gap_buffer* gb) {
gb->gap_start = 0;
gb->gap_end = GAP_BUFFER_DATA_LENGTH;
}
static u64
gb_text_length(gap_buffer* gb) {
return GAP_BUFFER_DATA_LENGTH - (gb->gap_end - gb->gap_start);
}
static void
gb_move_gap_to_cursor(gap_buffer* gb, u64 cursor) {
// move gap left
if (cursor < gb->gap_start) {
u64 len = gb->gap_start - cursor;
mem_move(gb->data + cursor, gb->data + gb->gap_end - len, len);
gb->gap_end -= len;
gb->gap_start -= len;
}
// move gap right
else if (cursor > gb->gap_start) {
u64 len = cursor - gb->gap_start;
mem_move(gb->data + gb->gap_end, gb->data + gb->gap_start, len);
gb->gap_start += len;
gb->gap_end += len;
}
}
static void
gb_insert(gap_buffer* gb, u64 cursor, char c) {
gb_move_gap_to_cursor(gb, cursor);
if (gb->gap_start == gb->gap_end) {
return;
}
gb->data[gb->gap_start++] = c;
}
static void
gb_backspace(gap_buffer* gb, u64 cursor) {
if (cursor == 0) {
return;
}
gb_move_gap_to_cursor(gb, cursor);
gb->gap_start--;
}
static void
gb_delete(gap_buffer* gb, u64 cursor) {
u64 length = gb_text_length(gb);
if (length <= 0 || cursor >= length) {
return;
}
gb_move_gap_to_cursor(gb, cursor);
gb->gap_end++;
}
Most of those functions are pretty straight forward, leveraging gb_move_gap_to_cursor to manage the gap as a user types. It takes an offset that I am calling "cursor" as the point of future insertion. That means I will copy all of the memory within a span to the left or right of the buffer, based on the the cursor's position relative to the existing gap. The cursor does not know about the gap, so it's an offset of characters within the string.
To connect the gap buffer to keyboard events, I simply call the functions within my WndProc and replace the old instances of insert_character:
case WM_KEYDOWN: {
data = (ded_data*)GetWindowLongPtr(hwnd, GWLP_USERDATA);
switch (wParam) {
case VK_LEFT: {
if (data->cursor > 0) {
data->cursor--;
}
}
break;
case VK_RIGHT: {
if (data->cursor < gb_text_length(&data->gb)) {
data->cursor++;
}
}
break;
case VK_DELETE: {
gb_delete(&data->gb, data->cursor);
}
break;
case VK_BACK: {
gb_backspace(&data->gb, data->cursor);
if (data->cursor > 0) {
data->cursor--;
}
}
break;
case VK_ESCAPE: {
PostMessage(hwnd, WM_CLOSE, 0, 0);
}
break;
}
InvalidateRect(hwnd, NULL, TRUE);
}
return 0;
case WM_CHAR: {
data = (ded_data*)GetWindowLongPtr(hwnd, GWLP_USERDATA);
switch (wParam) {
case VK_TAB:{
gb_insert(&data->gb, data->cursor, '\t');
data->cursor++;
}
break;
default: {
if (wParam >= 32 && wParam <= 126) {
gb_insert(&data->gb, data->cursor, (char)wParam);
data->cursor++;
}
}
}
InvalidateRect(hwnd, NULL, TRUE);
}
break;
Printing
There is, of course, a lot of room for optimizations here, but I don't think most of this code is going to end up staying around very long. This became clear as I started writing the print function for the gap buffer. Here is what I started with:
static i32
print_token(HDC hdc, char* s, u64 len, i32 x, i32 y) {
// TODO: determine token type and select colors
SetBkColor(hdc, COLOR_BACK);
SetTextColor(hdc, COLOR_TEXT);
SIZE size;
if (*s == '\t') {
TextOutA(hdc, x, y, " ", 4);
GetTextExtentPoint32A(hdc, " ", 4, &size);
}
else {
TextOutA(hdc, x, y, s, len);
GetTextExtentPoint32A(hdc, s, len, &size);
}
return size.cx;
}
static void
gb_draw(HDC hdc, gap_buffer* gb, u64 cursor) {
i32 x = 0;
i32 y = 0;
char token[255];
u64 token_length = 0;
for (u64 i = 0; i < GAP_BUFFER_DATA_LENGTH; i++) {
if (i == gb->gap_start) {
i = gb->gap_end - 1;
continue;
}
switch (gb->data[i]) {
case ' ' :
case '\t': {
if (token_length > 0) {
x += print_token(hdc, token, token_length, x, y);
}
token_length = 0;
x += print_token(hdc, gb->data + i, 1, x, y);
}
break;
// TODO: handle /r/n
// TODO: increment Y
case '\r':
case '\n': {
if (token_length > 0) {
x += print_token(hdc, token, token_length, x, y);
}
token_length = 0;
}
break;
default: {
token[token_length++] = gb->data[i];
}
}
}
if (token_length > 0) {
print_token(hdc, token, token_length, x, y);
}
}
At first, I was just printing the entire buffer (minus the gap), but I wanted to make sure I have the ability to color different tokens as most editors do. To do that I tried to divide the string up as I went where a token was defined based on surrounding whitespace. When I added tabs I noticed that they would not print as expected. It seems TextOutA does not support tabbed printing, so I needed to add that as a special case which you can see in print_token.
Once tabs were handled, I thought line endings should be next and that is where I ran into some trouble. Different systems emit different line ending characters. For example, most modern Linux and Apple devices use \n to signal end-of-line, Apple before OSX uses \r alone, and Windows has seemingly always used \r\n. The Windows use case is what throws a wrench in my code, since it's a 2-character signal and I am only walking 1-character at a time. I could add some special code here to look forwards or backwards an additional character in certain cases, but what I came to realize is that what I am really after is to capture an 'end of line' token.
Plan Update
I can see that I am very slowly starting to build a lexer, or at least I am finding the use-case for one. This seems like a great time to update and revise my plan:
- Win32 Window
- Message Loop
-
Back Buffer + Blit - Simple screen painting
- Font + Text Rendering
-
Evaluate stb_truetype integration - Render hardcoded text to screen
-
- Typing, Cursor and Gap Buffer
- Translate keystrokes to text
- Insert and delete text
-
Blinking cursor
- Lexer
- Handle new lines
- Support syntax highlighting
- Code Navigation
- Blinking cursor
- Show line numbers
- Implement ability to search for, and jump to, text
- Hotkeys
- Copy
- Cut
- Paste
- Support common code-related commands
- File I/O
- Open files on disc
- Save files to disc
- Multi-pane
- Support having two files open at once
- Support having the same file open twice with independent cursors
- External Process
- Support running a BAT from a hotkey to compile code
- Support debugger integration (jump to line)
- Color Themes
- Support custom colors
