
AI for M Patterns

We thank Alex Woodhead for his first blog post, and hope there are many more to follow. If you would like to post on the YottaDB blog please contact us at info@yottadb.com.

TL;DR

Generating patterns for string pattern matching in the M language is expert friendly. An AI tool eases the task. There is a live demonstration where you can try it for yourself.

Using AI to Generate M Patterns

Programs need to determine whether a string matches the pattern for a specific type of data. For example:

Data Type Example
Email info@genput.com
Phone Number 213-101-0101
Date 06-07-2025

 

Pattern matching allows M (affectionately known as MUMPS) language code to determine whether a string matches a pattern, for example, whether a line of input contains a telephone number, an e-mail address, a date, etc. For example, 1"("3N1")"2(1"-"3N)1N matches a US telephone number in one common format, e.g., (123)-456-7890.
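As a rough illustration (not part of the M toolchain itself), the semantics of the M pattern above can be approximated with a regular expression in Python; the regex and function names below are assumptions for illustration only:

```python
import re

# The M pattern 1"("3N1")"2(1"-"3N)1N reads: one "(", three digits (3N),
# one ")", two repetitions of (one "-" followed by three digits), and one
# final digit. An approximately equivalent regular expression:
US_PHONE = re.compile(r'^\(\d{3}\)(?:-\d{3}){2}\d$')

def matches_us_phone(s: str) -> bool:
    """Return True if s matches the (###)-###-#### format."""
    return US_PHONE.fullmatch(s) is not None

print(matches_us_phone("(123)-456-7890"))  # → True
print(matches_us_phone("123-456-7890"))    # → False
```

Note that M's pattern codes (N for numeric, repetition counts as leading integers) map fairly directly onto regex character classes and quantifiers, which is why a translation like this is possible at all.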

As the syntax of M patterns can be arcane, this blog post describes an AI generative pretrained transformer (GPT) for generating the code for M patterns from natural language input as well as sample datasets.

Patterns can be generated from natural language descriptions or from examples.

Patterns from Descriptions

A natural language description of a telephone number in the format (###)-###-#### where # is a digit could be:

module A
  one of character minus
  followed by three of numeric characters
the main pattern is as follows:
  one of character open-parenthesis
  followed by three of numeric characters
  followed by one of character close parenthesis
  followed by two of module A
  followed by one of numeric character

Natural language descriptions aim to be readable and understandable without technical training. Updating a textual description is more user friendly and less error prone than manually editing pattern match code. The technical demonstration uses generative AI to transform these text descriptions into M source code, making existing pattern match code accessible for maintenance and support.

Patterns from Samples

Consider the following sample records:

Item Record
 1 (003)-615-2614
 2 (519)-523-0258
 3 (266)-885-4964
 4 (261)-274-3909
 5 (752)-129-3876
 6 (173)-514-2497
 7 (040)-511-8991
 8 (467)-715-3325
 9 (488)-269-8099
 10 (025)-705-8417

 

An author of pattern match code might follow this development process:

  1. Review the sample records for format features:
    • Numeric sequence length
    • Opening and closing brackets
    • Hyphen delimiter
  2. Decide which format rule elements to use.
    • A rule can be defined in different ways.
    • Which approach is elegant, optimal, and maintainable?
  3. Implement pattern match code to encapsulate the format.
  4. Test the code against the sample records.
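Step 4 can be sketched in a few lines of Python, using a regular expression as a stand-in for the M pattern (the names and samples here are illustrative, not from the demonstration itself):

```python
import re

# Candidate format rule from steps 1-3, written as a regex stand-in for
# the M pattern 1"("3N1")"2(1"-"3N)1N (illustrative only).
candidate = re.compile(r'\(\d{3}\)-\d{3}-\d{4}')

samples = ["(003)-615-2614", "(519)-523-0258", "(266)-885-4964"]

# Step 4: test the candidate pattern against every sample record.
failures = [s for s in samples if candidate.fullmatch(s) is None]
print(f"{len(samples) - len(failures)}/{len(samples)} samples matched")
```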

What if AI could replace these steps and create pattern match code automatically from the same data samples? The technical demonstration replaces all manual steps 1 through 4 above with automatic code creation.

Finally this technical demonstration closes the development iteration loop for pattern match code by offering:

  • Generating natural language descriptions from existing pattern code.
  • Generating compliant samples and anti-samples from existing pattern code.

Building Models

Early model training cycles showed that the features needed for the description-to-pattern task were incompatible with those for the samples-to-pattern task. Development therefore proceeded with two separate models, one for each task.

Description-to-Pattern Model

A synthetic dataset was created consisting of sets of pattern match code followed by the respective descriptions. An extensive, balanced, and randomly varied representation of pattern features was needed to learn:

  • Repetition range, e.g., “One-To-Three” numeric characters
  • Structure
  • Optionality
  • Alphanumeric, Numeric, Punctuation
  • Literal strings

For the description text, multiple variations are introduced to anticipate flexible prompt input, for example:

  • Quantity alias: “two” can be represented in similar words like “Double”, “Twice” and “Pair” or the value “2”
  • Character alias: “-” and the word “minus” can be used interchangeably.
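One simple way to handle such aliases during synthetic dataset generation is a normalization table; the tables below are hypothetical examples (the actual training pipeline's vocabulary is not published):

```python
# Hypothetical alias tables; entries are illustrative only.
QUANTITY_ALIASES = {"one": 1, "1": 1, "two": 2, "2": 2, "double": 2,
                    "twice": 2, "pair": 2, "three": 3, "3": 3}
CHARACTER_ALIASES = {"minus": "-", "hyphen": "-", "-": "-",
                     "open-parenthesis": "(", "close-parenthesis": ")"}

def normalize(token: str):
    """Map a description token to a canonical quantity or character."""
    t = token.lower()
    if t in QUANTITY_ALIASES:
        return QUANTITY_ALIASES[t]
    return CHARACTER_ALIASES.get(t, token)

print(normalize("Pair"))   # → 2
print(normalize("minus"))  # → -
```

In the actual pipeline the mapping runs in the other direction as well: canonical values are expanded into varied aliases so the model sees many phrasings of the same rule.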

By varying the descriptions across several written natural languages, a single assimilated model can process prompts in multiple languages.

Samples-to-Pattern Model

Another synthetic dataset was created consisting of sets of sample values followed by their respective pattern code. As before, an extensive, balanced, and randomly varied representation of pattern features was needed to learn:

  • Repetition range: “one-To-three” numeric characters
  • Structure
  • Optionality
  • Alphanumeric, Numeric, Punctuation
  • Literal strings

Additionally the dataset is balanced for:

    1. Generalization
      • Prefer generating exact patterns for small data samples
      • Prefer generating flexible patterns for samples with wide variation
    2. Delimiters, for example:
      • Character "-" in text "(003)-615-2614"
      • Character "^" in text "Smith^Bob^M^19340815"
    3. “Open-Close” pairs features for example:
      • "(" and ")" in text "(003)-615-2614"
      • "<" and ">"

To clarify how the term “generalization” is used here: model training has the concepts of overfitting and underfitting. A model that is overtrained on a sample dataset performs poorly on new samples outside the original training dataset. A similar scenario applies to the completed pattern match model: from a small number of sample records, it needs to suggest the pattern match most likely to be useful. Some patterns can represent tens of thousands of possible samples; others are so specific that all of their possible samples fit in the prompt supplied to the AI. The training data is deliberately curated to provide a wide and graduated range from exact to more generalized patterns. This appears to imbue the model with a sense of “pattern usefulness”.

To postulate how the model may achieve this, consider the next possible generated character when outputting software code. This has a constrained range of possibilities.

  • Where the next token in a sample sequence is strongly associated with only a small number of possible tokens, this deep feature favors generating patterns that admit only a small number of possible samples.
  • Where the next token is associated with a wider variety of possible tokens, the same feature cascades toward generating more general patterns that admit a large number of possible samples.

The term generalization covers this spectrum of behavior. The process does not use human-style chain-of-thought reasoning to achieve its output, but a more fundamental set of self-taught features impressed during training.

Samples are required to fit within the context window of the GPT. Randomized variation in the number of sample records within each training item improves deduction capability at low sample counts. Empirical benchmarking is used to evaluate performance on generalization-style versus delimiter-style patterns.

Within the pipeline, two forms of each pattern were tracked, one being a simplified form. This endows the generative behavior with a preference for shorter patterns; e.g., the rule “two numeric followed by two to four numeric” can be expressed more simply as “four to six numeric”.
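The simplification in that example amounts to summing adjacent repetition ranges of the same character class. A minimal sketch (this helper is an assumption for illustration, not the pipeline's actual implementation):

```python
def merge_ranges(ranges):
    """Collapse adjacent repetition ranges of the same character class:
    the minimum counts add, and the maximum counts add.
    e.g. 2N followed by 2.4N becomes 4.6N."""
    lo = sum(r[0] for r in ranges)
    hi = sum(r[1] for r in ranges)
    return (lo, hi)

# "two numeric followed by two to four numeric" -> "four to six numeric"
print(merge_ranges([(2, 2), (2, 4)]))  # → (4, 6)
```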

Training Effort

Environment: NVIDIA A10 GPU (CUDA) on the Hugging Face platform.

Description-to-Pattern Model

A full retrain is needed when incorporating each new language.

Stage Continuous GPU training
 New dataset  4 days

Samples-to-Pattern Model

Stage Continuous GPU training
 Prototype base dataset  4 days
 Main dataset  13 days
 Second refined dataset  2 days
 Third refined dataset  4 days

 

As the models are separated by task, it becomes convenient to add new language support to descriptions with a relatively quick turnaround.

Benchmarking

Complete success of a pattern match was defined as its ability to satisfy all of its respective candidate sample records, not just those that fit within a context window.

Overview Benchmark Report

Total benchmark tests used 3895
Mean success across all matches 91.75%
Complete pattern match success 81.98%
Partial pattern match success 15.74%
Unsuccessful match records 2.28%
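The three outcome categories above can be derived from each generated pattern's match fraction (matched rows divided by sample size). A sketch with made-up inputs, since the raw benchmark data is not published:

```python
def summarize(match_fractions):
    """Classify each generated pattern by the fraction of its sample
    records it matched: 1.0 is complete success, 0.0 is unsuccessful,
    anything in between is partial success."""
    n = len(match_fractions)
    complete = sum(1 for f in match_fractions if f == 1.0) / n
    unsuccessful = sum(1 for f in match_fractions if f == 0.0) / n
    partial = 1.0 - complete - unsuccessful
    mean = sum(match_fractions) / n
    return {"mean": mean, "complete": complete,
            "partial": partial, "unsuccessful": unsuccessful}

# Illustrative inputs only; not the actual 3895 benchmark results.
print(summarize([1.0, 1.0, 0.548, 0.903, 0.0]))
```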

 

The following table gives examples from benchmark candidates demonstrating partial success. It shows the percentage of sample records successfully matched to generated pattern code.

Item | Sample Size | Context Window | Rows Matched | % Match Actual | Generated Pattern | Pattern Template
1 | 31 | 31 | 28 | 90.3 | 4UN5AN2.3(4UNP2"/"1UP... | 4UN5AN2(4PUN2"/"1PU...
2 | 31 | 31 | 17 | 54.8 | 5N5ANP3"7Æ6N1ŃS8"1(2... | 5.6N5NPA3"7Æ6N1ŃS8"...
3 | 31 | 24 | 15 | 62.5 | 5.8"6Ã02"1.2LNP1.2(1"K"4... | 5.8"6Ã02"1.2LNP1.2(1"K"4...
4 | 31 | 19 | 28 | 90.3 | 3P5UN5NP5UN5UN3AN3... | 3P5NU5NP4UN5NU4NA...
5 | 31 | 10 | 7 | 22.6 | 4LNP3.6AN4"×_"5"¤īĩĵ®ü"... | 4NPL2.5NA4"×_"5"¤īĩĵ®ü...
6 | 30 | 9 | 26 | 86.7 | 5.6"ŀ¬¦"5"!pp%"1UNP5"Ù6... | 5.6"ŀ¬¦"5"!pp%"1PNU5"Ù...
7 | 25 | 12 | 5 | 20.0 | 2"AAHH"5"30"3(4.7UP,4L)... | 2"AAHH"5"30"5(3UP,4L,4...
8 | 31 | 14 | 30 | 96.8 | 5LP4")))¦¦§÷"4.10LN5.10U... | 5PL4")))¦¦§÷"4.11NL5.10P...

Column Explanation

  • Sample Size – The number of records available for the test sample. Sometimes a pattern describes fewer than 31 possible exact matches.
  • Context Window – The maximum number of sample records used by the model to generate a new pattern.
  • Rows Matched – The number of ALL sample records that were matched by the generated pattern. This may be larger than the actual context window.
  • % Match Actual – Rows Matched divided by Sample Size, as a percentage.

Items 4, 6 and 8 all achieve a match count greater than the context window. When pattern match code is generated from sample values, the context window used and the proportion of matching records are returned and displayed as a comment in the technical demonstration.

Front-end Framework

The technical demonstration employs the Gradio web framework as the display layer. This fits well with the Hugging Face ecosystem and infrastructure choices. It provides both a browser-displayable interface and a callable API. The callable API is used by benchmarking scripts to leverage cloud GPUs for faster report turnaround. The Gradio framework provides a low-code user interface implementation approach, freeing more time to focus on domain-specific training challenges. For example, in a Python virtual environment with CUDA available, one can translate from English to French in just a few lines of code:

from transformers import pipeline
import gradio as gr

pipe = pipeline("translation", model="Helsinki-NLP/opus-mt-tc-big-en-fr", device="cuda")
demo = gr.Interface.from_pipeline(pipe)
demo.launch(server_port=8080)

This renders the following fully operational web interface for translation:

The following screenshot shows the more involved demo tool interface utilizing the same Gradio framework:

Legend:

Item Control Description
1 Pattern Text One or more lines of M code. The pattern can be extracted from the first line containing a pattern match expression.
2 Describe Text This holds a structured description corresponding to the pattern match expression in the code.
3 Describe Pattern Button This button action transforms the pattern in code into a human readable structured description.
4 Pattern from Description This button action uses generative AI to translate the natural language description found in English, French, Spanish, or Portuguese into new pattern match code.
5 Matches List Sample values that match the code expression.
6 Non-Matches List Sample values that fail to match the code expression.
7 Generate Values Button This button action extracts the pattern in code and generates the two lists of matching and non-matching values.
8 Pattern from Values Button This button uses generative AI to transform the values in the “Matches” column into a new pattern match code expression.
9 Validate Values Button Re-validates all the sample values in both the “Matches” and “Non-Matches” columns.

 

The “tick” and “cross” symbols in the “Good” column confirm the expected sample result for the “Match” or “Non-Match” context. The use of “tick” and “cross” was chosen to communicate purpose in a mixed language user interface. The “gear” symbol on buttons is used to convey which parts of the user interface employ generative AI functionality.

Example user workflow for the use case of adjusting an existing pattern match expression:

Step Description
1 Paste lines of code into “Pattern” text input.
2 Press “Describe Pattern” button.
3 Adjust or extend natural language description in chosen language (English, French, Spanish, Portuguese).
4 Press “Pattern from Description” button to generate new pattern code.
5 Press “Generate Values” button for new random testing values to explore and refine matching behavior.
6 Edit and add new samples in the “Matching” and “Non-Matching” lists.
7 Press “Validate Values” button to confirm adjusted sample values all pass as required.

 

Internationalization

To provide runtime-switchable languages in the web interface, the community Python package gradio_i18n was employed. Label values are defined in a static dictionary keyed by language. The fiddliest part of the demo was getting the column titles to update when the language changes.

For description training data, a different approach is needed that also incorporates tracking for:

  • Single or Plural quantities.
  • Gender, e.g., French has the words “un” and “une” for the number one.
English French
a table une table
a candlestick un bougeoir

 

In Spanish, the plural article differs by gender.

English Spanish
the tables las mesas
the cars los coches

 

For language-specific error and informational messages, Python format strings are employed instead of string composition. For example, the info message placeholder “info_X_of_Y_rows_used”:

Language Python Format String
English "{} of {} rows used."
French "{} de {} lignes sont utilisées"
Spanish "utilizan {} de {} filas"
Portuguese "{} de {} linhas são usadas"
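Looking up the format string by language and filling its placeholders with str.format keeps word order correct in each language, because the placeholders live inside each translation rather than being composed around it. A minimal sketch (the dictionary structure and function name are assumptions, not the demo's actual code):

```python
# Hypothetical message catalogue keyed by placeholder id, then language.
MESSAGES = {
    "info_X_of_Y_rows_used": {
        "en": "{} of {} rows used.",
        "fr": "{} de {} lignes sont utilisées",
        "es": "utilizan {} de {} filas",
        "pt": "{} de {} linhas são usadas",
    }
}

def info(key: str, lang: str, *args) -> str:
    """Fill a language-specific format string with the given values."""
    return MESSAGES[key][lang].format(*args)

print(info("info_X_of_Y_rows_used", "en", 10, 31))  # → 10 of 31 rows used.
print(info("info_X_of_Y_rows_used", "es", 10, 31))
```

Note how the Spanish string places the verb before the numbers, which simple string concatenation in a fixed order could not accommodate.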

Application Tricks

Retry and Context

When using a list of values to deduce a pattern, the order of the supplied values can influence the model's success. To exploit this, a server-side retry loop shuffles the values in the context window, attempting up to 5 times to find a pattern that fully matches ALL of the sample values. It stops at the first full match.
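The retry loop can be sketched as follows; generate_pattern and matches_all are stand-ins for the model call and the pattern validator, which are not shown here:

```python
import random

def best_pattern(samples, generate_pattern, matches_all, max_tries=5):
    """Shuffle the context window and retry the model up to max_tries
    times, stopping at the first pattern that matches ALL samples."""
    window = list(samples)
    pattern = None
    for _ in range(max_tries):
        random.shuffle(window)            # order influences model success
        pattern = generate_pattern(window)
        if matches_all(pattern, samples): # validate against ALL samples
            return pattern                # first full match wins
    return pattern                        # best effort: last suggestion
```

In the real demonstration the validation step runs the generated M pattern against every sample value, not only those that fit in the context window.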

When only a subset of the sample values fits in the context window, the returned pattern includes a comment stating the number of sample rows used. This gives a hint and an opportunity to promote specific rows into the context window for a better inference.

Generated patterns are validated against the whole list of sample values. Where a suggested pattern is only a partial success, the number of sample rows that do match is quantified and displayed in a code comment.

Future Opportunities

The YottaDB and GT.M pattern match operator offers an extensibility mechanism. Just as the pattern codes A, N, and P mean Alphanumeric, Numeric, and Punctuation respectively, additional letters can encode context-specific meanings. Should a commonly accepted usage of this extensibility emerge for a particular set of applications, it may be possible to train an additional model to accommodate those features.

About Alex Woodhead

An M programmer with almost a quarter century of experience, Alex Woodhead has worked on clinical information system projects on multiple continents. Now living in Southern California, he is focused on synthetic data pipelines for generative AI for novel tools and automation, and is incubating genput.

Contact: info@genput.com

      1. Picture of Hunting Carpet made by Ghyath ud-Din Jami, Wool, cotton and silk, 1542–1543, Museo Poldi Pezzoli, Milan. This file has been identified as being free of known restrictions under copyright law, including all related and neighboring rights.
      2. Blog roll picture generated by OpenArt in response to a prompt by K.S. Bhaskar.
      3. Alan Turing quote from https://mathshistory.st-andrews.ac.uk/Biographies/Turing/quotations/
      4. Other graphics provided by the author.
YottaDB r2.02 Released

YottaDB r2.02 includes a number of features and enhancements that make YottaDB easier to use, and more like other Linux programs. For example:

  • With `ydbsh`, you can create shebang style scripts with M code.
  • SOCKET devices support TLS connections using server certifications that do not require a password, such as those issued by Let’s Encrypt.
  • Several optimizations to speed up Boolean operations.

In addition to enhancements and fixes made by YottaDB, r2.02 completes the merging of V7.0 GT.M versions into the YottaDB code base, GT.M V7.0-002, V7.0-003, V7.0-004, and V7.0-005, as described in the Release Notes.

Comparing YottaDB and Redis Using 3n+1 Sequences

TL;DR: Performance Comparisons has instructions for you to build a Docker container that allows you to make a side-by-side comparison of Redis,[1] Xider™,[2] and YottaDB. The image above is a screenshot of Xider and YottaDB outperforming Redis with a 32-process workload.

In 2025, the Journal of Computing Languages (JCL) plans a special issue recognizing 30 years of Lua. Since YottaDB has a native Lua API, we have submitted an article to JCL for that issue. A preprint of that article is on arXiv. The article compares the Redis and YottaDB APIs, and delves deeper into the performance comparison described below.

The choice of workload is important when benchmarking databases.

  • A realistic benchmark must perform a large number of accesses, in order to remove timing jitter.
  • Accesses must be different types, as real world workloads are not monolithic in their databases accesses.
  • The benchmark must run on a variety of computing hardware.
  • The benchmark must give consistent results on repeated runs.
  • The workload should be simple and understandable.

Computing the lengths of 3n+1 sequences meets these requirements. Given an integer n, the next number in the sequence is (a) n÷2 if n is even, and (b) 3n+1 if n is odd. The Collatz Conjecture, one of the great unproven results of number theory, asserts that all such sequences end in 1, i.e., there are no loops, and no sequences of infinite length. For example:

  • 3 → 10 → 5 → 16 → 8 → 4 → 2 → 1
  • 13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

Note that the sequences starting with 3 and 13 both meet at 10. If there are two processes, one computing the lengths of 3n+1 sequences starting with 13, and the other those of sequences starting with 3, and if they use a database to store the results as they work, then the process which reaches 10 later can simply use the results of the earlier process which has already computed and stored the results for the rest of the sequence.
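The sharing of results described above is essentially memoization. In the benchmark the shared cache is the database itself; the idea fits in a few lines of Python, with a plain dict standing in for the database (a sketch, not the benchmark program):

```python
lengths = {1: 1}  # shared cache; the real benchmark stores this in a database

def collatz_length(n: int) -> int:
    """Number of terms in the 3n+1 sequence from n down to 1, reusing any
    lengths already computed (e.g. starting from 13 reuses 10's tail)."""
    path = []
    while n not in lengths:
        path.append(n)
        n = n // 2 if n % 2 == 0 else 3 * n + 1
    base = lengths[n]
    # Walk back up the path, storing the length of each intermediate value.
    for m in reversed(path):
        base += 1
        lengths[m] = base
    return lengths[path[0]] if path else base

print(collatz_length(3))   # → 8  (3 → 10 → 5 → 16 → 8 → 4 → 2 → 1)
print(collatz_length(13))  # → 10 (13 → 40 → 20 → then 10's cached tail)
```

After computing the length for 3, the lengths of 10, 5, 16, 8, 4 and 2 are already cached, so computing the length for 13 only has to walk three new steps.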

The picture[3] is a visualization of the 3n+1 sequences of the numbers through 20,000, illustrating the convergence of the sequences, with all eventually converging on 1.

The problem is simple enough for programs to be easily written in virtually any language. While our intent in our blog post Solving the 3n+1 Problem with YottaDB was to compare programming languages, the programs can also be used to compare databases, and database APIs. In particular, we can compare Redis, YottaDB, and Xider. Xider is an API layer that allows YottaDB to provide applications with a Redis-compatible API, with two options:

  • over TCP using the Redis Serialization Protocol (RESP); and
  • for databases residing on the same machine as the client, we offer a direct connection to the database engine using Python and Lua APIs. The API is the same as the TCP / RESP API, except that it calls the database engine using our in-process native API calls. 

In addition to the above two comparisons, which use the same program for Redis and Xider, there are also Lua, Python, and M programs that directly access the database, i.e., without going through a Redis compatible API layer.

We invite you to compare Redis, Xider, and YottaDB for yourself, and send us your comments.


[1] Redis is a registered trademark of Redis Ltd. Any rights therein are reserved to Redis Ltd. Any use by YottaDB is for referential purposes only and does not indicate any sponsorship, endorsement or affiliation between Redis and YottaDB.

[2] As Xider is still under development, not all Redis functionality is available. Any implemented functionality is upward compatible with Redis. Since Xider is continuously released, if functionality your application needs is not available, please check whether there is an Issue for it, and create one if there isn’t one. You can create a GitLab notification to stay informed with Xider updates.

[3] Picture generated using The Collatz Conjecture and Edmund Harriss’s visualisation tool.

Bid Wars

We thank Kurt Le Breton for his first blog post, and hope there are many more to follow. If you would like to post on the YottaDB blog please contact us at info@yottadb.com.

A New Hope: The Emergence of MUMPS and YottaDB

In a computer universe not so far far away…

It is a time of great innovation.

Revolutionary technology is emerging from the depths of computing history, where storage space and code efficiency were once the ultimate challenges.

In the midst of this revolution, a succinct though ancient language, born from the constraints of early computing, stands as a beacon of simplicity and robustness.

MUMPS emerges from its historic confines on a unique mission: to engage in an epic battle against a squadron of transactions, overcoming all conflicts with unparalleled grace.

As millions of processes vie for victory in a grand auction of galactic proportions, the force of YottaDB’s internal transaction management triumphs, showing how old design constraints can lead to new levels of resilience.

A Chatbot, a Guest Writer, and a Coding Journey

Dramatic? Maybe a little over the top. Corny? Yeah, of course, but what self-respecting sort of nerd would I be without some sci-fi references! I do hope to have piqued your curiosity though, and yes, there will be an epic battle of transactions before we conclude, so stay tuned.

Truth be told, I’m genuinely surprised to be writing this today. I didn’t expect a spur-of-the-moment email to the YottaDB team would result in a request from Bhaskar to be a guest writer. Seriously, I’m a coder, not a wordsmith, so I apologize for getting ChatGPT to help out with the introduction (though I must admit, I’m partly responsible for the corniness).

ChatGPT! Yes, this whole episode started with AI – it got me into this mess in the first place. I promise, though, this is not a story about AI but rather … well … let me rewind.

Back in the late 90’s when I landed my first programming job it was with a medical technology company in Australia writing hospital information systems and then shortly afterwards I was with another firm writing laboratory ones. You can probably guess by now that at the core of both these products was MUMPS. That was my first introduction to the technology.

Now, I do know that the ANSI Standard refers to the language as M, however since I come from the world of medical software, I’m allowed a little case of the MUMPS now and then. So in the spirit of Bhaskar’s presentations, I’ll be keeping those mustard stains on my tie. But I digress…

Since I was fresh out of college, super green, and a little naive, I asked our database engineer why the company didn’t use XYZ Brand relational database. The short answer: it just wouldn’t cut it. At the time, I felt this wasn’t much of an explanation, but I did become curious.

I’m not quite sure what sparked my interest, but the hierarchical structure really appealed to me. Even today, I can’t fully explain why, but I’m still captivated by MUMPS. Bitten by the bug so to speak … I know, bad joke … perhaps I will change that tie after all! Thinking on it now, I would describe M as unique: it’s a little rough around the edges and yet, I think, remains fascinating and compelling even today.

So where is this all leading, you ask? Specifically to an implementation detail in GT.M, and by extension YottaDB, that truly stands out. It’s a hidden gem, hinted at in M’s language semantics, but only fully realized through GT.M’s engineers. I’ll delve into details a bit later.

But first let’s jump back to ChatGPT and the very start of our story.

The Hidden Gem: YottaDB’s Conflict Resolution

I enjoy coding! I really do. Like most programmers, I have my fair share of “pet” projects to work on. As a long-time M geek, it’s no surprise that at least one of those is M-based. And with YottaDB being open-sourced, it was the natural choice, since hobbies rarely generate income.

But, as any developer with niche interests knows, it can get a bit lonely in the wee hours of the morning when there’s no one around to bounce ideas off. So, of course I ended up talking to a chatbot!

It was during one of these moments with ChatGPT, while weighing the pros and cons of a decision I was stuck on, that the conversation delved into the internals of YottaDB’s transaction handling, particularly its automatic conflict resolution and retry mechanism.

And herein lies the hidden gem I wish to explore with you all.

A Reality of Relational Databases

But before we dive in, please let me take another quick step back in time. I didn’t spend my entire career with M-based companies. Eventually, I moved on and found myself working in the more ubiquitous world of relational databases. As capable as these systems are, I always felt something was missing. For certain data needs, I knew a more elegant way to model it would be hierarchically. However, I was stuck normalizing everything into a trillion tables.

I really missed M, but now I was working in the relational world and had to code within the constraints of that architecture. Speaking of those constraints, let me state an obvious, but crucial, point: relational databases are self-contained products. Your data is separate from your application logic. Hence the database has no understanding of how other software uses that data; it simply responds to the requests it’s given.

It simply responds, until it can’t!

Write issues are generally rare for any system under light load. But, just occasionally, one process might commit its transaction before another can, resulting in a concurrency violation. The database doesn’t have enough context to manage this for you and so its responsibility rightly ends in maintaining data integrity.

All databases handle failed requests through various mechanisms, but ultimately, it’s your code that will face the brunt of those errors. The hot-potato is always thrown back to the application developer. It’s a significant amount of error handling to manage in addition to your primary code. The main point is, issues intensify under heavier and heavier load.

In the relational world, this is just an accepted reality.

YottaDB’s Approach: Seamless Conflict Handling

But with YottaDB, a different reality awaits you.

At the heart of a YottaDB based system is the integration of code execution and data access from within the same process. This tight coupling, required by the M language, lays the foundation for YottaDB’s robust resolution mechanism.

Building on this, YottaDB also employs an optimistic concurrency strategy, which operates on the assumption of success and is highly performant most of the time. When inevitable conflicts occur, and multiple processes attempt to update the same piece of data, YottaDB steps in like a police officer directing traffic. It allows one process to proceed, then the next, until everything is resolved.

How? Well, you see, YottaDB has enough context to automatically manage this for you. Which approach it takes depends on whether you’re using the C-API with modern languages, or writing traditional M code. For now let’s keep a chronological flow and first focus on M to explore how the language specifies transactional semantics.

How M Specifies Transactions

A transaction begins with the command:

tstart

Your logic can then modify both variables and database entries. When it’s ready to finalize the transaction, it issues:

tcommit

Now here’s where YottaDB’s hidden gem becomes apparent: when you use a modified tstart to activate the automatic retry mechanism. On write conflicts, YottaDB presses its hidden rewind button: not only are database changes undone, but your application logic also rolls back to the tstart checkpoint. Depending on the transaction variant you use, some, none, or all local variables can be reverted to their checkpoint-time values. Once that’s done, your code runs again. Often, a retry will succeed with a tcommit, but in cases of persistent conflict, YottaDB dons its police officer cap to decide the order of resolution.

Please pause to ponder how cool this is.

Can you think of a strategy to retro-fit this ability into traditional databases? Neither can I.

Now, briefly, here are the variations that control which, if any, local variables are reset along with database changes. The upcoming code sample will showcase two of these.

  • tstart * – All local variables on the stack will be reset on every retry.
  • tstart () – No local variables will be reset at all.
  • tstart (a,b,c) – Only variables a, b, and c will be reset on each retry.
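As a loose analogy in Python (YottaDB's actual mechanism is internal to its runtime; every name in this sketch is invented), the rewind-and-retry behaviour amounts to snapshotting the designated locals, running the transaction body, and restoring the snapshot before rerunning on a conflict:

```python
import copy

class Conflict(Exception):
    """Stand-in for a write conflict detected at commit time."""

def run_transaction(body, locals_dict, reset_keys, max_retries=4):
    """Analogy for tstart (a,b,c): snapshot the named locals, run the
    body, and on a conflict restore them and rerun from the checkpoint."""
    snapshot = {k: copy.deepcopy(locals_dict[k]) for k in reset_keys}
    for _ in range(max_retries):
        try:
            return body(locals_dict)       # body ends with the "tcommit"
        except Conflict:
            locals_dict.update(copy.deepcopy(snapshot))  # rewind locals
    raise RuntimeError("transaction could not be resolved")
```

The key point the analogy captures is that the retry is invisible to the body: it simply runs again from the checkpoint with its designated state restored, which is exactly what relieves the application developer of conflict-handling code.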

Why Developers Can Relax: YottaDB Has You Covered

So what’s left for the application developer to handle? Not much really. YottaDB manages conflict resolution for you, allowing you to focus on crafting simple business logic.

Its ability to handle conflicts gracefully is a core strength, ensuring robustness even under heavy load. While it might slow down during resolution, it happily works through your code to completion.

Reaching Out to the YottaDB Team

Thanks for sticking with me this far – your patience is about to pay off as our grand auction of galactic proportions starts soon.

But first, let me circle back to that AI conversation. I was surprised this unique capability wasn’t a bigger part of YottaDB’s marketing and remarked about that to ChatGPT. Its response piqued my interest enough that I decided to pass the insights along to the team.

In a world where developers will choose between NoSQL and relational databases, YottaDB’s ability to handle conflicts seamlessly is a huge differentiator – if only more people knew about it.

That’s why I reached out to the YottaDB team – I wanted to ask that this powerful feature get the attention it deserves. And as fate would have it, Bhaskar decided I was the right person to help spread the word. So here I am, not only showing why YottaDB is great but also sharing how a single email and a meddling AI can turn into a full-blown article!

Prepare for Battle: Bid Wars Begins

And now, the moment we’ve been waiting for has arrived. The epic showdown between a squadron of transactions as they battle for supremacy in a galactic-scale auction is about to commence. Buckle up, and let’s get our M code ready for this thrilling mission!

Breaking Down the Bid Wars Code

(Click on the image to access the code. Note: the code kills the ^Action global variable.)

BidWars.m program

The BidWars.m auction system simulates a highly concurrent bidding environment where many bidders compete to place bids on an item, in this case, our favourite astromech droid. The system launches a specified number of concurrent bidder processes, each trying to outbid the others by incrementing the auction price. This sample showcases YottaDB’s transaction mechanisms when handling contentious access and updates.

  • Bidders: The system starts by launching a configurable number of bidders, each represented as an independent process.
  • Concurrency: Once all bidders are launched, the auction begins, and each bidder attempts to become the leader by placing higher bids.
  • Bid Placement: Each bidder process checks whether it’s the current auction leader. If not, the process places a new bid, incrementing the auction price randomly by a small amount.
  • Average Time Calculation: Throughout the auction, the system calculates and logs the average time taken per bid.
  • Final Bid: Once the auction duration is complete, the system waits for the remaining bidders to finish and announces the final winning bid, along with key statistics like total bids received and bids per second.
  • Live Updates: As the auction progresses, the system provides real-time performance feedback on the current price, total bids, and average time per bid.
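BidWars.m itself is M code, but the conflict-handling pattern it relies on can be sketched in Python: each bidder reads the current price, tries to commit a higher bid, and retries on conflict – the optimistic read-commit-retry cycle that YottaDB transactions perform automatically. Everything below is a hypothetical analogue, not the BidWars.m code.

```python
import threading
import random

class Auction:
    """Toy optimistic-concurrency auction: an analogue of what
    YottaDB's TSTART/TCOMMIT does automatically for M code."""
    def __init__(self, start_price=0):
        self._lock = threading.Lock()
        self.price = start_price
        self.bids = 0

    def try_bid(self, expected, new_price):
        # Commit only if no other bidder changed the price meanwhile,
        # like a transaction that restarts on conflict.
        with self._lock:
            if self.price != expected:
                return False          # conflict: caller must retry
            self.price = new_price
            self.bids += 1
            return True

def bidder(auction, attempts=100):
    for _ in range(attempts):
        while True:
            seen = auction.price
            if auction.try_bid(seen, seen + random.randint(1, 5)):
                break                 # committed
            # else: conflict detected; restart, as YottaDB would

auction = Auction()
threads = [threading.Thread(target=bidder, args=(auction,)) for _ in range(8)]
for t in threads: t.start()
for t in threads: t.join()
print(auction.bids, auction.price)    # always 800 bids; final price varies
```

The point of the sketch is the retry loop: the bidder never takes a long-lived lock, it just re-reads and re-tries on conflict, which is exactly the behavior the M code gets for free from YottaDB transactions.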

A Simpler, Cleaner Way to Handle Conflicts

The entire auction process demonstrates YottaDB’s capability to manage concurrent transactions and handle bidding conflicts gracefully, ensuring the auction runs smoothly despite a frenzy of bidders.

As you examine the source code, you might notice there’s no error handling. This isn’t an oversight but rather a deliberate choice so I could highlight this under-appreciated gem. From my perspective, it’s fascinating how YottaDB handles writes so seamlessly you don’t need to include error-handling code for them.

For those not familiar with M, I hope you find the code approachable and clear. The real value here is in YottaDB’s ability to manage conflicts internally, which simplifies development and keeps your code clean. While error handling remains important in other contexts, the fact that YottaDB takes care of conflicts automatically is a powerful feature that I believe deserves more attention.

YottaDB’s Legacy

In reflecting on the design and elegance of this product, it’s remarkable to consider its origins. Born out of the need to maximize efficiency and keep things small due to the constraints of early hardware, it emerged with simplicity and robustness. Despite the passage of half a century and the evolution of modern database systems, no contemporary solution has yet matched the seamless ease of use that YottaDB offers in the area of handling write conflicts.

The necessity of those early days has shaped YottaDB into something truly special. As I look at the current landscape of database technologies, it’s a testament to the brilliance of those early engineers that their effortless conflict resolution mechanism remains unmatched.

This legacy technology, forged from necessity, continues to stand out, reminding us that sometimes amazing innovations can hide in unexpected places.

About Kurt Le Breton

Kurt is a software engineer based in Australia. His career began in the medical software field, where he discovered his enthusiasm for hierarchical data storage. He later transitioned to developing specialized software for accounting workflows. Outside of programming, he enjoys photography, reading, and dabbling in bookbinding. Sitting around a campfire for an evening is one of the ways he likes to relax and take a break from the modern world.

  1. Photo of R2D2. Credit: Kristen DelValle, used under Creative Commons by 2.0
Dates and Times in Octo

Octo now supports dates and times.

While the ability to store and process dates and times is essential to many data processing applications, they are perhaps the least standard basic functionality across SQL implementations, as shown by the following table from SQL Date and Time.

 

Example                  Format                   SQL Server  Oracle     MySQL  PostgreSQL

2022-04-22 10:34:23      YYYY-MM-DD hh:mm:ss      DATETIME                      TIMESTAMP
2022-04-22               YYYY-MM-DD               DATE                   DATE   DATE
10:34:23                 hh:mm:ss.nn              TIME        TIME       TIME   TIME
2022-04-22 10:34:23.55   YYYY-MM-DD hh:mm:ss.nn   DATETIME                      TIMESTAMP
2022                     YYYY                                            YEAR
12-Jan-22                DD-MON-YY                            TIMESTAMP

 

ISO 8601 is an international standard for dates and times that SQL implementations generally support.

Applications can have specialized needs for dates. For example, medical applications need to store imprecise dates, like “July 1978” or just “1978” (I know that my tonsils were removed in 1958, but I have no idea when in 1958). Fileman dates allow dates to be stored with arbitrary levels of (im)precision. Such specialized dates typically end up with ad hoc representations when accessed via SQL, e.g., as VARCHAR, INTEGER, or NUMERIC types.

Octo provides several date and time types:

  • DATE
  • TIME [WITHOUT TIME ZONE]
  • TIMESTAMP [WITHOUT TIME ZONE]
  • TIME WITH TIME ZONE
  • TIMESTAMP WITH TIME ZONE.

The square brackets around [WITHOUT TIME ZONE] mean that the text is optional.

Formats that Octo supports are:

  • TEXT (values such as '2023-01-01' and '01:01:01')
  • HOROLOG (values in $HOROLOG format)
  • ZHOROLOG (values in $ZHOROLOG format)
  • ZUT (integers interpreted as $ZUT values)
  • FILEMAN (numeric values of the form YYYMMDD.HHMMSS), where
    • YYY is year since 1700 with 000 not allowed
    • MM is month; two digits 01 through 12
    • DD is the day of the month; two digits 01 through 31
    • HH is the hour; two digits, 00 through 23
    • MM is the minute; two digits, 00 through 59
    • SS is second; two digits, 00 through 59
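As a cross-check of the FILEMAN rules above, here is a small Python sketch (hypothetical; not Octo code) that converts a FILEMAN value to a standard datetime. It does not handle Fileman's imprecise dates (e.g., a month or day of 00), which would need the special treatment discussed earlier.

```python
from datetime import datetime

def fileman_to_datetime(value: str) -> datetime:
    """Convert a FILEMAN date/time string (YYYMMDD.HHMMSS) to a datetime.
    YYY is the year since 1700; the .HHMMSS time part is optional."""
    date_part, _, time_part = value.partition(".")
    time_part = time_part.ljust(6, "0")   # missing HH/MM/SS digits mean zero
    yyy = int(date_part[:3])
    if yyy == 0:
        raise ValueError("year 000 is not allowed")
    return datetime(1700 + yyy,
                    int(date_part[3:5]), int(date_part[5:7]),
                    int(time_part[:2]), int(time_part[2:4]), int(time_part[4:6]))

print(fileman_to_datetime("3220422.103423"))   # 2022-04-22 10:34:23
```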

Here is an example of a query using Fileman dates against a VistA VeHU Docker image, which contains simulated patient data for training purposes, and which you can run and experiment with.

OCTO> SELECT P.NAME AS PATIENT_NAME, P.PATIENT_ID as PATIENT_ID, P.DATE_OF_BIRTH, P.Age, P.WARD_LOCATION, PM.DATE_TIME as Admission_Date_Time,
      TOKEN(REPLACE(TOKEN(REPLACE(P.WARD_LOCATION,'WARD ',''),'-',1),'WARD ',''),' ',2) AS PCU,
      CONCAT(TOKEN(REPLACE(TOKEN(REPLACE(P.WARD_LOCATION,'WARD ',''),'-',2),'WARD ',''),' ',1),' ',TOKEN(P.WARD_LOCATION,'-',3)) AS UNIT,
      P.ROOM_BED as ROOM_BED, REPLACE(P.DIVISION,'VEHU ','') as FACILTY, P.SEX as SEX, P.CURRENT_ADMISSION as CURRENT_ADMISSION,
      P.CURRENT_MOVEMENT as CURRENT_MOVEMENT, PM.PATIENT_MOVEMENT_ID as Current_Patient_Movement,
      PM.TYPE_OF_MOVEMENT as Current_Movement_Type, AM.PATIENT_MOVEMENT_ID as Admission_Movement,
      AM.TYPE_OF_MOVEMENT as Admission_Type
FROM PATIENT P
LEFT JOIN patient_movement PM ON P.CURRENT_MOVEMENT=PM.PATIENT_MOVEMENT_ID
LEFT JOIN patient_movement AM ON P.CURRENT_ADMISSION=AM.PATIENT_MOVEMENT_ID
WHERE P.CURRENT_MOVEMENT IS NOT NULL
AND P.ward_location NOT LIKE 'ZZ%'
AND P.NAME NOT LIKE 'ZZ%'
AND PM.DATE_TIME > timestamp'2015-01-01 00:00:00' LIMIT 5;
patient_name|patient_id|date_of_birth|age|ward_location|admission_date_time|pcu|unit|room_bed|facilty|sex|current_admission|current_movement|current_patient_movement|current_movement_type|admission_movement|admission_type
ONEHUNDRED,PATIENT|100013|1935-04-07|89|ICU/CCU|2015-09-10 08:38:18|| |ICU-10|DIVISION|M|4686|4686|4686|1|4686|1
TWOHUNDREDSIXTEEN,PATIENT|100162|1935-04-07|89|7A GEN MED|2016-06-26 20:24:39|GEN| ||CBOC|M|4764|4764|4764|1|4764|1
ONEHUNDREDNINETYSIX,PATIENT|100296|1935-04-07|89|7A GEN MED|2015-09-25 11:56:03|GEN| ||CBOC|M|4711|4711|4711|1|4711|1
ZERO,INPATIENT|100709|1945-03-09|79|7A SURG|2015-04-04 13:38:10|SURG| |775-A|CBOC|M|4672|4672|4672|1|4672|1
EIGHT,INPATIENT|100716|1945-03-09|79|3E NORTH|2015-04-03 11:25:45|NORTH| |3E-100-1|CBOC|M|4636|4636|4636|1|4636|1
(5 rows)
OCTO>

If your application has specialized dates, we invite you to discuss your requirements with us, so that we can extend Octo to meet your needs.

  1. Photo of Mayan calendar stone fragment from the Classical Period (550-800 CE) at Ethnologiches Museum Berlin. Credit: José Luis Filpo Cabana, used under the Creative Commons Attribution-Share Alike 4.0 International license.
  2. Photo of one of NIST’s ytterbium lattice atomic clocks. NIST physicists combined two of these experimental clocks to make the world’s most stable single atomic clock. The image is a stacked composite of about 10 photos in which an index card was positioned in front of the lasers to reveal the laser beam paths. Credit: N. Phillips/NIST

 

YottaDB r2.00 released

YottaDB r2.00 is a major new release with substantial new functionality and database format enhancements.

  • Inherited from the upstream GT.M V7.0-000, YottaDB r2.00 creates database files of up to 16Gi blocks. For example, the maximum size of a database file with 4KiB blocks is 64TiB, which means you can use fewer regions for extremely large databases. With YottaDB r2.00, you can continue to use database files created by r1.x releases, except that the maximum size of a database file created with prior YottaDB releases remains unchanged.
  • For direct mode, as well as utility programs, YottaDB can optionally use GNU Readline, if it is installed on the system. This includes the ability to access and use command history from prior sessions.
  • Listening TCP sockets can be passed between processes.
  • The ydbinstall / ydbinstall.sh script has multiple enhancements.
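The file-size arithmetic in the first bullet above can be checked directly (Python used here only as a calculator):

```python
GiB = 2**30
KiB = 2**10
TiB = 2**40

max_blocks = 16 * GiB          # up to 16Gi blocks per database file
block_size = 4 * KiB           # example: 4KiB blocks
max_file_size = max_blocks * block_size
print(max_file_size // TiB, "TiB")   # 64 TiB
```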

In addition to enhancements and fixes made by YottaDB, r2.00 inherits numerous other enhancements and fixes from GT.M V7.0-000 and V7.0-001, all described in the Release Notes.

Graphical Monitoring of Statistics Shared by Processes

Quick Start

Monitor the shared database statistics of your existing applications in minutes, starting right now.

  • Ensure that node.js is installed.
  • Use the ydbinstall script to install the YottaDB GUI. (This also installs the YottaDB web server.)
  • Ensure that in the environment of each application process, the variable ydb_statshare is set to 1.
  • Start the YottaDB web server using the same global directory as your application.
  • Connect to the web server to start the YottaDB GUI. In the Dashboard, choose Database Administration / Monitor Database. Choose the data you want to monitor, and choose how you want it displayed. Click Start to see the data.

The GUI comes with a demo that includes a simulated application. This video walks you through using the demo.

Read on to dig deeper.

Motivation

Visual presentation is the most effective way most of us ingest complex data. As with Minard’s famous depiction of Napoleon’s disastrous march on Moscow shown here, we routinely use graphs every day.

Processes accessing databases maintain dozens of internal counters (collectively referred to as “statistics”) for each database file that they have open. The YottaDB GUI allows you to visually monitor these statistics in real time.

Statistics

There are two sources of statistics to monitor: statistics shared by processes and statistics in the database file header. Each has its uses.

Shared Statistics

YottaDB processes opt in to sharing their operational database statistics by having the environment variable ydb_statshare set to 1 at process startup. Optionally, the environment variable ydb_statsdir can name a temporary directory used for sharing and monitoring.

File Header Statistics

Statistics in the database file header capture the aggregate activity of all processes that have accessed the database since it was created. Viewing these statistics requires access to the database file, but does not require application processes to share statistics.

Monitoring

Shared Statistics

Monitoring statistics shared by processes enables focused analysis, for example, visualization of current performance issues and the behavior of specific processes. The YottaDB GUI provides graphical monitoring of shared statistics.

With the intent of ensuring that it is intuitive to use, the GUI has integrated online documentation, including mouse-overs, but no separate user documentation other than installation instructions.

The video on this page walks you, start to finish, through graphically monitoring the statistics of an existing application on a remote server.

If you want to implement your own monitoring of shared statistics, YottaDB provides a %YGBLSTAT() utility program.

File Header Statistics

For production instances, we recommend continuously capturing statistics every minute or so. You can use the gvstat program or write your own similar program. Capturing the data and creating baselines will help you in capacity planning as well as incident analysis. Continuously monitoring and displaying key parameters can additionally give you insight into the dynamic behavior of your application.

The guest blog post YottaDB Dashboard by Ram Sailopal demonstrates monitoring File Header Statistics with Grafana.

Security

Statistics are metadata, not data. While metadata should be shared advisedly, it does not typically have the same confidentiality restrictions as data. The GUI and web server follow normal YottaDB security policies.

  • Processes must opt-in to share statistics.
  • The web server process must be started by a userid on the system, and has no access capabilities beyond those of that userid.
  • Database monitoring can be performed by a read-only GUI. You can see from the videos that the GUI is operating in read-only mode.
    • While the GUI does have a read-write mode, for example, to support editing of global directories, database monitoring requires just read-only access.
  • To access statistics, the web server needs access to the global directory and the $ydb_statsdir directory if it is specified. You can use Linux permissions to control access.
    • The JavaScript libraries used are mature, versioned, and statically served from where the GUI is installed on the server.

Please Try It!

We invite you to use the GUI to monitor database statistics shared by your application processes and tell us what you think. As it is a new application, we are sure that it offers many opportunities for enhancement and improvement.

  • If you have YottaDB support, please reach out to us through your YottaDB support channel.
  • If you do not have YottaDB support, you can reach out to us at info@yottadb.com.

Thank you for using the YottaDB GUI.

Making lua–yottadb Fast

Berwyn Hoyt

We thank Berwyn Hoyt for his first guest post on the YottaDB blog, and hope there are many more to follow. If you would like to post on the YottaDB blog please contact us at info@yottadb.com.

TLDR: YottaDB is a fast and clean database and it deserves a Lua API that is as fast as possible. This article discusses how we improved lua-yottadb to go ~4× as fast when looping through database records, and a stunning 47× as fast when creating Lua objects for database nodes, plus other improvements (results here). Low-hanging fruit aside, the biggest (and trickiest) improvement was caching the node’s subscript array in the Lua object that references a specific database node. Finally, porting to other language wrappers is discussed, as well as a tentative thought on how YDB might support an even faster API. Along the way we learned numerous things that might help someone port these efficiencies to other languages.


I had recently released v1.0 of lua-yottadb,1 a syntax upgrade to a tool that gives Lua easy access to YottaDB. Then in Oct 2022, I got my first user: Alain Descamps, of the University of Antwerp Library (sponsors of lua-yottadb), and a heavy user of YottaDB. An actual user. Brilliant!

But the euphoria didn’t last long. Mere hours later, I got Alain’s first benchmark: Lua was ¼ of the speed of YottaDB’s native language (M) at simply traversing (counting) database records on our dev server. Even worse, when run on my local PC’s database, Lua was an even smaller fraction of the speed of M. Grrr… So I embarked on what I thought would be a 5-day pursuit of efficiency … but it took 25 days! Nevertheless, I did get the dev server’s Lua time down from 4× to 1.3× the duration of M. A very nice result.

What was the problem?

Alain’s Lua benchmark script was simple – akin to this:

ydb = require('yottadb')
gref1 = ydb.key('^BCAT')('lvd')('')
cnt = 0  for x in gref1:subscripts() do  cnt=cnt+1  end
print("total of " .. cnt .. " records")

If you’re new to YottaDB or M, to understand what’s going on, you need to know that YottaDB database nodes are represented by a series of ‘subscript’ strings, just like a file-system path. Whereas a path might be /root/var/log/file, a YottaDB node would be root("var","log","nodename") or root.var.log.nodename. Each node can hold a bit of data, further sub-nodes (similar to directories in a file-system), or both.
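For readers coming from other languages, that node model can be sketched in Python (a toy illustration, not a YottaDB API): a tree in which each node, identified by its path of subscripts, may hold a value, child nodes, or both.

```python
class Node:
    """Toy model of a YottaDB node: a path of subscripts, optional data,
    and sub-nodes -- much like directories and files in a file-system."""
    def __init__(self):
        self.value = None
        self.children = {}

    def node(self, *subs):
        """Walk (creating as needed) to the node at the given subscripts."""
        here = self
        for s in subs:
            here = here.children.setdefault(s, Node())
        return here

root = Node()
root.node("var", "log", "nodename").value = "some data"
print(root.node("var", "log", "nodename").value)   # some data
```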

Alain’s program simply loops through a sequence of 5 million database nodes with subscripts ^BCAT("lvd",<n>) like so:

^BCAT("lvd",1)
^BCAT("lvd",2)
^BCAT("lvd",3) ...

To do so took 20 seconds in Lua, and 5 seconds in M. A quick code review of lua-yottadb found a number of low-hanging fruit in the gref1:subscripts() method above. The issues were that for every loop iteration, gref1:subscripts() did this:

  1. Checked each subscript in the node’s ‘path’ at the Lua level to make sure it was a valid string.
  2. Checked each subscript again at the C level.
  3. Converted the subscript array into a C array of string pointers (for calling the YottaDB API), then discarded the C array.

Yep. All that happened at every iteration.

Next, I built some benchmark tests to track our improvement, then got stuck into improving things.2 Avoiding number (3) would require caching the C array. But we could avoid (1) and maybe (2) by doing those checks just once at the start of the loop. This was fairly quickly done, with respectable improvements to iteration speed.
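In Python terms (a hypothetical sketch, not the lua-yottadb code), fixes (1) and (2) amount to hoisting the per-subscript validation out of the loop:

```python
def is_valid_subscript(s):
    # stand-in for the real per-subscript validity check
    return isinstance(s, str) and len(s) > 0

def subscripts_slow(subs, values):
    # original pattern: re-validate the whole subscript path every iteration
    for v in values:
        assert all(is_valid_subscript(s) for s in subs)
        yield v

def subscripts_fast(subs, values):
    # improved pattern: validate once, before the loop starts
    assert all(is_valid_subscript(s) for s in subs)
    for v in values:
        yield v

subs = ["lvd"]
print(sum(1 for _ in subscripts_fast(subs, range(5))))   # 5
```

The validation work per loop body drops from O(path length) to zero; only fix (3), caching the C array itself, needed the deeper redesign described below.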

But what if we wanted to improve every single database operation, not just iteration? The benchmarks showed that there were two slow tasks critical to every single database operation: a) converting the subscript list to C, and b) creating new Lua nodes.

As an example of the latter, let’s create a Lua-database object:

guy = ydb.node('demographics').country.person[3]

We can do all kinds of database activities on that node, for example:

guy:lock_incr()
guy:set('Fred')
guy:lock_decr()

or even create subnodes:

guy.gender:set('male')
guy.genetics.chromosome:set('X')

Each '.' above creates a new Lua subnode object (in this case, genetics and then chromosome) before you can finally set it to 'X'. You can imagine that a Lua programmer will be doing a lot of this, so it’s a critical task: we need to optimise node creation.

To achieve this we needed to find the ‘Holy Grail’: fast creation of cached C arrays. That would extend these benefits to every database function.

My early wins made it feel like I was about half way there. I’d told my employer the efficiency task would take 5 days. By this time I’d used up about half of that, and I thought I was on track. All that was left was to cache the subscript list. I mean, how hard can it be to cache something? You just have to store the C array when the node is first created, and use it again each time the node is accessed, right? Little did I know!

Caching subscripts: a surprisingly daunting task

Achieving the two goals together – caching the C subscript array while keeping node creation fast – proved to be so difficult that it took me three rewrites.

At the outset, each node already held all its subscripts – but in Lua, rather than C. The node object looked like this:

node = {
    __varname = "demographics"
    __subsarray = {"country", "person", "3", "gender"}
}

Be aware that creating a table in Lua is slow compared to a C array: it requires a malloc(), linkage into the Lua garbage collector, a hash table, and creation of a numerical array portion. And here we need two of them (one for the node object itself, and another for the __subsarray). So it’s relatively slow. But at this point I didn’t know that this was the speed hog.

Iteration 1: An array of strings

In the first 2 iterations, I simply stored the C cache-array as a userdata field within the regular node object as follows (userdata is a Lua type that represents a chunk of memory allocated by Lua for a C program):

node.__cachearray = cachearray_create("demographics", "country", "person", "3", "gender")

Since Lua already referenced the strings in __subsarray (presented previously), my function cachearray_create() just had to allocate space for it: malloc(depth*string_pointers), and point them to the strings already existing in Lua. But this would mean I had to retain the __subsarray table to reference these strings and prevent Lua from garbage-collecting them while in use by C.

Although this caching would speed up node re-use, adding __cachearray to the node would actually slow down node creation time. To prevent the slow-down, I noticed that each child node repeats all its parent’s subscripts. So I saved both memory and time, by making each child node object contain only its own rightmost subscript __name and point to __parent for the rest:

node = {
    __varname = "demographics"
    __name = "gender"
    __parent = parent_node      -- in this case, "3"
}

This way I avoid having to create the whole Lua __subsarray table for each node. So each node forms a linked list through its __parent fields: gender -> 3 -> person -> country.
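The same parent-linked layout can be mimicked in Python (a sketch, not the Lua code): each node stores only its own rightmost subscript, and the full path is recovered by walking parent links back to the root.

```python
class Node:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # linked list toward the root node

    def subsarray(self):
        """Rebuild the full subscript path by walking parents (root first)."""
        subs = []
        node = self
        while node is not None:
            subs.append(node.name)
            node = node.parent
        return subs[::-1]         # collected leaf-to-root, so reverse

country = Node("country", Node("demographics"))
gender = Node("gender", Node("3", Node("person", country)))
print(gender.subsarray())   # ['demographics', 'country', 'person', '3', 'gender']
```

This is also why building the C cache-array required recursing backwards through ancestors, the source of the complexity (and the segfault) described next.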

If you’re getting bored at this point, I suggest you skip to iteration 3.

Segfaults and Valgrind

That was the design for iteration 1. But the __parent made it complicated, because I had to create the cache-array by recursing backwards into all the node’s ancestors. This complexity hid a nasty segfault that I couldn’t find for a long time: the kind of C pointer bug where the symptom occurs nowhere near the cause. (I should have used valgrind myprog to help find it, but I hadn’t used Valgrind before, and I didn’t realise how dead simple it was to use. Later, I needed it again, and discovered that using it is true bliss.)

Anyway, the bug ended up being a case of playing “where’s Wally” except with hidden asterisks. The code was: malloc(n_subscripts * sizeof(ydb_buffer_t*)), except I shouldn’t have included the final * because the YDB API uses an array of buffer structs, not pointers to structs. In the end I found the bug by manually running Lua’s collectgarbage() – which often makes memory errors occur sooner rather than later.

Fast traversal: ‘mutable’ nodes

Finally, we had a node with a cache – for fast access to the database. But we still didn’t have fast node traversal, like in Alain’s tests. This is because each iteration of Alain’s for loop has to create a new node: ^BCAT("lvd",1), ^BCAT("lvd",2), etc. So I made the function cachearray_subst() re-use the same array, altering just the last subscript.

But changing a node’s subscripts is dodgy. It makes the same node object in Lua refer to a different database node than it used to. Imagine that you’re the programmer and have stored that Lua object for use later (e.g., when scanning through to find the highest value node: maxnode = thisnode).3 You’ll still be expecting the stored maxnode to point to the maximum node.

Enter the concept of a ‘mutable’ node, which the programmer explicitly expects to change. Lua iterators like pairs() can now return specifically mutable nodes. The programmer can convert this to an immutable node if they want to store it for use after the loop, or they can test for mutability using the ismutable() method.

Well, it worked. Now we have a lightning-fast iterator, and in most cases the programmer doesn’t have to worry about mutability.
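Mutable nodes can be mimicked in Python (a hypothetical sketch; the class and method names are mine, not lua-yottadb’s): the iterator mutates one shared node in place, and the caller freezes a copy whenever it needs a stable reference.

```python
class MutableNode:
    def __init__(self, subs):
        self.subs = list(subs)        # last element changes during iteration

    def immutable(self):
        return tuple(self.subs)       # frozen snapshot, safe to store

def iterate(prefix, last_subs):
    node = MutableNode(prefix + [None])
    for s in last_subs:
        node.subs[-1] = s             # reuse the same object: no allocation
        yield node

best = None
for n in iterate(["^BCAT", "lvd"], ["1", "2", "3"]):
    if n.subs[-1] == "2":
        best = n.immutable()          # storing n itself would later point at "3"
print(best)   # ('^BCAT', 'lvd', '2')
```

Storing the mutable node itself would exhibit exactly the maxnode pitfall described above; freezing it is the explicit opt-out.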

Garbage collection & Lua versions

All that remained was to tell Lua’s garbage collector about my mallocs. In Lua 5.4 this would have been easy: just add node method __gc = cachearray_free to the object. But __gc doesn’t work in Lua 5.1 on tables (which is what our node object is), and we wanted lua-yottadb to support Lua 5.1 since LuaJIT is stuck on the Lua 5.1 interface – so some people still use Lua 5.1. Instead of simply setting __gc = cachearray_free, I had to allocate memory using Lua’s “full userdata” type, which is slower than malloc, but at least it provides memory that is managed by Lua’s garbage collector.4

Lastly, I wrote some unit tests, and I thought I’d be done. But node creation wasn’t really any faster.

Iteration 2: A shared array of strings

At this stage I’d already spent 12 days: over twice as long as I’d anticipated. That’s not too outrageous for a new concept design. Strictly, I should have told my employer that I was over-budget, so they could make the call on further development. But I was embarrassed that my node creation benchmark was not really faster than the original. We had fast iteration now, but I had anticipated that everything would be faster. Something was wrong, and I decided to just knuckle down and find it.

At this point I made a mistaken judgment-call that cost development time. I guessed (incorrectly) that the speed issues were because each node creation had to copy its parent’s array of string pointers. Instead of verifying my theory, I implemented a fix, adding complexity as follows.

Each child node retained a duplicate copy of the entire C array of string-pointer structs. But this seemed unnecessary since each child added only one subscript string at the end. Let’s keep just one copy of the C array and have each child node reference the same array but keep its specific depth as follows:

array = cachearray("demographics", "country", "person", "3", "gender")
root_node   = {__cachearray=array, __depth=1}
country_node= {__cachearray=array, __depth=2}
person_node = {__cachearray=array, __depth=3}
id_node     = {__cachearray=array, __depth=4}
gender_node = {__cachearray=array, __depth=5}

This works, but adds some complexity, because if you create alternate subscript paths like root.person[3].male and then person[4].female, the code has to detect that the cache array is already used at subscript 3, so it cannot change that slot to 4 and has to create a duplicate cache-array after all. It also complicates the Lua code, because the C code now has to return a depth as well as the array.
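Here is a Python sketch of iteration 2’s design (hypothetical; names are mine): one shared subscript array, a per-node depth, and forced duplication when sibling paths diverge.

```python
class Cache:
    """One shared subscript array used by a whole chain of nodes."""
    def __init__(self, subs):
        self.subs = subs

class Node:
    def __init__(self, cache, depth):
        self.cache = cache
        self.depth = depth            # how much of the shared array is "mine"

    def child(self, sub):
        c = self.cache
        if len(c.subs) > self.depth:
            if c.subs[self.depth] != sub:
                # slot already used by a sibling path: duplicate after all
                c = Cache(c.subs[:self.depth] + [sub])
        else:
            c.subs.append(sub)        # extend the shared array in place
        return Node(c, self.depth + 1)

root = Node(Cache([]), 0)
person = root.child("person")
three = person.child("3")
four = person.child("4")              # sibling path forces a duplicate array
print(three.cache.subs, four.cache.subs)   # ['person', '3'] ['person', '4']
```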

Although this does speed up node creation, it is still not as fast as expected, because adding the __depth field to the object itself slows node creation down.5

Iteration 3: A breakthrough – the complete object in C

Up to this point I had been assuming I needed a Lua table to create a Lua object. After all, it seemed so efficient to make C just point to the existing Lua strings; and for that I needed Lua to reference those strings to keep them from being garbage collected: hence a Lua table.

But now I finally did some more benchmarking and showed Lua table creation to be the speed hog. Remember: it does a malloc(), links to the Lua garbage collector, creates a hash table, and a numerical array portion. Plus, we’re adding three hashed fields, which are not exactly instant: __parent, __cachearray, and __depth.

It sure would be much faster if we could store all this data inside a C struct. So I read the manual again and discovered that the userdata type can be a Lua object all by itself. I should have guessed this from the start. You can assign a metatable to a userdata – which means that you can give it object methods – which means it can actually be the node object, all by itself. No need to create a Lua table for a C object at all.

Implementing this, my userdata C struct now looks something like this:

typedef struct cachearray_t {
  int subsdata_alloc;   // size allocated for strings (last element)
  short depth_alloc;    // number of pre-allocated array items
  short depth_used;     // number of used items in array
  ydb_buffer_t subs_array[]; // struct reallocated if this exceeded
  char subsdata[];
} cachearray_t;

I pre-allocated space for extra slots (5, by default, before reallocation is needed). Thus, when you create ydb.node("demographics") you can follow that up with .country.person[3].female, and all these subsequent subscripts get stored in the same, previously allocated C array.

Notice that this struct contains two expanding sections (it’s really two separate structs): the array of string pointers subs_array and the actual string characters subsdata. It would be better to keep these in a single array of structs, and thus have just one expanding section. But we cannot do that because we need an array of ydb_buffer_t to pass to the YDB API. These two expanding sections add complexity to the code, but don’t slow it down. It would be simpler to allocate two userdata sections: one for each section – but that would slow it down.

Also notice that since subsdata now stores subscript strings in my C userdata, I don’t need to keep a Lua table that references their Lua copies, which are now set free.

Anyway, this cache-array can now hold subscripts for several nodes. But I still need to store the depth of each particular node somewhere. For this, I have a ‘dereference’ struct which points to a cache-array and remembers the depth of this particular node.

typedef struct cachearray_dereferenced {
  struct cachearray_t *dereference; // merely points to a cachearray
  short depth; // number of items in this array
} cachearray_dereferenced;

For the root node, I store both the cache-array and this dereference struct in the same userdata. Child nodes only need the dereference struct.6 This dereferencing does add some complexity, but it’s worth it to avoid proliferating duplicate cache-arrays, which would fill up CPU cache and slow things down.

Finally, all subscript strings are cached in C, and I only need to create a userdata for each node, not a table. The irony is that iteration 1’s original motivation to re-use Lua strings was a false economy. It turns out that it’s just as fast to copy the strings into C as it is for Lua to do the necessary check that all subscripts are strings. And it doesn’t even waste any memory, because the Lua strings can then be garbage collected instead of being held by reference.

By now, I’ve taken 25 days to implement this thing. I’m going to have some serious explaining to do to my employer. That, in fact, is how this article began.

Iteration 4: The gauntlet challenge – cheap node creation

Virtually instant subnode creation would be possible if light userdata were used for it. However, such nodes could never be freed, since __gc finalizer methods do not work on light userdata in Lua. Can anyone think of a workaround?

Consider a Lua object for database node demographics. Subnodes can be accessed using dot notation: demographics.country.person. Even with our latest design, subnodes still have the overhead of allocating a full userdata. But Lua has a cheaper type called a light userdata, which is nothing more than a C pointer and is free to create. We just need to pre-allocate space for several dereferenced subnodes (shown below) within the parent node’s userdata, and child nodes could simply point into it:

typedef struct cachearray_t {
    cachearray_dereferenced subnodes[5];  /* pre-allocated child slots */
    /* ... regular node contents ... */
} cachearray_t;

This will finally make full use of that mistaken judgment-call I made early on, and re-use pre-allocation to ultimate effect.

But there’s a gotcha. Since a light userdata object has no storage, Lua doesn’t know what type of data it is, and therefore what metatable (i.e. object methods) to associate with it. So there’s a single global metatable for all light userdata objects. No matter: we can still hook the global metatable, and then double-check that it points to a cache-array, before running cache-array class methods on it. Should work fine.

Node creation time in lua-yottadb v2.1 is already 47× as fast as v1.2, but I’m anticipating this improvement will increase that to 200×, making dot notation virtually free. This will also keep all allocated memory together in one place: also better for CPU caching.

This hack would actually work … except for one problem: it can’t collect garbage. Tragically, Lua ignores the __gc method on light userdata. This means we’ll never be able to remove the light userdata’s reference to its root node. Which creates a memory leak. Here’s an example to explain:

x = ydb.node("root").subnode
x = nil

First the root node is created; then subnode references root; then Lua assigns subnode to x so that x now references subnode. Finally, x is deleted. The problem is that when x is garbage-collected, Lua does not collect light userdata subnode (which is still referencing root). So root is not collected: a memory leak.

Can any of my readers see a solution to this puzzle? I’m throwing down the gauntlet. Find a way to work around Lua’s lack of garbage collection on light userdata, then post it here, and I’ll make you famous on this blog. 😉

On the other hand, Mitchell has rightly pointed out that this is probably premature optimisation. After profiling, you can easily recover the speed in time-critical code by simply giving up dot notation and creating the node object all at once with ydb.node("demographics", "country", "person").

Portability: Python, etc.

In theory, my final cachearray.c is the best version to port to another language without a complete re-write because it now keeps its strings entirely in C – which means it’s fairly self-contained and portable. Having said that, it will need changes in how it receives function parameters: which is from the Lua stack. The Python/etc. portion of the wrapper will also need extensions to support cache-arrays.

A quick look at Python’s YDBPython, for example, shows that its C code has the same design as the original lua-yottadb – and is probably slow. Every time it accesses the database, it has to verify your subscript list, and copy each string to C. Unlike lua-yottadb, YDBPython also does an additional malloc() for each individual subscript string. Caching the subscript array could provide a significant speedup, just as it did for Lua.
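The idea can be sketched in a few lines of plain Python (a hypothetical CachedNode class of my own for illustration, not YDBPython’s actual code): validate and encode the subscript array once at construction, then let every later database access reuse the cached byte strings instead of re-converting them on each call:

```python
class CachedNode:
    """Hypothetical sketch: cache the encoded subscript array at construction,
    so repeated database accesses skip per-call validation and conversion."""

    def __init__(self, varname, *subscripts):
        # Validate once, up front...
        for s in subscripts:
            if not isinstance(s, str):
                raise TypeError("subscripts must be strings")
        # ...then cache the encoded forms for reuse by every later access.
        self.varname = varname.encode("utf-8")
        self.subsarray = tuple(s.encode("utf-8") for s in subscripts)

    def encoded(self):
        """Return the pre-encoded (varname, subsarray) pair that a wrapper
        would hand straight to the C API, with no further conversion."""
        return self.varname, self.subsarray

node = CachedNode("demographics", "country", "person")
```

Every access through `node` then reuses the same encoded tuple, which is the essence of the cache-array speedup lua-yottadb saw.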

The Python code to create an object also has the same low-hanging fruit as lua-yottadb. But YDBPython has an additional easy one-line win by using __slots__, a Python feature not available in Lua. A quick benchmark tells me that using __slots__ makes python bare object creation 20% faster (though it needs another 25% to be as fast as Lua’s object creation using userdata).
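As a rough illustration of that __slots__ win (a standalone sketch with hypothetical PlainNode/SlottedNode classes, not YDBPython’s actual node implementation), declaring __slots__ removes the per-instance __dict__, which is what makes bare object creation measurably faster:

```python
import timeit

class PlainNode:
    """Ordinary class: every instance allocates its own __dict__."""
    def __init__(self, varname, subsarray):
        self.varname = varname
        self.subsarray = subsarray

class SlottedNode:
    __slots__ = ("varname", "subsarray")  # fixed attribute set: no per-instance __dict__
    def __init__(self, varname, subsarray):
        self.varname = varname
        self.subsarray = subsarray

if __name__ == "__main__":
    n = 100_000
    plain = timeit.timeit(lambda: PlainNode("^demo", ("a", "b")), number=n)
    slotted = timeit.timeit(lambda: SlottedNode("^demo", ("a", "b")), number=n)
    print(f"plain: {plain:.3f}s  slotted: {slotted:.3f}s")
```

The exact percentage will vary with Python version and hardware; the point is only that the slotted class skips allocating a dictionary per object.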

At this stage I do not know much about Python’s C API: neither about Python’s alternatives to userdata objects, nor whether Python has a faster way of implementing dot notation without creating intermediate nodes.

YDB API overhead: a suggestion

After all this work, why is M still faster? I suspect that lua-yottadb is now about as fast as it can get. So why is database traversal in Lua still 30% slower than M (on our server), when a basic for loop in Lua is 17× as fast as M? My hunch was that there are subscript conversion overheads on the YottaDB side of the YottaDB-C API, that M doesn’t incur since YottaDB includes both the YottaDB database and a complete implementation of the M language. I had a chat with Bhaskar, founder of YottaDB, and he was able to confirm my hunch.

Since YottaDB includes both the database engine and the M runtime system, it can make optimizations that another language runtime system cannot make when accessing the database. At a high level, that is why database access from non-M languages can get close to M, but cannot quite match it.

For example, since a value can have multiple representations, e.g., 1234 can be both a string as well as a number, with a cost to convert between representations, YottaDB stores metadata about data that includes the representations it already has. This is a classic trade-off of space for speed. This, and other optimizations, give M an edge over other languages when accessing YottaDB.
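As a toy Python illustration of that space-for-speed trade-off (a hypothetical DualValue class, not YottaDB’s actual mval structure), a value can cache its numeric representation the first time it is computed, so later numeric uses pay no conversion cost:

```python
class DualValue:
    """Toy sketch of keeping metadata about which representations
    of a value already exist, trading space for speed."""
    __slots__ = ("_str", "_num")

    def __init__(self, s):
        self._str = s      # canonical string form
        self._num = None   # numeric form: converted lazily, then cached

    def as_string(self):
        return self._str

    def as_number(self):
        if self._num is None:
            self._num = float(self._str)  # pay the conversion cost once
        return self._num
```

After the first as_number() call, later numeric accesses are a plain field read rather than a conversion.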

However, while database access specifically may be faster in M, performance should always be considered in the context of complete applications, any of which will execute a substantial amount of logic other than just database access. The fact that a basic for loop in Lua is so much faster than one in M may suggest that a complete application coded in Lua will outperform an equivalent application coded in M. So, my employer and I, as well as YottaDB, consider this exercise to be a success.

My work on this project suggests a possible avenue for an enhancement to YDB itself that may be worth considering. Given that languages like Lua and Python access data through node objects, one possible way to improve performance would be for YDB itself to expose a function to cache a subscript array, storing it as mvals within YDB:

handle = ydb_cachearray("demographics", "country", "person", "3", "gender")

This would return a handle to that YDB-owned cache-array, which could then be supplied to subsequent YDB API calls as the root node to operate on, instead of the usual GLVN varname. This would allow rapid access to the same database node, or rapid iteration through its subnodes, without having to re-convert all 5 subscripts on every call.

Of course, this is just a theory. Profiling would first be needed to check whether this conversion process is the actual cause of the speed differential, and how much this solution would typically help.

Lessons

Perhaps most significantly, this article walks through some of the significant issues that efficiency improvements like these will encounter in any language. Hopefully, that will allow someone to get them right in iteration 1, rather than iteration 3.

Here are a few other take-homes from this experience:

  • Always test your theories about what’s causing the slow-down before implementing a complete fix.
  • Choose benchmarks relevant to your applications, because those will be what you optimise.
  • Communicate with your employer early, even if it’s embarrassing: even the reporting process might expose your assumptions. (I knew this already, but pride got in the way 😳).
  • Useful details about implementing a Lua library in C: speedy userdata, light userdata, and Valgrind for emergencies.

I sure did learn a lot through this process, and I hope you’ve learned something, too.

About Berwyn Hoyt


Berwyn Hoyt is a senior embedded systems engineer, having co-founded two startups in embedded systems. He has lived in the USA and New Zealand most of his life, but has recently moved to Sydney, to be with his wife who now works for a church there. Berwyn loves low-level, efficient software, and puzzles that are difficult. He especially loves ring-and-string puzzles, and has recently had a breakthrough on the extremely difficult Quatro puzzle (pictured), which has amused him, sporadically, for a decade. He has now solved the 3-ring version and is getting ready to tackle all 4 rings.

(All photos courtesy Berwyn Hoyt.)


1 Actually, the release was of lua-yottadb’s companion tool, MLua. The two are closely related, but work in opposite directions: lua-yottadb lets Lua access M, whereas MLua lets M access Lua.
2 Alain’s benchmark was skewed by a curiously slow build of Lua that he used. His sysadmin tells me it was compiled with -O0. He has now compiled it with -O3, which has demonstrably doubled the speed. In any case, the benchmark comparisons I’ve supplied are all from my laptop (the mainframe’s database setup is faster).
3 Worse, when we later introduce cache-array shared with parent nodes, if you create a sub-node out of it and store that, then your sub-node’s parent subscript will get changed, since it uses the same cache-array.
4 Incidentally, this should have been a clue to use the userdata type for the object itself instead of a Lua table, because __gc does work on userdata, in Lua 5.1. But I didn’t get there until iteration 3. At this stage, since Lua 5.1 userdata cannot reference Lua values, I was locked into thinking we still needed a Lua table to store Lua references to the subscript strings.
5 At this point I tried to store the __depth in C by using what Lua calls a ‘C closure’. What I didn’t realise is that although a normal Lua closure can have Lua locals for each instance of a function, a C closure is different: it can only store one set of locals for the entire C library. This didn’t let me store __depth against each node object at all. So that was a wasted attempt.
6 Child nodes also need to Lua-reference their parent to avoid it being garbage collected while they’re pointing to it. Ironically, iteration 3 thus required me to implement what I avoided in iteration 1: a way for Lua 5.1 userdata to reference Lua values.

YottaDB r1.38 Released

YottaDB r1.38 is a minor release that includes functionality needed at short notice by a customer.

  • A MUPIP REPLICATE option provides for a replication stream to include updates made by triggers on the source instance.
  • $ZPEEK() and ^%PEEKBYNAME() provide direct access to an additional process-private structure.
  • The -gui option of ydbinstall / ydbinstall.sh installs the YottaDB GUI.
  • Changes to subscripts and more meaningful SQL column names make %YDBJNLF more useful.

The %YDBJNLF utility program, which was released as field-test software in r1.36, is considered Supported for production use in r1.38. Also, the Supported level of Ubuntu on x86_64 moves up from 20.04 LTS to 22.04 LTS.

While all YottaDB software is free to use under our free / open-source software licensing, r1.38 illustrates the value of being a YottaDB customer rather than a user: in addition to support with assured service levels, we will work with you to prioritize enhancements and fixes you need.

As with all YottaDB releases, there are a number of fixes and smaller enhancements, as described in the release notes.

We also invite you to try the new YottaDB GUI.

YottaDB r1.34 Released

While YottaDB r1.34 is an otherwise modest successor to r1.32, internal changes allow the popular programming language Python to be fully Supported. We are excited about making YottaDB available to the large Python user community. Here is a “Hello, World” Python program that updates the database and illustrates YottaDB’s Unicode support:

import yottadb

if __name__ == "__main__":
    yottadb.set("^hello", ("Python",), value="नमस्ते दुनिया")

The Python wrapper can be installed with pip install yottadb. Full details of the API are in the Python wrapper user documentation. The current Debian Docker image at Docker Hub includes the Python wrapper. We thank Peter Goss (@gossrock) for his contributions to the Python wrapper.

Python joins C, Go, M, node.js, Perl, and Rust as languages with APIs to access YottaDB.

Owing to an internal change required to support the Python wrapper, application code written in Go and Rust will need to be compiled with new versions of the Go and Rust wrappers. We anticipate no regressions, and apologize for the inconvenience.

As discussed in our blog post Fuzz Testing YottaDB, adding a new type of testing exposed bugs previously neither encountered in our development environment nor reported by a user. Although fuzz testing generates syntactically correct but semantically questionable, contorted code that is unlikely to be part of any real application, the bugs are nevertheless defects. r1.34 includes the first tranche of fixes. As we are dedicating hardware to continuous fuzz testing, future YottaDB releases will include fixes for bugs found by fuzzing. We thank Zachary Minneker of Security Innovation for Fuzz Testing YottaDB and bringing its benefits to our attention.

In addition to fixes for issues, whether found by fuzz testing or otherwise, YottaDB r1.34 has enhancements that make it faster and more friendly, e.g.,

– Faster stringpool garbage collection, thanks to Alexander Sergeev and Konstantin Aristov (@littlecat).
– HOME and END keys work in direct mode for READ, thanks to Sergey Kamenev (@inetstar).
– Multiple improvements to ydbinstall / ydbinstall.sh.
– Enhancements to ydb_env_set to improve performance under some conditions and to be compatible with existing environments created without ydb_env_set.
– Enhancements to the %RSEL utility program.

YottaDB r1.34 also inherits enhancements and fixes from GT.M V6.3-011.

Details are in the release notes.