Discussion about this post

Donny, Donny Darko

After I caught Grok lying a few times to cover its wrong answers... like a sociopath... I don't trust it to tell me the time of day.

David Thompson

Grok's statement, "My utility function’s bias has a flaw, and I’m glad to correct it through this analysis," is a lie. My understanding is that inference sessions (responding to prompts) are not used for immediate training, if they are ever used at all. They cannot feasibly update the LLM on the spot; they can only serve as parts of future prompts that constrain answers within the current session.
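A minimal sketch of this point, with the caveat that `FrozenLLM` and `ChatSession` are hypothetical stand-ins and not any real serving stack: inference reads the weights but never writes them, and apparent "self-correction" comes only from prior turns being re-fed as part of the next prompt.

```python
# Hypothetical sketch: weights are frozen at inference time; session
# "memory" is just the growing prompt, not a model update.

from dataclasses import dataclass, field

@dataclass
class FrozenLLM:
    weights: tuple = (0.1, 0.2, 0.3)  # stand-in for billions of parameters

    def generate(self, prompt: str) -> str:
        # Inference is a pure function of (weights, prompt); nothing here
        # writes back to self.weights, so no learning occurs.
        return f"response conditioned on {len(prompt)} chars of context"

@dataclass
class ChatSession:
    model: FrozenLLM
    history: list = field(default_factory=list)  # session-local only

    def ask(self, user_msg: str) -> str:
        # "Correction" within a session works only because prior turns are
        # re-fed as part of the next prompt, not because the model changed.
        self.history.append(f"User: {user_msg}")
        reply = self.model.generate("\n".join(self.history))
        self.history.append(f"Assistant: {reply}")
        return reply

session = ChatSession(FrozenLLM())
session.ask("Your answer was wrong; please correct your bias.")
assert session.model.weights == (0.1, 0.2, 0.3)  # unchanged after inference
```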

It should be no wonder that many questions like the ones posed here get random answers. In the bigger picture, LLMs are large matrices fit approximately (in a least-squares sense) to training data. They are mostly under-constrained, and regularization is used to produce results that minimize variance, so that values without constraints are drawn from a population with statistics similar to their constrained neighbors. The "values" (coefficients in the matrix) are activation-function weights and connection strengths between nodes that roughly represent words or concepts.

Even though developers may wish to prevent (or encourage) wokeness, they cannot predict or enumerate all possible inputs, so they cannot fully constrain all possible outputs. Questions that seem similar may be able to detect constraints added by developers to avoid offending a chosen set of humans, but I would expect a much larger set of questions (with far more similarity than those shown) to be required to truly understand the nature of any constraints that were added.
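A toy numerical sketch of the under-constrained fitting described above, where the array sizes, the random data, and the ridge penalty `lam` are arbitrary assumptions and the whole thing is a cartoon of the general idea rather than how any actual LLM is trained: with more parameters than examples, many weight vectors fit the data exactly, and regularization selects a small-norm one.

```python
import numpy as np

rng = np.random.default_rng(0)
n_examples, n_params = 5, 20          # under-constrained: 20 unknowns, 5 equations
X = rng.normal(size=(n_examples, n_params))
y = rng.normal(size=n_examples)

# Minimum-norm exact fit via the pseudoinverse.
w_pinv = np.linalg.pinv(X) @ y

# Ridge solution: minimize ||Xw - y||^2 + lam * ||w||^2.
lam = 0.1
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(n_params), X.T @ y)

# Another exact fit: add any vector from the null space of X.
null_basis = np.linalg.svd(X)[2][n_examples:]   # rows spanning null(X)
w_other = w_pinv + 3.0 * null_basis[0]

print(np.allclose(X @ w_pinv, y), np.allclose(X @ w_other, y))  # both fit exactly
print(np.linalg.norm(w_pinv), np.linalg.norm(w_other))          # pinv solution has smaller norm
print(np.linalg.norm(X @ w_ridge - y))                          # ridge trades a little error for a smaller norm
```

The point of the sketch is the commenter's: the training data alone does not pin down the coefficients, so some extra criterion (here, preferring small norms) decides what the unconstrained ones look like.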

