To solve this problem I found this useful solution, but discovered that it didn't cover all the cases I had.
To start with, \U does not mean every letter that is not uppercase, but every character of the whole set that is not uppercase, so it includes spaces and everything else. This causes the expression to match " hello world!" if the sentence doesn't start at the beginning of the line, which is not quite what we want. To get only lowercase letters, use \l. But even that does not really work, because it now matches "Hello world." as "ello world.", and the transformation gives "HEllo world.". Again, not what we are looking for. Unfortunately, until someone can suggest a method to skip already capitalized sentences we have to stick to
Next, the expression only excludes periods, but not question marks, exclamation marks or other sentence-ending characters. We can extend this by simply including the respective characters: [^\.?!:;]. We also do not need to enforce the terminating character; we can simply drop it. What we really want to match is the beginning of the sentence; we don't care about the end.
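The behaviour described so far can be checked with a small Python sketch. This is only an analogue: Python's re module has no \U or \l, so the ASCII classes [^A-Z] and [a-z] stand in for them here, and the helper names and sample strings are made up for illustration.

```python
import re

# Stand-ins for the classes discussed above (ASCII only; an assumption of this sketch).
NON_UPPER = r"[^A-Z]"  # plays the role of \U: any character that is not uppercase
LOWER = r"[a-z]"       # plays the role of \l: lowercase letters only
REST = r"[^.?!:;]*"    # everything up to the next sentence-ending character


def first_match(first_char, text):
    """Return the first match of <first_char><REST> in text."""
    return re.search(first_char + REST, text).group(0)


def capitalize(first_char, text):
    """Uppercase the character matched by first_char, leave the rest untouched."""
    pattern = "(" + first_char + ")(" + REST + ")"
    return re.sub(pattern, lambda m: m.group(1).upper() + m.group(2), text)


print(repr(first_match(NON_UPPER, " hello world!")))  # the leading space is part of the match
print(repr(first_match(LOWER, "Hello world.")))       # the match starts at the "e"
print(capitalize(LOWER, "Hello world."))              # shows the unwanted "HEllo"
print(capitalize(LOWER, "is it late? yes! go home."))
```

With the extended set, each of the three short sentences in the last call is matched on its own, because the match stops at "?" and "!" as well as at ".".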
Also, unless the text is in all uppercase, lowercasing the second group could be counterproductive, as it would also affect acronyms and other uppercase words that are already there.
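To illustrate why the second group is better left alone, here is a minimal Python sketch that compares both variants. The pattern is an ASCII approximation and the function names and sample sentence are hypothetical.

```python
import re

# First letter of a sentence, then everything up to the next sentence-ending character.
SENTENCE = re.compile(r"([a-z])([^.?!:;]*)")


def keep_rest(text):
    """Uppercase the first letter and leave the rest of the sentence as it is."""
    return SENTENCE.sub(lambda m: m.group(1).upper() + m.group(2), text)


def lower_rest(text):
    """Uppercase the first letter and lowercase the rest of the sentence."""
    return SENTENCE.sub(lambda m: m.group(1).upper() + m.group(2).lower(), text)


sample = "the NASA probe landed. it works."
print(keep_rest(sample))   # the acronym survives
print(lower_rest(sample))  # the acronym is lowercased along with everything else
```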
Lastly, we want to capture sentences spanning multiple lines, lest every line gets matched as a separate sentence. This is achieved using
(And inside a character set the . doesn't need to be escaped.)
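The difference between matching per line and matching across lines can be sketched in Python. One caveat: in Python a negated character set already matches newlines, so the per-line behaviour is emulated here by splitting the text first. The function names and the sample text are assumptions of this sketch.

```python
import re

SENTENCE = re.compile(r"([a-z])([^.?!:;]*)")


def capitalize(text):
    """Treat the whole text as one unit, so a sentence may span several lines."""
    return SENTENCE.sub(lambda m: m.group(1).upper() + m.group(2), text)


def capitalize_per_line(text):
    """Emulate line-by-line matching: every line is handled on its own."""
    return "\n".join(capitalize(line) for line in text.split("\n"))


sample = "this sentence spans\ntwo lines. second one."
print(capitalize_per_line(sample))  # "two" is wrongly capitalized
print(capitalize(sample))           # "two" stays lowercase

# Inside a character set the dot is literal, so escaping it changes nothing.
escaped = re.findall(r"[^\.?!:;]+", sample)
unescaped = re.findall(r"[^.?!:;]+", sample)
```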
Putting it all together we get
Instead of matching a whole sentence we can also try searching for the end of the previous sentence:
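One way this idea could look, as a minimal Python sketch rather than an editor expression: the pattern anchors on either the start of the text or a sentence-ending character followed by whitespace, and uppercases the lowercase letter that comes next. The pattern, the function name and the sample strings are assumptions made for illustration.

```python
import re

# Start of the text, or a sentence-ending character followed by whitespace,
# and then the lowercase letter that should be uppercased.
AFTER_END = re.compile(r"(\A|[.?!:;]\s+)([a-z])")


def capitalize_after_end(text):
    """Uppercase a lowercase letter that directly follows the end of a sentence."""
    return AFTER_END.sub(lambda m: m.group(1) + m.group(2).upper(), text)


print(capitalize_after_end("first one. second one? third!\nfourth."))
print(capitalize_after_end("Hello world. ok."))
```

Because the letter must directly follow the anchor, a sentence that already starts with an uppercase letter is simply not matched in this sketch, and \s+ also covers a line break between two sentences.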