How to make a string index human readable
The problem
Assume you are writing a parser of some kind and you need to report errors in multiline text (I am not being very creative here, since I faced exactly this problem :P). The trouble is that your parser reports an error position as a single character index into the whole text; say it tells you the problem starts at the 104th character. How do you turn that into something that makes more sense, such as a line count (row) and a character count (column), e.g. 4th line, 10th character?
Before the solution, came the garbage
My first approach was to split the text at the line separators, build a list by progressively adding up the line lengths (taking the length of the line separator itself into account), and then throw functions from the bisect module at it to see what sticks! Unfortunately, I am not very good at handling off-by-one subtleties, and the whole thing collapsed at edge cases.
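For comparison, here is a minimal sketch of how the bisect idea could be made to work. It is not the code I originally wrote: the names `line_starts` and `locate` are hypothetical, and it assumes "\n" as the separator. Instead of accumulating line lengths, it records the index at which each line starts, which removes most of the off-by-one traps.

```python
from bisect import bisect_right


def line_starts(text, sep="\n"):
    """Return the index at which each line of text starts (hypothetical helper)."""
    starts = [0]
    pos = text.find(sep)
    while pos != -1:
        # The next line begins right after the separator.
        starts.append(pos + len(sep))
        pos = text.find(sep, pos + len(sep))
    return starts


def locate(starts, index):
    """Map a character index to a 1-based (line, column) pair."""
    # bisect_right counts the line starts that are <= index,
    # which is exactly the 1-based line number.
    line = bisect_right(starts, index)
    column = index - starts[line - 1] + 1
    return (line, column)


text = "first line\nsecond line\nthird"
starts = line_starts(text)
print(locate(starts, 13))  # (2, 3)
```

Building the list once and reusing it may pay off when many indices have to be translated for the same text, since each lookup is then a binary search rather than a fresh scan.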
Then it hit me!
A correct solution can be achieved with really simple means. To get the line count, all you have to do is slice the string up to the specified index and count the number of line separators in that substring. Then partition the substring from the right at the right-most line separator; the length of the right partition (also known as the tail) is the column count.
The solution
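from os import linesep


def make_human_readable_index(text, index):
    # Note: os.linesep is "\r\n" on Windows, but text read in text mode
    # uses "\n"; make sure linesep matches the separators actually in text.
    prefix = text[:index]  # everything before the index
    lineindex = prefix.count(linesep)  # separators seen = completed lines
    charindex = len(prefix.rpartition(linesep)[2])  # tail after last separator
    # One is added to make zero based indices more in-line with
    # common text editor line and character numbering.
    return (lineindex + 1, charindex + 1)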
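A quick way to sanity-check the idea is to run it on a small text. The sketch below repeats the same logic with one assumed tweak: a hypothetical `sep` parameter that defaults to "\n", so the example behaves the same on every platform (the name `human_readable_index` is also just for illustration).

```python
def human_readable_index(text, index, sep="\n"):
    """Same slicing idea, with the separator passed in (assumed parameter)."""
    prefix = text[:index]
    line = prefix.count(sep)
    column = len(prefix.rpartition(sep)[2])
    return (line + 1, column + 1)


text = "first line\nsecond line\nthird"
print(human_readable_index(text, 0))   # (1, 1): very first character
print(human_readable_index(text, 11))  # (2, 1): first character of line 2
print(human_readable_index(text, 13))  # (2, 3): the "c" in "second"
```

Note that an index pointing at the separator itself is reported as one past the end of its line, e.g. index 10 above gives (1, 11).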