00:00:01brixenyeah
00:00:12evanbut that's WHEN there is a separator
00:00:17brixenyes
00:00:18evanthis is the split on characters path
00:00:28brixenthat's what I mean, doesn't make sense for ""
00:02:07brixenheh, the behavior when the limit parameter is 1 is even more wtf
00:02:21brixen"1,2,,3,4,,".split(',', 1) # => ["1,2,,3,4,,"]
00:02:51brixenI mean, seriously?
00:03:11brixensplit and give me at most 1 field == the whole string unsplit?
00:03:56brixenhaha, oh man
00:04:29brixena code translator that converts stuff like Array(obj) for obj.kind_of? String to obj.split(",", 1)
00:05:05evanhehe
00:05:12evanyes, it's stupid.
00:05:22evanthere are a million dumb edge cases in String#split that should not be there.
00:05:32maharg? that's entirely sensible. It follows exactly from the meaning of that argument. You want one field, which leaves nothing to split. What would you expect?
00:05:55brixenmaharg: I would expect at least the behavior of passing 2
00:06:06brixenI already have the whole string
00:06:14brixenwhy do I want to ask to split it and get the whole string back
00:06:19brixenthat makes no sense
00:06:30evan1 is ok
00:06:33evanbut what about limit == 0
00:06:35evanor limit == -1
00:06:38evanwtf do those mean?
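
For reference, a minimal sketch of what limit == 0 and a negative limit do in MRI's String#split, using the string from earlier in this log. The variable names are only illustrative.

```ruby
s = "1,2,,3,4,,"

# No limit: trailing empty fields are dropped.
no_limit = s.split(",")

# limit == 0 behaves like passing no limit at all.
zero = s.split(",", 0)

# A negative limit means "no limit", but trailing empty fields are kept.
negative = s.split(",", -1)

p no_limit
p zero
p negative
```

So 0 is "unlimited, strip trailing empties" and any negative value is "unlimited, keep trailing empties".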
00:07:47mahargif I passed 1 in there and got out two elements to the array, I would not consider that expected behaviour
00:07:49brixenmaharg: really what I'd expect from obj.split(sep, n) is obj.split(sep).first(n)
00:08:03brixenmaharg: why split then?
00:08:09brixenyou already have a string
00:08:49mahargthe point of the N>0 argument is so you can have it stop splitting after N arguments. So you can have a CSV with 5 columns and not have to escape ,s in the last column
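
A small sketch of the CSV case maharg describes, next to the split(sep).first(n) reading brixen proposes. The row data is made up for illustration.

```ruby
# Hypothetical 5-column row whose last column contains unescaped commas.
row = "42,Ada,Lovelace,1815,mathematician, writer, first programmer"

# Positive limit: stop splitting once 5 fields exist; the rest stays intact.
with_limit = row.split(",", 5)

# Split everything, then keep the first 5 pieces: the tail is lost.
first_five = row.split(",").first(5)

p with_limit.last
p first_five.last
```

The two forms agree on the first four fields and differ only in whether the remainder of the string survives in the last one.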
00:09:00brixenif sep is delimiting fields, and "If limit is a positive number, at most that number of fields will be returned"
00:09:08brixenfrom pickaxe
00:09:12brixenI'd expect a field
00:09:18brixennot the whole string
00:09:24maharg"blah,blah,blah".split(",", 2) => ["blah", "blah,blah"]
00:09:45mahargwithout that argument, you can't get that behaviour. The behaviour you want you can get exactly as you showed
00:09:53mahargand it is useful behaviour
00:10:00brixenreally?
00:10:07brixenshow me a use for 1
00:11:08mahargergh. It doesn't matter that there isn't a very good use for 1, the output of 1 is just the obvious result of the behaviour of that argument. You're suggesting it be yet another stupid special case like -N is.
00:11:23binary42brixen: I see it more so as a case where a variable is passed in so you don't need an extra if somewhere.
00:11:50brixenbinary42: but you are asking to split into fields and limit the # of fields returned
00:11:53mahargit returning an array of 2 when you ask it for an array of 1 would be wrong
00:11:58brixenyou get no "fields" if you pass 1
00:12:19mahargI just can't see any way to agree with you that it should do anything else, even if it is pointless. It FOLLOWS
00:12:35brixenfollows what?
00:13:02brixenI'm splitting for fields, I have no reason to split if I just want the string
00:13:03binary42brixen: I'm not sure what you mean by fields.
00:13:20binary42It says there are 2 segments. CSV is just one use of split.
00:13:42brixenI'm not saying this is limited to CSV
00:13:50brixenwhy would I split a string?
00:13:56brixenthat's the crux of it
00:14:01brixenI'm calling a method
00:14:04brixenwhy would I do that?
00:14:07maharg"blah,blah,blah".split(",", 3) => ["blah","blah","blah"]; "blah,blah,blah".split(",", 2) => ["blah", "blah,blah"]; "blah,blah,blah".split(",", 1) => ["blah,blah,blah"]
00:14:41brixenmaharg: unconvinced
00:14:48evanANYWAY
00:14:56evan1 is hardly the weirdest part of String#split
00:15:01binary42brixen: I haven't used it so feel free to pretend it isn't there.
00:15:25brixenbinary42: unfortunately, I'm not free to pretend any of MRI weirdness is not there ;)
00:15:43brixenwere it that I could, this sunny day would be sunny
00:15:49binary42Oh I know. I was being sarcastic about the rant.
00:15:52brixenheh
00:16:08evanbrixen: just curious, would you rather 1 raise an ArgumentError then?
00:16:15brixenevan: nope
00:16:22brixenI'd expect the result of passing 2
00:16:30brixenI want at most 1 field
00:16:35evanthat would be weird.
00:16:36brixenI have no reason to split otherwise
00:16:43brixen*shrug*
00:16:44evanwe can agree to disagree on a non-issue :)
00:16:49brixenindeed
00:17:06brixenI'll just be sure to use obj.split("", 1) when I mean Array(obj)
00:17:12evanok!
00:17:23evani just want split(//) to work on utf8 strings :/
00:17:30evanthat btw, is what i'm trying to fix.
00:17:35brixenahh yes
00:17:59evani've added String#find_character(offset) that respects kcode
00:18:01binary42brixen: So 0 is the whole string and the result is the tail? Just start counting from 1 instead.
00:18:22evanjust trying to figure out how to get it into split
00:18:35brixenbinary42: 0 is unlimited
00:19:10brixenthe fact that pickaxe goes to parenthetical special explanations for the case of 1 I will take as confirmation the behavior is weird
00:19:12binary42brixen: I know, just illustrating why 1 is good.
00:19:20brixenheh
00:19:22brixenanyway
00:19:28binary42[*a] is faster anyway... we all know how much evan likes that.
00:19:35evanyou know I do.
00:19:44evani have a [*a] pillow
00:19:48evani sleep on every night
00:30:42mahargit's possible there could be a better meaning for the argument as a whole (should perhaps be the number of splits to perform, with 1 being the lowest possible), but arbitrarily renumbering 1 to 2 and leaving 2... as their current meanings is just adding an insane edge case imo. I'll leave it at that.
00:31:51brixen"leaving 2... as their current meanings" huh?
00:32:01brixenI'm just saying limit should be the limit of the fields
00:32:09brixenbecause I have no reason to split otherwise
00:33:05evantreating 1 like 2 is weirder
00:33:06evanimho
00:33:20mahargit's the limit on the number of times it splits (+1), not the limit on the number of fields it returns.
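
One way to picture maharg's reading (limit = number of splits + 1) is a thin wrapper that takes the number of splits directly. split_n_times is a hypothetical helper written for this note, not part of Ruby or Rubinius.

```ruby
# Hypothetical helper: perform at most `splits` splits on `sep`.
# It only translates the count into String#split's positive limit.
def split_n_times(str, sep, splits)
  raise ArgumentError, "splits must be >= 0" if splits < 0
  str.split(sep, splits + 1)
end

p split_n_times("blah,blah,blah", ",", 0)
p split_n_times("blah,blah,blah", ",", 1)
p split_n_times("blah,blah,blah", ",", 2)
```

Under this reading, limit 1 is simply the zero-splits case, which is why it hands back the whole string in a one-element array.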
00:33:38brixenat 1 it splits 0 times
00:33:42brixenthat makes no sense
00:34:02brixenI'm calling a method but I want to specify it perform no action
00:34:03brixenhuh?
00:34:23evanhey! i got split(//) to work on utf8 characters!
00:34:27evancan we stop talking about split now?
00:34:28evan:D
00:34:28brixensweet!
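
A sketch of the character-versus-byte distinction behind split(//). It assumes a modern Ruby (2.0 or later), where each string carries its own encoding; the $KCODE mechanism discussed in this log belongs to Ruby 1.8 and is not used here.

```ruby
# "\u3042" is the single character あ, which is three bytes in UTF-8.
s = "a\u3042b"

# On a UTF-8 string, an empty regexp splits between characters.
chars = s.split(//)

# String#b returns a binary (ASCII-8BIT) copy, so the same call splits per byte.
bytes = s.b.split(//)

p chars.length
p bytes.length
```

The byte-wise result is the kind of output a split that ignores the string's encoding would produce.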
00:38:42evani love specs that test 3*3*4*2 things in one it block!
00:38:45evan*eyeroll*
00:41:17brixenme too
00:41:47evanFUUUCK
00:41:59evanalso ones that loop over something AND DON'T USE IT
00:42:21brixenwhich spec?
00:42:39evanstring/split_spec.rb
00:42:41evanat the bottom
00:42:51evanit "taints...."
00:42:56evanthe limit isn't even used
00:43:22brixennice
00:44:46evanwait wait wait
00:44:47evanWTF
00:44:58evani've actually read this spec completely wrong
00:45:11evanthe 2nd loop checks that a tainted Regexp DOESN'T taint the output
00:45:15evanARG
00:46:18evanway to go! add a complete inversion check to an it block for the positive check!
00:46:18brixenwhich is so clearly described by "taints the resulting strings if self is tainted"
00:46:24evanjust to confuse the fuck out of me!
00:46:59brixenwelcomes evan to his painful alice-in-wonderland-ish world
00:47:17evani'm going to bring a moose back from Canada
00:47:23evanto threaten people with.
00:52:47evanwoop.
00:52:51evanpasses.
00:56:11evanhm, other spec failures...
01:05:58evanoh weird, ok, back on track!
01:06:45evanthere we goes!
01:08:03evanhm, need to fix the globals hooks
01:28:33evancontinues to struggle against $KCODE
02:16:00evanok
02:16:04evan2 more KCODE aware methods done
02:16:08evanthere are only like 3 more
02:16:17evangotta love that it's so spotty
02:16:18evanoh well.
02:26:54evanbrixen: in
02:27:02evanstring[/blah/] = "foo"
02:27:09evanwhat should I call "foo" in an it block?
02:27:10evanrhs?
04:05:47brixenevan: sure, rhs or 'value'
04:05:56brixenwhat's the whole description string?
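
For context, a minimal sketch of the String#[]= form evan is asking about, with the assigned value named rhs as suggested. The sample strings are illustrative only.

```ruby
str = +"one blah two blah"
rhs = "foo"

# Replaces the first match of the regexp with the right-hand side.
str[/blah/] = rhs
p str

# If the regexp does not match, String#[]= raises IndexError.
begin
  other = +"abc"
  other[/x/] = "y"
rescue IndexError => e
  puts "no match: #{e.class}"
end
```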
18:20:49boyscoutAdd ByteArray#utf8_char and use it to implement String#unpack - 3f50099 - Evan Phoenix
18:20:50boyscoutAdd additional String#split spec - 79b85e6 - Evan Phoenix
18:20:50boyscoutFix String#split(//) via String#find_character - 83bba61 - Evan Phoenix
18:20:50boyscoutAdd spec for String#gsub + $KCODE - a9c2b16 - Evan Phoenix
18:20:50boyscoutAdd spec for String#scan + $KCODE - 0ffc7ea - Evan Phoenix
18:20:50boyscoutTeach String#gsub and String#scan about $KCODE - bc6cf71 - Evan Phoenix
18:20:51boyscoutAdd String#[]= with Regexp specs - b2c7ca4 - Evan Phoenix
18:20:51boyscoutFix typo - db94b58 - Evan Phoenix
18:20:52boyscoutAdd proper unicode support to Regexp - 72341f2 - Evan Phoenix
18:29:44boyscoutCI: rubinius: 72341f2 successful: 3037 files, 11914 examples, 36155 expectations, 0 failures, 0 errors
19:14:21kronos_vanoevan, why does setting $KCODE='u' break String#inspect? Is it a known issue?
19:14:35evani'm not aware of that
19:14:37evanexample?
19:15:07kronos_vano$KCODE='u'; "\xe3\x81\x82".inspect
19:15:35kronos_vanooutput should be "\"\\343\\201\\202\""
19:15:58kronos_vanobut I see only spaces and "
19:16:29kronos_vanoor just "\xe3\x81\x82" without inspect
19:17:32evanok, i have to head out
19:17:36evani'll take a look and fix that.
19:17:42kronos_vanok
19:19:07kronos_vanoNow in rubinius it works like 1.9
19:20:29evanactually
19:20:31evanworks fine for me
19:20:40evanI get the same result on rbx as 1.8
19:20:58evanwith $KCODE = "U"
19:20:59evani get
19:21:11evan>> str.inspect
19:21:11evan=> "\"あ\""
19:21:13evanon both
19:21:35evanone unicode character with quotes around it.
19:22:07kronos_vanoSomething is wrong with my terminal, because if I copy the symbols from the console to Xchat I get: "あ"
19:22:30evanthats correct
19:22:45evanyou should get quotes around a single japanese character
19:23:38evanthats what I get on 1.8
19:23:42evanand rbx
19:23:55evanok, packing up the laptop.
19:23:57evanlater
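
A sketch relating the two outputs discussed above: the three bytes "\xe3\x81\x82" are the UTF-8 encoding of the single character あ, and the octal form kronos_vano expected is those same bytes written as escapes. This assumes a modern Ruby with per-string encodings rather than 1.8's $KCODE, so it illustrates the byte/character relationship only, not 1.8's inspect itself.

```ruby
s = "\xe3\x81\x82".dup.force_encoding("UTF-8")

# Interpreted as UTF-8, the three bytes form one character.
p s.length
p s.bytesize

# The same bytes rendered as octal escapes, as in the expected 1.8 output.
octal = s.bytes.map { |b| format("\\%03o", b) }.join
puts octal

# Inspecting a binary copy escapes each byte instead of showing the character.
puts s.b.inspect
```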
22:09:11boyscoutupdate spec for expand_path(a, b): fixes for 'b' being a relative path - 4e1605c - Sylvain Joyeux
22:09:11boyscoutfix File.expand_path(relative_path, relative_path) - 6bff64f - Sylvain Joyeux
22:14:26boyscoutCI: rubinius: 6bff64f successful: 3037 files, 11914 examples, 36156 expectations, 0 failures, 0 errors
23:02:43kronos_vanoa simple optimization speeds up Array#inspect 2x: http://gist.github.com/308179
23:06:24BrianRice-workgood point; avoiding multiple string allocations
23:07:34Zoxcdoesn't it do more string allocations or is the diff inverted?
23:10:49kronos_vanoZoxc, no it isn't. calling << multiple times is faster than creating another big array and then joining its elements.
23:11:36Zoxcnot on MRI =P
23:12:11Zoxccreating an array rapes string concatenation
23:27:05mahargarray#join already optimizes that way, so it shouldn't really be that much of an improvement. In fact, you just made inspect look pretty much like a duplicate of join. Except without tainting.
23:29:26mahargwhich it probably should do (and was getting for 'free' from using join before)
23:31:12mahargthe interpolation syntax ("[#{blah}]") and tainting might have been what really gave you the speedup. It's possible join is too aggressive in tainting, as I'd assume adding something tainted to an array would taint the array but it's checking on every element
23:31:50mahargnope, guess not
23:33:48mahargactually, array#join should probably end up tainting in the out.append anyways, so maybe doesn't need it either?
23:42:21Zoxcmy guess is that Array#join doesn't do a single allocation
23:58:42mahargdon't need to guess, line 834 of kernel/common/array.rb :)
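
A rough sketch of the two strategies being compared in the Array#inspect discussion: collect the pieces in an intermediate array and join them, versus appending to one output string with <<. Both method names are hypothetical; this is not the code from the gist or from kernel/common/array.rb, and it ignores tainting and recursive arrays.

```ruby
# Strategy 1: build an intermediate array of pieces, then join and interpolate.
def inspect_with_join(ary)
  "[#{ary.map { |e| e.inspect }.join(', ')}]"
end

# Strategy 2: append each piece to a single mutable output string.
def inspect_with_append(ary)
  out = +"["
  ary.each_with_index do |e, i|
    out << ", " unless i.zero?
    out << e.inspect
  end
  out << "]"
end

sample = [1, "a", :b, nil]
puts inspect_with_join(sample)
puts inspect_with_append(sample)
```

Which one is faster depends on the implementation (Zoxc's point about MRI), so any claim about speed is best checked with the Benchmark library on the interpreter in question.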