182: Remote Working 🏡

This week Thomas and Eric discuss how the entire world has decided to catch up with the technology of the times and start working remotely. Well, that and a small pandemic to help motivate companies. So what's next? How are companies adjusting to remote life? The government is encouraging people to stay at home, work from home, and avoid physical contact with other people; in short, a great time to be a developer.

We do hope everyone is staying safe and enjoy this episode of PHPUgly. 

Show Notes:

PHP Internals News: Episode 44: Write Once Properties

In this episode of "PHP Internals News" I chat with Máté Kocsis (Twitter, GitHub, LinkedIn) about the Write Once Properties RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news


Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 44. Today I'm talking with Máté Kocsis about an RFC that he produced called Write Once Properties. Hello, Máté. How's it going?

Máté Kocsis 0:34

Yeah, fine. Thanks.

Derick Rethans 0:36

Would you mind introducing yourself a moment?

Máté Kocsis 0:38

My name is Máté Kocsis and I'm a software engineer at LogMeIn. I've been using PHP for 15 years now. And after having followed the mailing list for quite some time, I started contributing to the project last October, and now Write Once properties is my first RFC.

Derick Rethans 0:58

What is the concept of Write Once Properties?

Máté Kocsis 1:00

Write Once Properties can only be initialised, but not modified afterwards. So you can either define a default value for them, or assign them a value, but you can't modify them later. So any other attempt to modify, unset, increment, or decrement them would cause an exception to be thrown. Basically, this RFC would bring Java's final properties, or C#'s read-only properties, to PHP. However, contrary to how these languages work, this RFC would allow lazy initialization. It means that these properties don't necessarily have to be initialised by the time object construction ends, so you can do that later in the object's life cycle.
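To make the idea concrete, here is a minimal sketch. It does not use the RFC's own syntax; it assumes PHP 8.1 or later, which shipped a `readonly` property modifier with closely related semantics and is used here only as a stand-in. The `Money` class and its method names are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical value object; `readonly` requires PHP 8.1+.
final class Money
{
    public readonly int $amount;

    public function __construct(int $amount)
    {
        $this->amount = $amount; // first write: allowed
    }

    public function tryToChange(int $amount): bool
    {
        try {
            $this->amount = $amount; // second write: rejected with an Error
            return true;
        } catch (Error $e) {
            return false;
        }
    }
}

$money = new Money(100);
$changed = $money->tryToChange(200);
```

After this runs, `$changed` is `false` and `$money->amount` is still `100`: the property accepted its first assignment and refused the second, even from inside the class.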

Derick Rethans 1:48

PHP already has constants, which are pretty much write once properties as long as they're being defined in a class definition. How does this differ?

Máté Kocsis 1:58

Yeah, the difference is that you can assign these properties a value in the constructor or anywhere else. You don't have to define a default value for them.

Derick Rethans 2:12

Okay, and of course constants have the other problem, which is that you can only set their values to constants, not necessarily to any sort of expression or the result of other method calls.

Unknown Speaker 2:22

So you can use objects, resources, any kind of property value here.

Derick Rethans 2:28

You mentioned C#'s read only properties. And you sort of mentioned them in the same breath as write once properties for PHP. These seem like opposite things.

Máté Kocsis 2:39

Not quite opposite, but there's some distinction between the two. C sharp requires these properties to be initialised by the time object construction ends. And this is very difficult to achieve in PHP. And now I'm using Nikita's words: object construction is a fuzzy term, and you can't be sure if the constructor is invoked at all, for example if you are using Doctrine or ProxyManager. So we decided to allow lazy initialization, which means that you don't have to assign these properties a value during construction; you are free to do it any time you want.

Derick Rethans 3:22

What happens if you read them without them having been set yet?

Máté Kocsis 3:27

Initially, when I started working on this proposal, I faced a problem, because untyped properties have an implicit default value in the absence of an explicit default value. That's why you can't really use them with write once properties: either you have a default value or you can't do anything with them. That's why we had to only allow typed properties with write once properties, and typed properties are in an uninitialised state by default. You can't read them until you first assign them a value.
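The difference between untyped and typed properties described here can be seen in a short sketch, assuming PHP 7.4 or later. The `User` class and its property names are made up for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical class; typed properties require PHP 7.4+.
final class User
{
    public $nickname;      // untyped: implicitly initialised to null
    public string $email;  // typed: stays uninitialised until assigned
}

$user = new User();
$untypedDefault = $user->nickname;

try {
    $value = $user->email; // reading before the first write throws an Error
    $readFailed = false;
} catch (Error $e) {
    $readFailed = true;
}
```

The untyped property silently yields `null`, so there is no way to tell "never written" apart from "written as null". The typed property refuses to be read until it has a value, which is the state a write once property needs.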

Derick Rethans 4:04

Because in PHP 7.4 that will throw a type error. So that actually ties in really nicely with PHP 7.4's uninitialised concept for the type hinted properties.

Máté Kocsis 4:14


Derick Rethans 4:15

One thing that is slightly skipped over is which keyword the RFC proposes, because you mentioned final for Java and read only for C sharp. Which one have you picked for PHP?

Máté Kocsis 4:25

So there were plenty of possibilities considered. The first one was the final keyword. At first, it seemed to be the obvious choice for me, but after thinking about it, it turned out that it's not the right candidate, because currently it affects inheritance rules in PHP. And now we are talking about mutability rules. We had sealed, which comes from C sharp, and the problem is the same because it also affects inheritance rules, so we shouldn't reuse it for different purposes. We also considered immutable. It's one I like. But it might be a little bit misleading, because the usage of mutable data structures, like objects or resources, is not restricted at all. Then there's locked, which is a bit too abstract or vague a name.

We also have writeonce as well. And technically, it's the most accurate term. But from the user's point of view, it could be a bit confusing, because they are not expected to write them at all, only to read these properties. And now we have readonly, and probably this keyword has got the most traction so far. And it's good. It's a good name because it refers to what users should generally do with these properties. However, there's also a slight problem in that users can, in some circumstances, write these properties too. But that's not the general use case.

Derick Rethans 6:10

It's a curious thing. I remember we had a PHP developers meeting back in 2000, let's say 2008. But it could as well have been 2005, where we also actually spoke about read only properties, but I'm going to have to dig up the notes for that to see what it said there. Maybe you find it interesting to read to see what the history said about this.

Unknown Speaker 6:31

I'm curious. The question is open, so I plan to put it to vote.

Derick Rethans 6:37

When do you think you're putting it up for a vote?

Unknown Speaker 6:39

I think it should be close now. I will answer the mail which came from Nicolas. I don't know; if there are no more problems, then we could do it this week or early next week.

Derick Rethans 6:54

As the properties are write once, how would you implement lazy loading with that? In order to do the lazy loading, you need to first figure out whether the property is already set. How will you know that it's already set? How can you check for that?

Máté Kocsis 7:07

I think generally you don't have to worry whether a property's write once or not, since we are mainly talking about private or protected properties in most cases. However, if you need this information, then you will be able to use reflection. I've already added a method to ReflectionProperty for this purpose.

Derick Rethans 7:31

Let me ask a little bit more about that. You mentioned that this is meant for lazy loading. I understand lazy loading is something that you do while you're executing other methods. For example, on an object, you do get something, and that needs to fetch things from a database. Because those write once properties are private or protected most of the time, the code that fetches the things from the database, that does the lazy loading, still needs to know whether the property's already been written to. Because if it would attempt it again, you'd potentially get an exception. So how would it know it's already been written to?

Máté Kocsis 8:03

Good question. I was talking with Marco Pivetta. His use case with ProxyManager is to unset these properties in advance, and then it can use the __get or __set, or I don't know which, magic methods.
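The unset-then-magic-method technique mentioned here can be sketched with plain typed properties, assuming PHP 7.4 or later. This is not ProxyManager's actual code; the `LazyReport` class, its properties, and the placeholder string are hypothetical.

```php
<?php
declare(strict_types=1);

// Hypothetical lazily initialised object; requires PHP 7.4+.
final class LazyReport
{
    public string $body;
    public int $loads = 0;

    public function __construct()
    {
        // Explicitly unsetting a declared property routes the next read to __get().
        unset($this->body);
    }

    public function __get(string $name)
    {
        if ($name === 'body') {
            $this->loads++;                  // stands in for an expensive fetch
            $this->body = 'expensive result'; // first and only write
            return $this->body;
        }
        throw new LogicException("Unknown property $name");
    }
}

$report = new LazyReport();
$first = $report->body;  // triggers __get() and initialises the property
$second = $report->body; // plain property read, __get() is not called again
```

Because `__get()` only runs while the property is unset, the loader never has to ask whether the value was already written: the second read bypasses it entirely, and `$report->loads` stays at `1`.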

Derick Rethans 8:28

I saw that the RFC mentioned a few other alternative approaches for this feature. And the headlines in the RFC say: read only semantics, write before construction semantics, and property accessors. Would you mind explaining these and why they haven't made the final RFC?

Máté Kocsis 8:44

The first one was to follow Java and C sharp, and require all write once properties to be initialised by the time object construction ends. And this is what we talked about before. The counter arguments were that it's not easy to implement in PHP, and that this approach is unnecessarily strict. The other possibility is to allow unlimited writes to these properties until object construction ends, and then not allow any writes. The positive effect of this solution is that it plays well with bigger class hierarchies, where possibly multiple constructors are involved, but it still has the same problems as the previous approach. Finally, property accessors could be an alternative to write once properties, although in my opinion, these two features are not really related to each other. But some say that property accessors alone could prevent some unintended changes from the outside, and they say that might be enough. I don't share this sentiment. In my opinion, unintended changes can also come from the inside, so from the private or protected scope. And it's really easy to circumvent visibility rules in PHP. There are quite some possibilities. That's why it's a good way to protect our invariants.

Derick Rethans 10:15

What was the biggest criticism you got on the mailing list about this proposal?

Máté Kocsis 10:18

As far as I remember, the property accessors. The biggest criticism was that we don't really need this feature, because we could use property accessors instead.

Derick Rethans 10:29

We have spoken a little bit about what this feature is. We went into a few use cases with lazy loading. What would other use cases for this be?

Máté Kocsis 10:38

I think it's really suitable for domain driven design, or working with value objects, and I'm a great fan of DDD. The problem is PHP can't guarantee any immutability for our objects. Just one example. You can invoke the object constructor as many times as you wish, which overrides all your properties.
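The constructor example given here is easy to reproduce. The sketch below uses a hypothetical `Email` value object with an ordinary private property and no write once protection.

```php
<?php
declare(strict_types=1);

// Hypothetical value object that looks immutable from the outside.
final class Email
{
    private string $address;

    public function __construct(string $address)
    {
        $this->address = $address;
    }

    public function address(): string
    {
        return $this->address;
    }
}

$email = new Email('a@example.com');
$email->__construct('b@example.com'); // nothing stops a second constructor call
$after = $email->address();
```

`$after` is now `'b@example.com'`: the object has no setters, yet its state was overwritten simply by calling the constructor again.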

Derick Rethans 11:04

I had not thought about that you can actually call the constructor yourself. And of course you can.

Máté Kocsis 11:08

Yes, me neither. I just saw it somewhere, probably in a previous discussion about immutable objects. That's the advantage of having write once properties. By using write once properties, you can prevent accidental modifications from the outside or from the inside too. And that's the main purpose.

Derick Rethans 11:32

Your main purpose wasn't lazy loading but more immutable value objects.

Máté Kocsis 11:36

Yes, yes. Right. I proposed write once properties first, to pave the road for immutable objects, because this is my main goal.

Derick Rethans 11:46

Okay, but you're going step by step. I think that's actually a wise way, and Nikita has said something similar: that it is nicer to take things little by little, so that it is easier to convince people that this is a good feature or not.

Unknown Speaker 11:59

Actually it was Nikita's idea to split the two proposals.

Derick Rethans 12:03

That makes sense. Okay, Máté, thank you for taking the time this morning to talk to me.

Máté Kocsis 12:08

Thank you for having me.

Derick Rethans 12:11

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening and I'll see you next week.

181: Heavily Medicated 💊

This week Eric, John, and Thomas talk about the Coronavirus and how it's impacting the tech world. Letsencrypt revoking tons of certs. And Laravel 7 hits the streets and Eric is already coding with it. 

PHP Internals News: Episode 43: Syntax Tweaks

In this episode of "PHP Internals News" I chat with Nikita Popov (Twitter, GitHub, Website) about two RFCs: one on abstract methods in traits, and one about an improvement to the tokenizer.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news


Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 43. Today I'm talking with Nikita Popov yet again about a few RFCs that he's produced for PHP 8. Good morning, Nikita. How are you doing?

Nikita 0:34

Good morning, Derick. I'm doing great.

Derick Rethans 0:37

I've given up on introducing you because we've done this so many times now. You don't need an introduction any more. The first RFC I wanted to talk about a little bit this morning is the abstract trait methods validation RFC. What are traits?

Nikita 0:51

We usually talk about traits as compiler assisted copy and paste. Basically, we just take all the methods and properties from a trait and copy them into the class that's using the trait. That's a bit oversimplified; in particular, you can use multiple traits in a single class. And those traits might be defining the same method, in which case you have to resolve the conflict in some way. So that's where you have these insteadof or as annotations to specify precedence and aliases.
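A small sketch of the conflict resolution described here, using made-up trait and class names. Both traits define `greet()`, so the class has to say which one wins and may alias the other.

```php
<?php
declare(strict_types=1);

// Hypothetical traits that both define greet().
trait Hello
{
    public function greet(): string
    {
        return 'Hello';
    }
}

trait Goodbye
{
    public function greet(): string
    {
        return 'Goodbye';
    }
}

final class Greeter
{
    use Hello, Goodbye {
        Hello::greet insteadof Goodbye; // precedence: Hello's greet() wins
        Goodbye::greet as farewell;     // alias: keep Goodbye's version under a new name
    }
}

$greeter = new Greeter();
$a = $greeter->greet();
$b = $greeter->farewell();
```

Here `$a` is `'Hello'` and `$b` is `'Goodbye'`. Without the `insteadof` line, PHP would refuse to compile the class because of the unresolved collision.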

Derick Rethans 1:23

Traits have been in PHP for quite a long time. What is now the problem that you're trying to solve through this RFC?

Nikita 1:29

The problem is that traits are sometimes not self contained. So to give a specific example, in the logger PSR we have a trait called LoggerTrait, which has a bunch of methods like warning, error, info, notice, and so on. So just simple helper methods, which all call the log method with a specific log level. This trait only specifies these helper methods, but still requires the actual class to implement the log method. The way you'll usually indicate that is by adding an abstract method to the trait. You have all the methods you actually want to provide by the trait, and you have a number of abstract methods that the trait itself requires to work. This already works fine, but the problem is just that these methods are not actually validated, or they are only inconsistently validated. Even though the trait specifies these abstract methods, you could implement them in the class with a completely different signature.
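The pattern described here can be sketched as follows. This is not the PSR's actual LoggerTrait; `SimpleLoggerTrait` and `MemoryLogger` are hypothetical names, and the example assumes PHP 7.4 or later for the typed property.

```php
<?php
declare(strict_types=1);

// Hypothetical trait: provides helpers, requires the host class to supply log().
trait SimpleLoggerTrait
{
    abstract public function log(string $level, string $message): void;

    public function warning(string $message): void
    {
        $this->log('warning', $message);
    }

    public function error(string $message): void
    {
        $this->log('error', $message);
    }
}

final class MemoryLogger
{
    use SimpleLoggerTrait;

    /** @var string[] */
    public array $lines = [];

    // Matches the abstract signature declared by the trait.
    public function log(string $level, string $message): void
    {
        $this->lines[] = "[$level] $message";
    }
}

$logger = new MemoryLogger();
$logger->warning('disk almost full');
$logger->error('disk full');
```

In this sketch the class's `log()` matches the trait's declaration. The gap discussed in the interview is that, at the time, a class could declare `log()` with unrelated parameters and the mismatch would go unnoticed; signature validation for this case was added in PHP 8.0.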

Derick Rethans 2:30

Okay, just like any signature?

Nikita 2:32

Just like any signature right. The method still has to be present in some way. But the signature can be completely different. Could also be like different method type, like a static method, or an instance method.

Derick Rethans 2:43

Just basically checks for the name is what you're saying?

Nikita 2:46

Yeah, it only checks with the name.

Derick Rethans 2:49

Is this the only place, is this the only time where these abstract methods are not being validated. Or are there other situations where that could happen as well?

Nikita 2:57

No, I think this is the only place.

Derick Rethans 3:00

Are there situations where these abstract methods in the trait will get validated, and also on signature?

Nikita 3:07

As I mentioned, it's not like the signatures are completely unvalidated. They are just inconsistently validated. It depends a lot on exactly how you use the trait. If you just use the trait and specify the methods in the same class, it doesn't get validated right now. If instead the method is provided by the parent class, so it's inherited, then it does get validated. If you don't implement the method and make the class abstract instead, then it's also going to get validated in the child class. It kind of already works halfway. And this RFC just tries to make it work always.

Derick Rethans 3:44

Okay, that seems like a reasonably good addition, almost a no-brainer.

Nikita 3:48

I would say it's basically a bug. Especially if you look at the implementation, there is clearly some validation code there. The conditions are just a little bit off. But we do have to go through the proposal, because this is a backwards compatibility break.

Derick Rethans 4:02

Yes, I was about to ask: if it's a bug fix, why bother with an RFC? But if it's a BC break then yeah, we still need to do it of course. I doubt there will be many controversies about this?

Nikita 4:12

Actually there is one contentious point. Um, so something I didn't mention yet is that the RFC also allows you to define private abstract methods in traits. Normally private abstract is like a contradiction in terms, because private means only visible in the same class, and abstract means it has to be implemented in the child class; you can't really have both. You can have both with traits, because traits can see the private members in the class. I think that by itself is not controversial. That's a reasonable thing to have in a trait. The part that is controversial is what you do with existing visibility modifiers. This pattern already exists. So people already define abstract methods in traits, but because right now private abstract is forbidden, the lowest they can use is actually protected abstract, even though they don't actually want that method to be publicly exposed, or even protectedly exposed. So there is an argument there that we should maybe ignore the normal visibility validation that we do, and allow even implementing a protected abstract method from a trait with a private method inside the class, simply for backwards compatibility reasons.
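A sketch of what a private abstract trait method looks like, assuming PHP 8.0 or later, where this part of the RFC became available. The `CachesResult` trait and `Report` class are hypothetical.

```php
<?php
declare(strict_types=1);

// Requires PHP 8.0+: `abstract private` is only permitted inside traits.
trait CachesResult
{
    private ?string $cached = null;

    // The host class must provide this, but nothing outside the class can call it.
    abstract private function compute(): string;

    public function value(): string
    {
        return $this->cached ??= $this->compute();
    }
}

final class Report
{
    use CachesResult;

    public int $computations = 0;

    private function compute(): string
    {
        $this->computations++;
        return 'report body';
    }
}

$report = new Report();
$v1 = $report->value();
$v2 = $report->value();
```

The trait can call `compute()` because its methods are copied into `Report` and therefore run in the class's own private scope. Both calls return `'report body'`, and `compute()` runs only once.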

Derick Rethans 5:21

Because if you wouldn't allow that then, how would it break things?

Nikita 5:26

It would break things because there is existing code using these abstract protected methods, simply because we don't support abstract private yet. So that code would start throwing visibility errors, and I mean, it could be fixed by just dropping the abstract method, but that's also not ideal.

Derick Rethans 5:45

Because people use it to make sure that, I mean, it's there in the class that uses the trait, pretty much. Do you have any idea when this is going up for a vote?

Nikita 5:53

I think it can already go up for a vote. Mainly I need to resolve that question about the visibility first.

Derick Rethans 5:59

I'm looking forward to seeing that showing up sometime soon then.

How do you call your second RFC?

Nikita 6:05

Object-based token_get_all alternative.

Derick Rethans 6:07

I think that's a great title. There's a few words in there that we might have to explain first. What are these tokens you're talking about?

Nikita 6:14

So the token_get_all function, which we already have, exposes a part of the PHP compiler infrastructure. PHP compilation generally has three steps. The first is the tokenization. The second part is the parser, and then the compiler. So the tokenizer converts the raw character stream into tokens, which encode higher level concepts: for example, that the sequence of F, U, N, C and so on is actually a function keyword, or that a double quote followed by characters is actually a string. So this part doesn't recognise larger structures, like whole functions, but at least the atoms that make up the language.

Derick Rethans 7:00

Would you say these are the words that make up the sentences?

Nikita 7:03

Yeah, that's the right analogy.

Derick Rethans 7:06

Why would you want to have access to them?

Nikita 7:08

For example, I have a PHP parser library, which converts these tokens into an actual syntax tree. And then on top of that, you can easily analyse PHP source code. So this is what all these static analyzers, like PHPStan or Psalm are based on.

Derick Rethans 7:27

Do they all use the tokens?

Nikita 7:29

Those two, in particular, use my PHP parser library, and that one uses the tokens internally. There is also other tooling that's more directly based on tokens, for example, code formatters or code style inspection tools like PHPCS. Those all directly operate on the tokens instead.

Derick Rethans 7:47

But as you say, these tokens only are words and they don't really provide a structure. How would these tools then convert that into a structure?

Nikita 7:54

If you're looking just at formatting, then you may not really need a lot of structure. So you probably do need to write a bit of extra code to recognise that, okay, the function token followed by white space, followed by an identifier, that's a function declaration. For the more complicated tooling that builds a syntax tree, you need to implement a parser, either based on code generation, or based on a recursive descent approach.

Derick Rethans 8:26

Why would you not want to have direct access to PHP's AST instead because that already provides a structure for you?

Nikita 8:33

We do have direct access to the AST through the AST PECL extension, which is not part of core yet. I don't know if there are plans in that direction.

Derick Rethans 8:43

Well you wrote it so you surely can make these plans.

Nikita 8:46

Yes, I can make them but I don't know if I should make them.

Derick Rethans 8:50

I think you should.

Nikita 8:51

I mean, the nominal advantage of the AST extension is that it's always up to date with PHP. In practice that really isn't an issue, because some of the userland tooling is also pretty quickly updated. The more practical advantage is that the extension is a lot faster than implementing this in userland code. Well, I mean, this is really one of the areas where C code is faster than PHP code. The AST extension only exposes the structure that PHP itself needs. PHP is not interested in precise formatting and things like that at all. So it throws away quite a few things. You can't, for example, get accurate position information. Like, where exactly, not just on which line but at which column, something is defined. And that's something you're quite often interested in.

Derick Rethans 9:46

Also, from what I know, it throws away all the comments unless they are docblock comments. How does the tokenizer currently return information about the tokens? I've played with this in the past and I didn't think it was the prettiest format to get back out of it.

Nikita 10:02

token_get_all returns an array of tokens. And there are generally two types of tokens. One is single character tokens, like a semicolon, or a comma, or whatever, which are just returned as a string. So it's a single character string. And then there are complex tokens, like the function keyword, like white space, like strings, which are returned as an array where the first element is the token ID, which is an integer. And we have constants defined for these integers. The second element is the actual string content of the token. So for the function keyword, that's always going to be function, but it could be written in different ways because the keyword is case insensitive, so it could be all lowercase, or uppercase, hopefully it's all lowercase.

Derick Rethans 10:52

You'll get the odd situation where the first letter is the capital, I suppose, but that's about it, hopefully.

Nikita 10:57

And finally, the last element is the line number. So the starting line number.
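The mixed return format described here can be seen, and normalised, with a few lines of PHP. The sketch assumes the tokenizer extension is loaded (it is by default); the array keys `name`, `text`, and `line` are an illustrative choice, not part of any API.

```php
<?php
declare(strict_types=1);

$code = '<?php echo 1;';

$normalised = [];
foreach (token_get_all($code) as $token) {
    if (is_array($token)) {
        // Complex token: [token ID, text, starting line number]
        [$id, $text, $line] = $token;
        $normalised[] = ['name' => token_name($id), 'text' => $text, 'line' => $line];
    } else {
        // Single-character token: just a string, with no ID or line attached
        $normalised[] = ['name' => null, 'text' => $token, 'line' => null];
    }
}
```

Every consumer of `token_get_all()` ends up writing some version of this `is_array()` branch. Concatenating the `text` fields gives back the original source, which is why formatters can work directly on the token list.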

Derick Rethans 11:02

So if you want to look at the position on the line, you'd have to calculate it yourself?

Nikita 11:08

Right, you would have to track that yourself. I mean, there are two problems. One is just that you have these single character tokens and the complex tokens using different structures. So all the code using them has to always switch between those: check if it's an array or a string, or do some kind of normalisation itself. And the second problem is that arrays in PHP are fairly memory inefficient when it comes to storing a fixed amount of data. Storing three elements inside an array always means allocating an array for eight elements, because that's the minimum array size; you have to use space to store the keys, and so on. Generally, if you have a fixed structure, it's much, much more efficient to store it inside an object, using a class that has declared properties. So this makes a very large difference in some cases; especially if your array only has like two or three elements, you can save a lot of memory with it.

Derick Rethans 12:12

Have you done any benchmarks to see how much memory you'd actually save, like for some particular scripts that you've parsed, comparing how the tokenizer does it now with how you're proposing to do it?

Nikita 12:22

Yeah, I have here in the RFC some numbers for some particular script that goes down from 14 megabytes to eight megabytes. So that's nearly half the memory usage. Well, actually, maybe I should first actually say what the RFC proposes. The RFC proposes to instead return objects, an array of objects. And these objects have four properties. So first is again the ID of the token, then the textual content, the line number, and also the starting position of the token in the string.

Derick Rethans 12:54

Is this something that the tokenizer extension tracks for you?

Nikita 12:58

I mean, that's something that it can easily do, so we can just as well expose it. And these objects are always used. So we no longer make the distinction between single character tokens and complex tokens. So we always return a uniform array of tokens, of token objects. Despite doing that, removing this optimization for single character tokens, the end result is still that we use half as much memory, simply because objects are that much more efficient than arrays.

Derick Rethans 13:27

That's a clever trick. I'm sure people like using less memory, at least I know I would. Is it also faster, or doesn't it particularly matter much?

Nikita 13:35

It's also faster, like maybe 30% or something, because memory usage and performance tend to be pretty heavily correlated. So if you use less memory, you also are faster.

Derick Rethans 13:46

That makes sense. Are you thinking of other things that you can add to the tokenizer extension to make working with them even easier?

Nikita 13:52

The way this new functionality is implemented is, we have a PHP token class and on it we have a static method getAll. So instead of calling the token_get_all function, you call PHPToken::getAll(). And one nice thing this allows you to do is to extend this token class. So you can say, MyPHPToken extends PHPToken, and then you call MyPHPToken::getAll() and then we will actually construct your extension class. That means that you can add whatever methods you like, in addition to what we provide by default.
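For reference, the class that shipped in PHP 8.0 is named `PhpToken`, and its static factory is `PhpToken::tokenize()` rather than the `getAll()` name used in this conversation. The sketch below assumes PHP 8.0 or later; the `MyToken` subclass and its `isEchoKeyword()` method are hypothetical.

```php
<?php
declare(strict_types=1);

// Requires PHP 8.0+. Hypothetical subclass that adds one helper method.
class MyToken extends PhpToken
{
    public function isEchoKeyword(): bool
    {
        return $this->is(T_ECHO);
    }
}

// Calling the inherited static method on the subclass yields MyToken instances.
$tokens = MyToken::tokenize('<?php echo 1;');

$echo = $tokens[1];                      // the token right after the open tag
$last = $tokens[count($tokens) - 1];     // the trailing semicolon
```

Every element is an object with `id`, `text`, `line`, and `pos` properties, including the single-character `;` token, so the `is_array()` check needed with `token_get_all()` disappears. The `pos` property gives the byte offset in the source, which here is `6` for `echo` because the open tag `<?php ` takes six characters.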

Derick Rethans 14:29

Is that a pattern that we have in other places in PHP as well? Because I don't usually think that, even if you'd call an inherited static method, it would suddenly return the inherited class's object. Do we do it in other places?

Nikita 14:42

So this is somewhat uncommon in PHP internals. I think it's a pretty common pattern for userland, where generally if you return new objects from static methods, you always use new static, not new self. This is essentially late static binding, which we did discuss quite recently. So, there is one limitation here, namely that the constructor of the PHPToken class is final. So, you can extend the class and you can add extra methods, but you cannot modify the construction behaviour, because we would like to internally construct these tokens very efficiently by more or less directly writing the values into the right slot in memory and not doing slow constructor calls, because this functionality tends to be very performance sensitive. And the same trick, where you can extend the class but not change the constructor, is also used by the SimpleXML extension. It does exist, but it's not very common. Generally, where internal code is concerned, we usually do not really plan for extension. I think nowadays we mark nearly all new internal classes as final, simply because extension is such a pain to deal with. And for old classes we usually wish that we had marked them as final. I mean, this is also a general recommendation for userland: you should mark things final as much as you can get away with. But it's a much bigger concern for internals, because dealing with userland extensions that do unexpected things is much harder for us.
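The difference between `new self` and `new static` mentioned here, as a userland sketch. The `Shape` and `Circle` classes are hypothetical; the final constructor mirrors the restriction described for the token class.

```php
<?php
declare(strict_types=1);

class Shape
{
    // A final constructor keeps `new static()` safe: subclasses cannot change its signature.
    final public function __construct()
    {
    }

    public static function viaSelf(): self
    {
        return new self();   // always the class where this method is written
    }

    public static function viaStatic(): self
    {
        return new static(); // late static binding: the class the call was made on
    }
}

final class Circle extends Shape
{
}

$a = Circle::viaSelf();
$b = Circle::viaStatic();
```

`$a` is a plain `Shape`, while `$b` is a `Circle`. A factory written with `new static` is what lets a subclass call the inherited static method and get instances of itself back.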

Derick Rethans 16:23

You even need to make sure that your internal structures are properly constructed by the parent's constructor being called from inherited classes, but in PHP there's no such requirement that you do. I'm pretty sure I've had problems with that for the Date extension a long, long time ago, where people would extend from it and not call the constructor. And then, because we didn't think of it, nothing is defined and everything just falls down.

Nikita 16:44

Yeah, so this is one of the common problems. And the other one is that internal classes often define custom object handlers. So that's something only internal classes can do. Just to give one example, they can define a debug info handler that modifies the output of var_dump, but nowadays we also have the userland magic method __debugInfo, and I think pretty much all internal classes are just going to ignore that and always return their own internal debug information even if this method has been overridden, simply because no internal class actually checks for that. And this kind of problem also exists for a lot of other magic, and generally no one takes it into account, and things are just more or less softly broken.
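For a plain userland class, the magic method mentioned here works as follows. The `Secret` class and its contents are hypothetical; the output is captured with output buffering so it can be inspected.

```php
<?php
declare(strict_types=1);

// Hypothetical userland class that customises its var_dump() output.
final class Secret
{
    private string $password = 'hunter2';

    public function __debugInfo(): array
    {
        // Whatever this returns is what var_dump() shows for the object.
        return ['password' => '***'];
    }
}

ob_start();
var_dump(new Secret());
$dump = ob_get_clean();
```

The captured dump contains `***` and not the real value. The point made in the interview is that extending an internal class and adding `__debugInfo()` may have no effect at all, because the internal class's own debug handler takes over.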

Derick Rethans 17:31

Very recently there was a pull request for Xdebug to change that as well, because Xdebug's debugging output gets sent to IDEs. For internal classes it always uses the internal debug info handler, and for userland classes it uses whatever is userland defined; I mean, if there's a magic method, we'll use that. The pull request wanted to change Xdebug in such a way that it would also use the __debugInfo magic method for internal classes, whenever overridden. After some discussion about this, we figured out this is probably a bad idea, and hence we haven't merged that. Although we ended up fixing some other things that the developer also found. That's a curious situation to be in. We would like things to sort of work the same. But at the same time, you sometimes really want to see the internal information from the classes without developers having hidden the information behind it, right.

Nikita 18:20

Yeah, that's true.

Derick Rethans 18:21

And that is just from a debug perspective, and not even from a let's-make-sure-things-don't-crash perspective. I see that the RFC also rejected a few features that aren't part of the current iteration yet, or that might make sense to add later. And one of them is about a lazy token stream. What would that be, and what sort of different interface would it have?

Nikita 18:43

The lazy token stream basically just means that instead of returning an array of tokens, we return an iterator of tokens, which means that we do not have to store the full array in memory. For the example I used, the memory usage for the whole token array was eight megabytes, even after these memory usage improvements, for what was a fairly large file, but definitely not the largest file you can encounter, especially when it comes to generated files. So there is an advantage to processing tokens one by one as a stream, because then your memory usage is going to be basically O(1), not O(n). The problem is, I mean, the PHP lexer does indeed work one token at a time, so it can support it. The problem is that it has a lot of internal state. And in order to implement this iterator, we would have to back up and restore the state on each produced token, to make sure that it's still possible to, for example, include and compile other files in the meantime. So this is something that can be improved; we can make that cheaper, but that would be a larger effort. And I'm not really sure it's worthwhile, because while you can process one token at a time, and this is, for example, what the PHP parser does internally, many practical applications in userland will generally want to have all tokens as an array, because it simply makes things easier if you can always look ahead and look back. And I think it would be fairly hard to rewrite the existing libraries in terms of the lazy stream. It may be a nice to have, but it's not the most useful thing right now.

Derick Rethans 20:32

What has been the feedback for this RFC?

Nikita 20:35

I think pretty good. This is something that we already discussed years ago. Last time the discussion kind of got a bit sidetracked. Yeah, that's one of the dangers when you start introducing object-oriented interfaces. Well, actually, I called this RFC object-based intentionally, because when you do object-oriented, then people would like to have their tokens, and their token streams, and their token stream factories, and their token stream managers. And this is basically what held this up the whole time. But generally everyone who is working on tokens, which is not a lot of people, knows that memory usage is a problem. And the current inconsistent structure is a problem, which is why most of them implement their own token objects, and basically do the same thing we propose here just themselves.

Derick Rethans 21:30

Is this one going up for a vote at the same time?

Nikita 21:32


Derick Rethans 21:33

Both of these RFCs that we've spoken about today are targeted at PHP 8, I suppose?

Nikita 21:37

Yeah. So right now, I think all RFCs are targeted at PHP 8.

Derick Rethans 21:42

Thank you for taking the time with me today, Nikita, to talk about a bunch of little RFCs that you've written. Perhaps by the time this podcast comes out, we'll have started voting on them and we'll see what happens to them.

Nikita 21:52

Thanks for having me once again.

Derick Rethans 21:56

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.

180: Hardware BUGS 🐛

This week Eric, John, and Thomas talk Laracon Online, BUGS, Pis and more Laravel.

179: Just Don’t Close The Browser

PHP Internals News: Episode 42: Userspace Operator Overloading

In this episode of "PHP Internals News" I chat with Jan Böhmer (GitHub, LinkedIn) about the Userspace Operator Overloading RFC.

The RSS feed for this podcast is https://derickrethans.nl/feed-phpinternalsnews.xml, you can download this episode's MP3 file, and it's available on Spotify and iTunes. There is a dedicated website: https://phpinternals.news


Derick Rethans 0:16

Hi, I'm Derick. And this is PHP internals news, a weekly podcast dedicated to demystifying the development of the PHP language. This is Episode 42. Today I'm talking with Jan Böhmer about Userspace Operator Overloading. Jan, would you please introduce yourself?

Jan Böhmer 0:33

Hi, my name is Jan Böhmer. I'm a physics student from Germany. And I'm the author of the Operator Overloading RFC.

Derick Rethans 0:40

What brought you to writing this RFC?

Jan Böhmer 0:42

Mostly because I have worked with monetary objects in the past, and they were a bit tedious to work with when it comes to calculating. Whenever you want to calculate something, you have to call functions on the objects. It is not possible to just use operators, like with normal values such as floats or integers.

Derick Rethans 1:06

Because the monetary objects themselves had multiple things embedded in there or something like that?

Jan Böhmer 1:11

Yes, they describe mostly a value and a currency. And together they are saved in an object.

Derick Rethans 1:18

Okay, that seems like a reasonable thing to do, right? I mean, other times people say the same thing about doing complex numbers or something like vectors. The RFC is called Userspace Operator Overloading. What is operator overloading?

Jan Böhmer 1:31

Yeah. Basically, it is the idea that you can define operators, like addition, subtraction, or string concatenation, for objects.

Derick Rethans 1:43

Does PHP already have something like this?

Jan Böhmer 1:45

Actually, yes. Objects can have something that's called a do_operation handler. This is called whenever PHP encounters an object that is used with an operator. The problem is that this handler is only available to PHP internally. So if you want to use it, you have to write an extension.

Derick Rethans 2:06

So it will be possible to have in an extension a Monetary class with its own operators already defined on it.

Jan Böhmer 2:14

Exactly, the PHP extension GMP uses this already. The problem is that it's not very flexible: you have to be familiar with C, you have to be able to compile it, and you have to distribute it to whatever system you want to use it on. Since we have had the foreign function interface since PHP 7.4, we can implement many things without actually having an extension. But operator overloading is something that's not yet possible from inside PHP.

Derick Rethans 2:47

So it wouldn't have been possible to write the GMP extension, which is of course a wrapper around libgmp, with FFI, because there's no operator overloading available in PHP.

Jan Böhmer 2:59

Not in that comfortable a way. You could do it with functions, but it would be a bit more tedious than just using operators.

Derick Rethans 3:09

You've mentioned the Monetary object as a good use case. What other use cases can you think of?

Jan Böhmer 3:15

Higher mathematical objects like complex numbers, vectors, or something like tensors, and maybe something like the String component of Symfony, so that you can simply concatenate these string objects with a normal string using the concatenation operator and don't have to call a function to do that. Basically, they should behave similarly to a basic string variable, and not like something completely different.

Derick Rethans 3:45

What is the syntax you're proposing for implementing this?

Jan Böhmer 3:49

My idea was, similar to Python, to use special magic methods for every operator you can overload. So if you want to overload the addition operator, you would implement a static function called __add, for example. This function takes both operands, the left-hand operand and the right-hand operand, so you can decide whether your object is on the left-hand or the right-hand side. That is important to distinguish something like one divided by zero, one divided by two, or two divided by zero. These are completely different cases, and you have to be able to differentiate between them.
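
To make the point about operand order concrete, here is a minimal Python sketch. The `Scalar` class and its static `div` handler are hypothetical names invented for this illustration; they mimic the idea of a static handler that receives both operands, and are wired to Python's own `__truediv__` and `__rtruediv__` hooks rather than to anything from the RFC.

```python
class Scalar:
    """Hypothetical wrapper around a number, used only for illustration."""

    def __init__(self, value: float) -> None:
        self.value = value

    @staticmethod
    def div(lhs, rhs) -> "Scalar":
        # Both operands arrive explicitly, so the handler can tell which side
        # the Scalar is on. Neither operand is modified; a new object is returned.
        left = lhs.value if isinstance(lhs, Scalar) else lhs
        right = rhs.value if isinstance(rhs, Scalar) else rhs
        return Scalar(left / right)

    # Python's own hooks: forward to the static handler, preserving the order.
    def __truediv__(self, other):
        return Scalar.div(self, other)   # Scalar is the left-hand operand

    def __rtruediv__(self, other):
        return Scalar.div(other, self)   # Scalar is the right-hand operand


print((Scalar(1) / 2).value)   # 0.5
print((2 / Scalar(1)).value)   # 2.0
```

Division is not commutative, so a handler that could not see which side its object was on would have no way to tell these two expressions apart.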

Derick Rethans 4:39

And wouldn't that be possible to do in non-static functions?

Jan Böhmer 4:43

Another problem with non-static functions is the possible access to the $this variable. If you modify an object from inside an operator handler, this can lead to very, very strange behaviour, because normally operations don't change the object itself, but rather return a new value. The problem is that with access to $this, it is very easy to accidentally change the $this object. If you only pass both objects via a static method, it is a bit more clear that you have to create an entirely new object.

Derick Rethans 5:24

Would a type hint enforce that you return an object of the same class?

Jan Böhmer 5:29

Not in all cases do you want to return an object of the same class. For example, take the dot product of vectors: you take two vectors, multiply them in some way, and you return a normal float value.
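
A short Python sketch of that dot product case follows. The `Vector` class is a hypothetical example type, and Python's `__mul__` hook stands in for the handler being discussed; the point is only that the operator returns a plain float rather than another `Vector`.

```python
class Vector:
    """Hypothetical 2D vector, used only for illustration."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __mul__(self, other: "Vector") -> float:
        # Dot product: two vectors go in, a plain float comes out.
        if not isinstance(other, Vector):
            return NotImplemented
        return float(self.x * other.x + self.y * other.y)


result = Vector(1, 2) * Vector(3, 4)
print(result, type(result).__name__)
```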

Derick Rethans 5:43

Of course, yes.

Jan Böhmer 5:44

If you were to enforce that it always has to be the same type, that would limit the use cases too much, in my opinion.

Derick Rethans 5:52

But you could of course type hint the __add operator yourself?

Jan Böhmer 5:57

Yes. Type hints on the arguments are, in my proposal, used as a hint for which types are supported by the operator handler. If you have, for example, a vector plus an integer, and your operator handler only declares vector and vector as parameter types, then this handler will not be called, and PHP will try to call the handler on the second object.

Derick Rethans 6:24

So it won't be called, and instead it falls back to being called on the second object.

Jan Böhmer 6:29

Yes, the idea behind it is that only one of the objects has to know about both classes. So if you want to combine, for example, two objects from different libraries, and library A doesn't know about library B, then only the objects of the second library have to know about the objects of library A. In C++ you can define operators for the supported types outside of classes, so you can define combinations between arbitrary objects. The problem is that in PHP this would be a bit complicated, and the best way is to implement this handler inside the classes. So a class has to know about every other object it could possibly interact with.
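
Python has a comparable fallback, which can serve as a sketch of the behaviour described here. Note the difference in mechanism: the proposal discussed in this interview decides based on declared parameter types, while Python lets a handler decline by returning `NotImplemented`. The classes `Vector` ("library A") and `Offset` ("library B") are hypothetical.

```python
class Vector:
    """'Library A': knows only about Vector."""

    def __init__(self, x: float, y: float) -> None:
        self.x = x
        self.y = y

    def __add__(self, other):
        if isinstance(other, Vector):
            return Vector(self.x + other.x, self.y + other.y)
        return NotImplemented  # unsupported operand: let the other side try


class Offset:
    """'Library B': knows about Vector, although Vector does not know about it."""

    def __init__(self, delta: float) -> None:
        self.delta = delta

    def __radd__(self, other):
        if isinstance(other, Vector):
            return Vector(other.x + self.delta, other.y + self.delta)
        return NotImplemented


# Vector.__add__ declines, so Python falls back to Offset.__radd__.
shifted = Vector(1, 2) + Offset(10)
print(shifted.x, shifted.y)

# If neither operand supports the combination, an error is raised.
try:
    Vector(1, 2) + "text"
except TypeError as error:
    print("unsupported:", error)
```

Only `Offset` needs to know about `Vector`, which matches the idea that just one of the two libraries has to be aware of the other.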

Derick Rethans 7:14

That makes sense to me. What happens if neither of the classes, or if one of them is a class, and the other one is just a scalar type, if none of the add methods fit, what would happen then?

Jan Böhmer 7:24

If the objects implement a handler, but the handlers don't support the operands, then an error would be thrown.

Derick Rethans 7:32

And that is a type error like you'd normally get?

Jan Böhmer 7:34

If the object doesn't implement the operator at all, then a notice would be triggered. The idea is that, at the moment, it is possible to write something like object plus one. This is a valid expression in current PHP versions: the object is interpreted as one and just a notice is raised. For compatibility reasons, my RFC keeps the same behaviour if no operators are overloaded on the objects.

Derick Rethans 8:05

That seems like a reasonable compromise there. I remember from the past, I think it was Sara Golemon that wrote an extension for using operator overloading. And I remember from the time that there was a problem with using the less than or greater than operators, because I think one of them gets flipped around automatically in the engine. Has this been changed in PHP already, or are you running into the same problem?

Jan Böhmer 8:28

I'm not sure about this. My RFC doesn't cover comparison operators like greater than or less than at all, because comparison is handled differently inside PHP. It doesn't go through the do_operation handler I mentioned, so it would need a somewhat different implementation. Also, comparison is a bit complicated in its own right. Maybe it's more useful to use interfaces to implement this overloading. Also, there are some problems. Maybe we should only allow something like a compare operator that returns either minus one, one, or zero, depending on whether the object is less, greater, or equal, so that everything is defined together at once. Then it's not possible to define an object that has, for example, the less than but not the greater than operator.
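
The idea of a single three-way compare from which every comparison operator follows can be sketched in Python as below. The `Comparable` mixin, its `compare` method, and the `Money` class are hypothetical names for this illustration and do not come from any RFC; the sketch also assumes both operands are of the same class.

```python
class Comparable:
    """Derives every comparison operator from one three-way compare() method."""

    def compare(self, other) -> int:
        raise NotImplementedError

    def __lt__(self, other): return self.compare(other) < 0
    def __le__(self, other): return self.compare(other) <= 0
    def __gt__(self, other): return self.compare(other) > 0
    def __ge__(self, other): return self.compare(other) >= 0
    def __eq__(self, other): return self.compare(other) == 0
    def __ne__(self, other): return self.compare(other) != 0


class Money(Comparable):
    def __init__(self, amount: int) -> None:
        self.amount = amount

    def compare(self, other: "Money") -> int:
        # Returns -1, 0, or 1 for less than, equal, or greater than.
        return (self.amount > other.amount) - (self.amount < other.amount)


print(Money(1) < Money(2), Money(2) > Money(1), Money(3) == Money(3))
print([m.amount for m in sorted([Money(3), Money(1), Money(2)])])
```

Because every operator is derived from the same method, a class cannot end up supporting less than without also supporting greater than.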

Derick Rethans 9:32

But this sounds like that's for a different RFC.

Jan Böhmer 9:35

Exactly. That's a bit complicated. If the current operator overloading RFC gets passed, then maybe a comparison operator overloading RFC would make sense.

Derick Rethans 9:46

From reading the RFC, I've noticed that you also won't be able to use a shorthand assignment operator. So for example, plus equals. What is the reason for that?

Jan Böhmer 9:56

Currently, every shorthand operator becomes an assignment of A plus B. The do_operation handler cannot tell whether a shorthand operator or a normal operator was used. Allowing the shorthand operators to be overloaded would maybe bring some benefits in terms of memory optimization: if you call a shorthand operator, you can mutate the object itself and don't have to create a new object, which takes more memory. But I think with the garbage collector of PHP that is not such a big problem. And if that turns out to be a really needed feature in the future, it could be added in a later version of PHP.
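
Python shows the same behaviour when a class defines only the plain operator, which makes for a small sketch of "the shorthand becomes an assignment of A plus B". The `Money` class is a hypothetical example; it defines `__add__` but deliberately no in-place `__iadd__`.

```python
class Money:
    """Hypothetical value object that only overloads plain addition."""

    def __init__(self, amount: int) -> None:
        self.amount = amount

    def __add__(self, other: "Money") -> "Money":
        # Returns a new object; neither operand is mutated.
        return Money(self.amount + other.amount)


a = Money(5)
original = a

# No __iadd__ is defined, so this runs as: a = a + Money(3)
a += Money(3)

print(a.amount)         # 8
print(original.amount)  # 5
print(a is original)    # False
```

The shorthand creates a new object and rebinds the variable, so the original object is left untouched; avoiding that extra object is the memory optimization a separately overloadable shorthand operator would allow.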

Derick Rethans 10:41


Jan Böhmer 10:42

Many other languages don't allow overloading shorthand operators either, so I don't think there is too much need for it.

Derick Rethans 10:49

Operator overloading sometimes has criticisms directed at it. What are some of the criticisms you've heard about it?

Jan Böhmer 10:56

First of all, there are some criticisms about the operator overloading idea in general. There's the criticism that it could be abused for doing some very weird things. For example, in C++ the left shift operator is used for writing output to a stream, such as the console. And you could do whatever you want inside the handler, so if somebody wanted to save or modify a file inside an operator overloading handler, that would be possible, and in most cases a function would make it clearer what it does.

Derick Rethans 11:35

Of course, in a function add(), if you implemented it yourself, nothing stops you from writing to a file either.

Jan Böhmer 11:41

Operator overloading should, in my opinion, only be used for things that are related to maths or for creating custom types that behave similarly to the built-in types.

Derick Rethans 11:52

Like complex numbers, or vectors, or monetary numbers. So far, we have been discussing this RFC for a few weeks now. What do you think the chances are of it passing?

Jan Böhmer 12:05

I'm not sure. I think the idea of operator overloading in general is accepted in the community; I haven't heard much backlash. There was some discussion about how to do it. Some people think it's maybe better to implement operator overloading with interfaces, like with ArrayAccess, or to introduce some completely new keywords, like in other languages. In C++ or C#, there is a special keyword, operator, that marks an operator overloading function, so it is clear that it is not a regular function but something handled in a special way.

Derick Rethans 12:49

Instead of using the underscore underscore in front of method names. When do you think you'll be ready to put this up for a vote?

Jan Böhmer 12:56

I was a bit busy the last few days. I will make some revisions to my RFC and polish my implementation.

Derick Rethans 13:06

Okay, thank you very much for taking the time to talk to me this morning, Jan.

Jan Böhmer 13:10

Thank you very much for inviting me.

Derick Rethans 13:13

Thanks for listening to this instalment of PHP internals news, the weekly podcast dedicated to demystifying the development of the PHP language. I maintain a Patreon account for supporters of this podcast, as well as the Xdebug debugging tool. You can sign up for Patreon at https://drck.me/patreon. If you have comments or suggestions, feel free to email them to derick@phpinternals.news. Thank you for listening, and I'll see you next week.