CWYAlpha

Just another WordPress.com site

Archive for 九月 2012

Thought this was cool: Learnable Programming

leave a comment »

Comments: “Learnable Programming”

URL: http://worrydream.com/LearnableProgramming/

Here’s a trick question: How do we get people to understand programming?

Khan Academy recently launched an online environment for learning to program. It offers a set of tutorials based on the JavaScript and Processing languages, and features a “live coding” environment, where the program’s output updates as the programmer types.

Because my work was cited as an inspiration for the Khan system, I felt I should respond with two thoughts about learning:

  • Programming is a way of thinking, not a rote skill. Learning about “for” loops is not learning to program, any more than learning about pencils is learning to draw.
  • People understand what they can see. If a programmer cannot see what a program is doing, she can’t understand it.

Thus, the goals of a programming system should be:

  • to support and encourage powerful ways of thinking
  • to enable programmers to see and understand the execution of their programs

A live-coding Processing environment addresses neither of these goals. JavaScript and Processing are poorly-designed languages that support weak ways of thinking, and ignore decades of learning about learning. And live coding, as a standalone feature, is worthless.

Alan Perlis wrote, “To understand a program, you must become both the machine and the program.” This view is a mistake, and it is this widespread and virulent mistake that keeps programming a difficult and obscure art. A person is not a machine, and should not be forced to think like one.

How do we get people to understand programming?

We change programming. We turn it into something that’s understandable by people.

Contents

The concept of a system split between the computer and the head was derived from Will Wright’s thoughts on games.

A programming system has two parts. The programming “environment” is the part that’s installed on the computer. The programming “language” is the part that’s installed in the programmer’s head.

This essay presents a set of design principles for an environment and language suitable for learning.

The environment should allow the learner to:

  • read the vocabulary — what do these words mean?
  • follow the flow — what happens when?
  • see the state — what is the computer thinking?
  • create by reacting — start somewhere, then sculpt
  • create by abstracting — start concrete, then generalize

The language should provide:

  • identity and metaphor — how can I relate the computer’s world to my own?
  • decomposition — how do I break down my thoughts into mind-sized pieces?
  • recomposition — how do I glue pieces together?
  • readability — what do these words mean?

The features are not the point

We often think of a programming environment or language in terms of its features — this one “has code folding”, that one “has type inference”. This is like thinking about a book in terms of its words — this book has a “fortuitous”, that one has a “munificent”. What matters is not individual words, but how the words together convey a message.

Likewise, a well-designed system is not simply a bag of features. A good system is designed to encourage particular ways of thinking, with all features carefully and cohesively designed around that purpose.

This essay will present many features! The trick is to see through them — to see the underlying design principles that they represent, and understand how these principles enable the programmer to think.

Here is a simple tutorial program that a learner might face:

For the sake of comparison, the examples here will use the same languages as the Khan Academy system, JavaScript and Processing.

Before a reader can make any sense of this code, before she can even begin to understand how it works, here are some questions she will have:

  • What does “fill” mean?
  • What do those numbers after “fill” mean?
  • What do those numbers after “ellipse” mean?
  • What units are these numbers in? What ranges are they in?
  • Why are there so many numbers?

Khan Academy’s tutorials encourage the learner to address these questions by randomly adjusting numbers and trying to figure out what they do.

Thought experiment. Imagine if you bought a new microwave, took it out of the box, and found a panel of unlabeled buttons.

Imagine if the microwave encouraged you to randomly hit buttons until you figured out what they did.

Now, imagine if your cookbook advised you that randomly hitting unlabeled buttons was how you learn cooking.

Make meaning transparent

Learning cooking is not about guessing the functionality of your kitchen appliances. It’s about understanding how ingredients can be combined.

Likewise, guessing the third argument of the “ellipse” function isn’t “learning programming”. It’s simply a barrier to learning. In a modern environment, memorizing the minutia of an API should be as relevant as memorizing times tables.

The environment is responsible for making meaning transparent. The environment must enable the reader to effortlessly read the program, to decode the code, so she can concentrate on genuine programming concepts — how the algorithmic “ingredients” combine.

Here is one example of how a programming environment can make meaning transparent, by providing labels on mouse-over:

Control structures can be labeled as well.

It’s tempting to think of this as “inline help”, but it’s not help — it’s simply labeling. The problem with the following UI isn’t that it lacks a “help feature”. The problem is that nothing is labeled.

That UI is exactly as informative as this line of code:

Why do we consider the code acceptable and the UI not? Why do we expect programmers to “look up” functions in “documentation”, while modern user interfaces are designed so that documentation is typically unnecessary?

Explain in context

A programming environment is a user interface for understanding a program. Especially in an environment for learning, the environment must be designed to explain.

One attribute of great explanations is that they are often embedded in the context of what they are explaining. That is, they show as well as tell.

Instead of just describing what vocabulary means, we can often show it in the context of the data. In the following example, the labels connect the code and its output:

Such a connection can be especially powerful when a line of code does multiple things:

Summary — Read the vocabulary

The particular solutions shown here are merely examples. What matters is the underlying purpose: enabling the learner to read the program.

  • The environment should make meaning transparent, so the learner can concentrate on high-level concepts, not vocabulary.
  • The environment should explain in context. Show and tell. Annotate the data, not just the code.

The examples above are just one of many ways of achieving these goals. All that really matters is that somehow the learner’s questions get answered:

An environment which allows learners to get hung up on these questions is an environment which discourages learners from even getting started.

The Khan Academy system presents the learner with code on the left, and the output of the code on the right. When the code is changed, the output updates instantaneously.

Another thought experiment. Imagine a cooking show, ruthlessly abbreviated. First, you’re shown a counter full of ingredients. Then, you see a delicious soufflé. Then, the show’s over.

Would you understand how that soufflé was made? Would you feel prepared to create one yourself?

Of course not. You need to see how the ingredients are combined. You need to see the steps.

The programming environment exhibits the same ruthless abbreviation as this hypothetical cooking show. We see code on the left and a result on the right, but it’s the steps in between which matter most. The computer traces a path through the code, looping around loops and calling into functions, updating variables and incrementally building up the output. We see none of this.

People understand things that they can see and touch. In order for a learner to understand what the program is actually doing, the program flow must be made visible and tangible.

Make flow tangible

That example program again:

This is a particularly difficult example for a beginner to follow. The “for” construct, with its three statements on a single line, makes the control flow jump around bizarrely, and is an unnecessarily steep introduction to the concept of looping.
To make the flow more sane for a learner, the loop can be rewritten using “while”:

Now, the control flow must be made tangible. We must put the execution of the program into the programmer’s hand, let her feel that it is a real thing, let her own it.

In the following example, the programmer uses a slider to scrub through the execution:

This control allows the programmer to move around the loop at her own pace, and understand what is happening at each step. She can go backwards and forwards, dwell in difficult areas, and compare what is happening at different times. She can study how the output is built up over time, instead of seeing it magically appear all at once.

Make flow visible

The example above allows the programmer to follow the program’s execution over time. But she’s peeking through a pinhole, only seeing a single point in time at any instant. She has no visual context.

To illustrate what I mean, here are two representations of a trip around my neighborhood, one where the neighborhood itself isn’t visible, and one where it is.

This “overhead view” lets a person understand the trip at a higher level. She can see the shape of the trip. She can see patterns.

In the following example, the program flow is plotted on a timeline. Each line of code that is executed leaves a dot behind. The programmer can take in the entire flow at a glance:

The patterns that emerge are especially helpful in the presence of conditionals and other forms of flow control:

It’s possible that some novices may initially be confused by a timeline, but I’d say that learning to read a timeline is a far more valuable and general skill than learning the details of some graphics library.

This visualization allows the programmer to see the “shape” of an algorithm, and understand it at a higher level. The program flow is no longer “one line after another”, but a pattern of lines over time.

Make time tangible

Line-by-line execution is a very fine-grained view of time. The programmer also thinks about time at other granularities.

For instance, animations and games run at a frame rate, say, sixty frames per second. Every 1/60th of a second, the program prepares the next frame to display on the screen. Other programs are event-driven — they respond to an external event, such as a button click or network request, by performing some computation, and then they wait for the next event.

These frames or event responses form a natural way of “chunking” time. If the execution of a line of code is like a sentence, then a frame is like a chapter. These chapters can also be made tangible, so the programmer can understand the execution at this granularity as well.

The following example provides a timeline for exploring line-by-line execution, and a slider for exploring frame-by-frame.

This control enables the programmer to go backwards and forwards through time, study interesting frames, and compare the execution across different frames.

Make time visible

In the above example, we are once again peeking through a pinhole, seeing just one frame at a time. In the following example, all frames are lightly overlaid, in order to give context to the active frame. The entire path of the ball can be seen at once.

* One of the deepest and most powerful ideas in mathematics is the relationship between a differential formulation (such as a step-by-step process, like our “draw” function) and its integrated form (such as a function of time, or plot over time). “Flattening time” allows the learner to see the process and its trajectory as two representations of the same thing, and thereby think of them interchangeably.

The output of the program is no longer a series of fleeting moments, but can be seen as a single, solid thing that extends over time. There is great power in this way of thinking.*

Summary — Follow the flow

Again, the particular solutions shown here are merely examples. What matters is the underlying purpose: enabling the learner to follow the program flow, by controlling time and seeing patterns across time. Transforming flow from an invisible, ephemeral notion into a solid thing that can be studied explicitly.

  • The environment can make flow tangible, by enabling the programmer to explore forward and backward at her own pace.
  • The environment can make flow visible, by visualizing the pattern of execution.
  • The environment can represent time at multiple granularities, such as frames or event responses, to enable exploration across these meaningful chunks of execution.

A simple program:

The third line declares a variable named “scaleFactor”, which varies with each iteration of the loop.

Take a moment to look at that line, and think about these questions:

  • What values does scaleFactor take on?  1?  100?  -1?  pi / 2?
  • What is scaleFactor at the beginning of the loop? At the end?
  • How does scaleFactor change over the course of the loop? Linearly up? Linearly down? Does the change get faster or slower?

Wait. Wait a minute. Were you trying to answer those questions by doing arithmetic in your head? The computer somehow drew that picture, so the computer must have calculated all those scaleFactors itself. Are you seriously recalculating them in your head?

* Other than setting a “breakpoint”, which is like monitoring traffic on the freeway by setting up a barricade. Or writing to a “console”, which is like figuring out where your dog goes during the day by following the trail of droppings.

Now imagine if scaleFactor also depended on some other variables, or some other functions, or external input. There would be no way to easily answer those questions.*

Think about this. We expect programmers to write code that manipulates variables, without ever seeing the values of those variables. We expect readers to understand code that manipulates variables, without ever seeing the values of the variables. The entire purpose of code is to manipulate data, and we never see the data. We write with blindfolds, and we read by playing pretend with data-phantoms in our imaginations.

One of the all-time most popular programming models is the spreadsheet. A spreadsheet is the dual of a conventional programming language — a language shows all the code, but hides the data. A spreadsheet shows all the data, but hides the code. Some people believe that spreadsheets are popular because of their two-dimensional grid, but that’s a minor factor. Spreadsheets rule because they show the data.

Information design pioneer Edward Tufte has one primary rule, and this rule should be the principle underlying any environment for creating or understanding.

Show the data.

If you are serious about creating a programming environment for learning, the number one thing you can do — more important than live coding or adjustable constants, more important than narrated lessons or discussion forums, more important than badges or points or ultra-points or anything else — is to show the data.

Show the data

Because the value of a variable varies over time, showing the data is intimately connected with showing time.

The previous section presented a timeline representation that showed the data at each step. In the following example, the programmer mouses over a particular row of the timeline to concentrate on a single line.

In this example, it is easy to answer the first two questions. By skimming over the execution of that line of code, we can see all of the values that scaleFactor takes on, and when.

However, it is still difficult to answer the third question: how does the variable vary? What is the shape of its change? The question is difficult because we are, once again, peeking through a pinhole, only seeing a single point at a time.

Edward Tufte has a second rule. It is not enough to just show the data. We must show comparisons.

Show comparisons

Data needs context. It is rarely enough to see a single data point in isolation. We understand data by comparing it to other data.

The timeline examples so far have used dots to represent executed lines. But instead of dots, we can show data. The following timeline shows each of the scaleFactors:

Almost every line of code here calculates something. The environment should provide the best visualization of whatever that something is. For example, the “rotate” line can show the rotations.

The “fill” line sets a fill color. That color can be shown.

The “triangle” line draws a triangle to the canvas, rotated and colored. The timeline can show a thumbnail of each triangle produced.

Taken together, we have a timeline that depicts not just the flow, but all of the data calculated in that flow. The execution of the program is laid bare for the reader. At a glance, she can see which lines were executed, when they were executed, and what they produced. The flow and the data are both shown in context.

The example above only loops twenty times. Is it possible to understand a loop with, say, thousands of iterations, without drowning in thousands of numbers?

Yes — there is an entire field of study devoted to depicting large amounts of numbers. To visualize this data, we can use all of the standard techniques of data visualization.

In the following example, as the programmer zooms the timeline out, the visualization automatically switches from a table to a plot.

Eliminate hidden state

In order to understand what a line of code does, the learner must see its effect. For example, as the programmer moves over iterations of the “triangle” line, she sees each triangle appear on the canvas:

The “fill” line, on the other hand, sets the fill color for subsequent drawing operations. When the programmer moves over this line, what effect does she see? She sees nothing happen, because the “fill” function modifies hidden state.

The Processing graphics library relies heavily on implicit state, in the form of the “current” fill color, stroke color, transform matrix, and so on. Code that modifies this state produces no visible effect on the canvas. In an interactive environment, this is unacceptable.

There are two design options here. One alternative is to eliminate the state. For example, color could be passed as a parameter to the “triangle” function.

The other alternative is to show the state. In the following example, the current fill and stroke colors are shown above the canvas. Now, when a line of code changes the fill color, the programmer actually sees something change. Making something visible makes it real.

All state must be eliminated or shown. Either can be a reasonable design decision. An environment that does neither — forcing learners to imagine the state and make sense of functions that produce no visible effect — is irresponsible design, and disrespectful to the learner.

Just the concept of a “transform matrix” is undoubtably baffling for many learners, but with a better metaphor, it needn’t be so. Logo, for instance, uses a turtle to reify translation and rotation, and children understand it readily. More on this later.

The current transform matrix is a particularly critical and confusing member of the state. Drawing anything interesting with the Processing graphics library requires matrix transforms, but the current transform is invisible. Functions such as “scale” and “rotate” have no visible effect, and compound transformations (such as translation followed by scale, or should it be the other way around?) often involve groping blindly through trial-and-error.

In the following example, the transform is visualized, and the effect of every function can be seen directly.

Summary — See the state

Code manipulates data. To understand code, a learner must see the data, and see the effect of code on the data.

  • The environment must show the data. If a line of code computes a thing, that thing should be immediately visible.
  • The environment must show comparisons. If a program computes many things, all of those things should be shown in context. This is nothing more than data visualization.
  • The system must have no hidden state. State should either be eliminated, or represented as explicit objects on the screen. Every action must have a visible effect.

I was recently watching an artist friend begin a painting, and I asked him what a particular shape on the canvas was going to be. He said that he wasn’t sure yet; he was just “pushing paint around on the canvas”, reacting to and getting inspired by the forms that emerged. Likewise, most musicians don’t compose entire melodies in their head and then write them down; instead, they noodle around on a instrument for a while, playing with patterns and reacting to what they hear, adjusting and sculpting.

An essential aspect of a painter’s canvas and a musical instrument is the immediacy with which the artist gets something there to react to. A canvas or sketchbook serves as an “external imagination”, where an artist can grow an idea from birth to maturity by continuously reacting to what’s in front of him.

Programmers, by contrast, have traditionally worked in their heads, first imagining the details of a program, then laboriously coding them.

Working in the head doesn’t scale. The head is a hardware platform that hasn’t been updated in millions of years. To enable the programmer to achieve increasingly complex feats of creativity, the environment must get the programmer out of her head, by providing an external imagination where the programmer can always be reacting to a work-in-progress.

* Recently, some people have mistakenly attributed the “live coding” concept to me, but it’s not a new idea, it’s certainly not “my idea”, and it’s not a particularly interesting idea in itself. Immediate-update is merely a prerequisite for doing anything interesting — it enables other features which require a tight feedback loop. An action game with a low frame rate is a bad game, but simply upping the frame rate doesn’t magically make a game good.

Some programming systems attempt to address this with a so-called “live coding” environment, where the output updates immediately as the code changes. An example of live coding:*

As you can see, live coding, on its own, is almost worthless. The programmer still must type at least a full line of code before seeing any effect. This means that she must already understand what line of code she needs to write. The programmer is still doing the creative work entirely in her head — imagining the next addition to the program and then translating it into code.

Get something on the screen as soon as possible

Live coding does, however, provide a foundation for other features which can jump-start the create-by-reacting process. In the following example, the environment offers autocomplete with default arguments. After typing just a couple of characters, the programmer immediately sees something on the screen, and can proceed to adjust it.

Autocomplete is a common feature of most programming environments, but there are two critical subtleties here. First, the functions all have default arguments (position, width, height, and so on are already filled in), so each completion is a complete statement that produces a visible effect. Second, a default completion is selected immediately. Here’s what this means for the programmer’s thought process:

In the example above, the programmer wants to draw a roof on the house. She doesn’t need to mentally plan out how to draw the roof beforehand — she doesn’t need to imagine which functions would be appropriate. She just needs the vague notion: “I want to draw something.” She starts typing “draw”, and immediately sees a shape on the screen.

At this point, she can stop imagining and start reacting:

  • “This is the wrong shape. Which shape will work better?” She goes down the list and turns the shape into a triangle.
  • “This is a right triangle. I want a different triangle.” She adjusts the triangle’s points into a more roof-like shape.
  • “The roof isn’t lying on the house.” She adjusts the triangle to lower it onto the house.

* Strangely, I don’t actually know of any APIs that are intentionally designed with autocomplete in mind. I do know many APIs, such as Processing, that are designed for brevity, which is irrelevant in an environment with good autocomplete.

This example assumed a hypothetical graphics library which was designed for autocomplete — all of the drawing functions begin with “draw”, so the completion list would appear as the designer intended.*

A different way to structure the library would be to provide a single “shape” function, which takes the type of shape (triangle, ellipse, etc.) as an argument. For example:

This could help to further encourage the create-by-reacting way of thinking. Because “drawTriangle” and “drawRect” aren’t in the vocabulary, the programmer would never find herself thinking about specific shape functions before something is on the screen. Her starting point is always just “shape”.

The environment is the user interface for working with a program. Consider the second menu that appeared above, with “line”, “triangle”, etc. If an argument can take one of five values, the environment should provide the best interface for selecting among those values. That is, in this situation, the programmer is a user who has to select one of five choices. How would a good UI designer represent those five choices? Perhaps more like this:

Why should we expect anything less from a programming environment?

Dump the parts bucket onto the floor

As a child, you probably had the experience of playing with a construction kit of some kind — Legos, or Erector Sets, or even just blocks. As a first act before starting to build, a child will often spread out all of the parts on the floor.

This provides more than simply quick access. It allows the child to scan the available parts and get new ideas. A child building a Lego car might spot a wide flat piece, and decide to give the car wings.

This is a second form of create-by-reacting. In addition to reacting to the object under construction, the child is also reacting to the parts she has available.

In the following example, the available functions are located adjacent to the coding area, and the programmer can skim over these “parts” and get ideas.

The above example encourages the programmer to explore the available functions. A learner who would never think to try typing the “bezier” function, with its unfamiliar name and eight arguments, can now easily stumble upon it and discover what it’s about.

The example above is one way of representing the “parts bucket” for programmatic drawing. But would an user interface designer consider that to be the best interface for drawing a picture on a computer screen? What about the following?

An objection might arise at this point. With this interface, is this even “programming”?

No, not really. But none of the examples in this section are “programming”. Typing in the code to draw a static shape —

— is not programming! It’s merely a very cumbersome form of illustration. It becomes genuine programming when the code is abstracted — when arguments are variable, when a block of code can do different things at different times. The next section will discuss how create-by-reacting leads into create-by-abstracting.

Summary — Create by reacting

* Thus, we need a powerful way of getting something onscreen, and powerful ways of adjusting it. Adjustable values (such as draggable numbers or color palettes) are a valuable component for the latter purpose, and I am a long-time proponent of adjustable values (e.g., Scrubbing Calculator, Explorable Explanations, Ten Brighter Ideas). However, I have always proposed adjustable numbers in a context where the adjuster already understands the meaning of the number. As mentioned earlier, I am very uncomfortable with the Khan Academy approach of encouraging learners to adjust unlabeled numbers and figure out what they’re for, and I feel that this is a case of a tool being adopted without an understanding of what purpose the tool serves.

The create-by-reacting way of thinking could be stated as: start with something, then adjust until it’s right.*

The programmer must be able to do her thinking out in the environment, not trapped in her head. The environment must serve as an external imagination, where the programmer can be continuously reacting to a work-in-progress.

To be clear, this does not relieve the programmer from thinking! It simply makes those thoughts immediately visible. I am happy to be composing this essay in a text editor, where my words become visible and editable as soon as I think of them, as opposed to working entirely internally like the orators and playwrights of the distant past.

  • The environment must be designed to get something on the screen as soon as possible, so the programmer can start reacting. This requires modeling the programmer’s thought process, and designing a system that can pick up on the earliest possible seed of thought.
  • The environment must dump the parts bucket onto the floor, allowing the programmer to continuously react to her raw material and spark new ideas.

Learning programming is learning abstraction.

A computer program that is just a list of fixed instructions — draw a rectangle here, then a triangle there — is easy enough to write. Easy to follow, easy to understand.

It also makes no sense at all. It would be much easier to simply draw that house by hand. What is the point of learning to “code”, if it’s just a way of getting the computer to do things that are easier to do directly?

Because code can be generalized beyond that specific case. We can change the program so it draws the house anywhere we ask. We can change the program to draw many houses, and change it again so that houses can have different heights. Critically, we can draw all these different houses from a single description.

The description still says “draw a rectangle here, then a triangle there”, but the here and there have been abstracted. Different parameters give us different heres and different theres.

How does a programmer learn to write this abstract code? How does she learn to write a single description that is generalized for many cases?

She doesn’t. The learner should start by writing concrete code, and then gradually change it to introduce abstraction. And the environment must provide the tools to perform this process, in such a way that the learner can understand the program at each stage.

Start constant, then vary

In the create-by-abstracting way of thinking, the programmer starts by creating a specific case, typically involving constants. She then moves to the general case by turning those constants into variables. Here’s an example of how the environment can encourage this way of thinking, starting with the house from earlier.

The programmer wants to move the house to a different location. She can’t move it by adjusting a single number in the code, because there are four different points which all need to change at the same time — the rectangle’s origin, and the triangle’s three points.

Here, the programmer selects one of the numbers, and converts it into a variable.

She then connects the variable to another number, by dragging from one to the other.

There are two additional arguments to “triangle” which need to vary as well. When she connects the variable, whose value is 80, to the constant 100, the environment offers a choice of four possible relationships between the numbers.

The four expressions involve addition, subtraction, multiplication, and division respectively. One of them will typically be either the correct relationship or a good starting point.

Here is the entire process of introducing the variable.

Start with one, then make many

In the example above, the house is now abstracted — the code doesn’t just draw one fixed house, but can draw a house anywhere. This abstracted code can now be used to draw many different houses.

In the following example, the programmer wants to draw a row of houses. She selects the abstracted code, and converts it into a loop.

The variable in the first line of the selection becomes an induction variable, and the programmer can then adjust its bounds.

This is a process of starting with a specific case, and progressively generalizing:

  • First, the programmer creates a house at a fixed location. Here, she has interactive control over each individual shape.
  • She turns the house’s location into a variable. Now, she has interactive control over the variable, which affects many shapes.
  • She introduces a loop to vary that variable. Now, she has interactive control over the bounds of the loop, which affects many houses, which affect many shapes.

At each stage, the programmer has interactive control over the relevant parameters, but the parameters are at successively higher levels of abstraction. That is, the programmer can still create by reacting, but she’s creating and reacting at higher levels.

Instead of drawing an evenly-spaced row of houses, the programmer now wants individual control over each of the houses. Starting from the variable abstraction, she selects the code and converts it into a function.

By duplicating the function call, she obtains several houses which can be controlled individually.

Now, instead of identical houses, she wants to vary the heights of the houses. She introduces another variable, and then converts it into an additional argument to the function.

* In Ladder of Abstraction terminology, the programmer climbed two levels on the ladder of abstraction — she controlled x, then abstracted over x, then controlled y, then abstracted over y.

The process, again, consists of starting concrete, and progressively introducing abstraction:*

  • The programmer creates a drawing of a house.
  • She turns x into a variable, so she can control the house’s position.
  • She turns x into a function argument, so different houses can have different positions.
  • She turns y into a variable, so she can control the house’s height.
  • She turns y into a function argument, so different houses can have different heights.

Summary — Create by abstracting

Start concrete, start grounded. Start with one specific case, entirely understood. Then gradually generalize, level by level, in such a way that the programmer still fully understands the program at each level of abstraction.

Fully concrete code can be micromanaged — the programmer has explicit control over every step of the execution. Abstraction means giving up some of this control, and this can be scary for a learner.

But the learner can work up to it — introduce a variable, interactively control it, connect the variable to another value, interactively control it, turn the variable into a function argument, interactively control it. The learner always gets the experience of interactively controlling the lower-level details, understanding them, developing trust in them, before handing off that control to an abstraction and moving to a higher level of control.

* The code transformations shown in the above examples have a superficial resemblance to “refactoring”, which is supported by some environments. However, the features have different intents. Refactoring is about writing working but poorly-organized code, then cleaning it up. Create-by-abstracting is about writing code for a specific case, then generalizing it.

The environment must support this process. A typical text editor only provides direct support for growing “outward” — adding new lines of code. The environment must also support growing “upward” — abstracting over existing code.*

  • The environment should encourage the learner to start constant, then vary, by providing meaningful ways of gradually and seamlessly transitioning constant expressions into variable expressions.
  • The environment should encourage the learner to start with one, then make many, by providing ways of using those variable expressions at a higher level, such as function application or looping.

A programming system has two parts. The environment is installed in on the computer, and the language is installed in the programmer’s head.

The design of the language is just as critical to the programmer’s way of thinking as the design of the environment. In the best cases, they are co-designed and inseparable. Many recent learning environments use JavaScript or Processing, and for the sake of comparison, the examples in this essay used them as well. But neither is a well-designed language for learning.

Fortunately, there are giant shoulders to stand on here — programming systems that were carefully and beautifully designed around the way people think and learn. This section will briefly offer some design principles that have been distilled from these great systems of the past.

Great works

The canonical work on designing programming systems for learning, and perhaps the greatest book ever written on learning in general, is Seymour Papert’s “Mindstorms”.

Designing a learning system without a solid understanding of the principles in this book is like designing a mechanical system without understanding “the lever”. Or “gravity”. If you are reading this essay (and I’m pretty sure you are!) then you need to read “Mindstorms”.

Seriously. I mean it. If you are going to design anything whatsoever related to learning, then you literally need to read “Mindstorms”.

For fuck’s sake, read “Mindstorms”.

This section will make reference to four seminal programming systems that were designed for learning, and I strongly recommend studying each of them.

To be clear, I’m not advocating using any of these systems, in either their historical or modern incarnations. I’m advocating understanding them, and building on their insights.

Identity and metaphor

In Logo, the programmer draws pictures by directing the “turtle”, an onscreen character which leaves a trail as it moves:

Watch just two minutes of this video — the children, and the beardy guy talking:

That’s Seymour Papert explaining the Logo turtle. The turtle serves a number of brilliant functions, but the most important is that the programmer can identify with it. To figure out how to make the turtle perform an action, the programmer can ask how she would perform that action herself, if she were the turtle.

For example, to figure out how to draw a circle, a learner will walk around in circles for a bit, and quickly derive a “circle procedure” of taking a step forward, turning a bit, taking a step forward, turning a bit. After teaching it to herself, the learner can then teach it to the computer.* * Here, the learner has derived and implemented the differential equation for a circle, without knowing what a differential equation is. To quote Papert, a Logo program is an “intuitive analog of the differential equation”. The turtle is the in-computer embodiment of the programmer herself, a “self”, like the player-character in a video game, and thereby allows the learner to transfer her knowledge of her own body into knowledge of programming.

Every programming language is made of metaphors, but some fit the mind better than others. Standard imperative programming uses the metaphor of “assigning to variables”, shuffling bits between little boxes. Unlike the Logo turtle, this metaphor was not designed to resonate with how people learn and understand; it simply evolved as a thin layer over the metaphors used in the underlying machine architecture, such as “storing to memory”.* * Alan Kay in The Early History of Smalltalk: “Assignment statements — even abstract ones — express very low-level goals… Human programmers aren’t Turing machines — and the less their programming systems require Turing machine techniques, the better.”

Smalltalk, like Logo, also has a strong resonant metaphor, which is the message. All computation in Smalltalk is represented by objects sending and responding to messages from other objects. In order to program the behavior of an object, the programmer casts herself into the role of that object (to the extent of referring to the object as “self”!) and thinks of herself as carrying on a conversation with other objects. This is a powerful metaphor, because role-playing and conversing are powerful innate human facilities. As with Logo, tremendous time and thought went into the crafting and honing of Smalltalk’s metaphors.

In HyperCard, the program is represented as a stack of cards, with the programmer drawing objects onto each card. Unlike a typical programming language, where an “object” is an abstract ethereal entity floating inside the computer, every object in HyperCard has a “physical presence” — it has a location on a particular card, it can be seen, it can be interacted with. Every object in HyperCard is a “real thing”, and this is a powerful metaphor which allows programmers to apply their intuition and understanding of the physical world.

Rocky’s Boots is structured as a video game, with a player-character that can be moved around directly. The player not only can pick up and move objects, but also acts as a power source — a literally powerful metaphor. Everything is visible and tangible — electricity is not some abstract voltage reading, but can be seen directly as orange fire, flowing through wires. This beautiful metaphor makes it trivial to follow the flow and see the state.

In Processing, by contrast, the programmer has no identity within the system. There are no strong metaphors that allow the programmer to translate her experiences as a person into programming knowledge. The programmer cannot solve a programming problem by performing it in the real world.

Processing’s core metaphor is the “painter’s algorithm” — the computer places a series of shapes on the screen, like drawing on paper. Because this metaphor carries no computational power (you cannot compute by filling in pixels), all computation occurs outside the bounds of the metaphor. In this example of a bouncing-ball animation —

— the simulated properties of the ball (position, velocity) are not associated with the picture of the ball onscreen. They are computed and stored abstractly as “numbers” in “variables”, and the ball is merely a shadow that is cast off by this ethereal internal representation. The ball cannot be picked up and moved; it cannot be told how to interact with other objects. It is not a “living thing”, and the simulation cannot be understood or thought about in any way other than “numbers in variables”. This is a very weak way of thinking.* * For examples of systems where every onscreen object is a living tangible thing, see Etoys or Morphic.

Decomposition

Modularity is the human mind’s lever against complexity. Breaking down a complex thing into understandable chunks is essential for understanding, perhaps the essence of understanding.

A programming language must encourage the programmer to decompose — to approach a complex problem by breaking it into simpler problems. Papert refers to this as breaking down a program into “mind-size bites”.

Logo uses the metaphor of “teaching the turtle a new word”. To draw a face consisting of four circles, we can teach the turtle a subprocedure for drawing a circle, and then apply that subprocedure four times. Long and careful thought was given to the process by which a learner discovers the need for subprocedures, and then factors a large procedure into subprocedures.

Smalltalk is, in essence, a philosophy of decomposition in the form of a programming language. This is Alan Kay inventing objects:

Bob Barton [said] “The basic principle of recursive design is to make the parts have the same power as the whole.” For the first time I thought of the whole as the entire computer, and wondered why anyone would want to divide it up into weaker things called data structures and procedures. Why not divide it up into little computers… Why not thousands of them, each simulating a useful structure?

Smalltalk’s key insight was that a complex computer program could be decomposed into smaller computers, called “objects”. Programming in Smalltalk is almost entirely an exercise in decomposition — breaking down thoughts into classes and messages.

Forth is another language whose extraordinary decomposability is valuable to study and understand.

Almost every computer language provides some facility for decomposition, but some are better than others. In his wonderful essay Why Functional Programming Matters, John Hughes argues that decomposition lies at the heart of the power of languages like Haskell:

When writing a modular program to
solve a problem, one first divides the problem into subproblems, then solves the
subproblems, and finally combines the solutions. The ways in which one can
divide up the original problem depend directly on the ways in which one can glue
solutions together. Therefore, to increase one’s ability to modularize a problem
conceptually, one must provide new kinds of glue in the programming language.

Functional languages provide two new, very important kinds of glue… This is the key to functional programming’s power — it allows improved modularization.

Khan Academy’s tutorials do not mention decomposition or functions at all, and many example programs are written as one long list of instructions.

Processing allows for Logo-style decomposition into subprocedures, in the form of function definitions. The more powerful Smalltalk-style decomposition, where submodules can be thought about independently, is not supported. In Processing, drawing and input events are tied to single entry points — top-level functions such as “draw” and “mouseDown”. The behavior of submodules must be tangled across these global functions. Clean decomposition is not possible.

Consider a programmer who has made a bouncing ball animation. How does she go from one ball to two, to a hundred? How does she make the balls bounce off one another? How does she make balls draggable with the mouse? In a genuine learning environment such as Etoys, this progression is natural and encouraged. In Processing, each of these steps is a nightmare of needless complexity.

A language that discourages decomposition is a language that cripples a programmer’s most valuable way of thinking.

Recomposition

Creating is remixing. To a large extent, new ideas are old ideas in new combinations.

A programming language must encourage recomposition — grabbing parts of other programs, assembling them together, modifying them, building on top of them. This gives creators the initial material they need to create by reacting, instead of facing every new idea with a blank page. It also allows creators to learn from each other, instead of deriving techniques and style in a vacuum.

HyperCard was designed for recomposition, and is perhaps still unsurpassed in that respect. Bill Atkinson fully intended for creators to assemble a program by copying and pasting objects from other programs, and then gradually tweaking and customizing them. Every program thus serves as a parts kit for creating new programs. Because all source code, if any, is embedded in individual objects in the form of scripts, and because scripts use loose, relative references to other objects, groups of related objects can be transplanted much more easily and successfully than in other systems.* * HyperCard is seen by some as “what the web should have been”. It’s lamentable that a creator cannot, and probably can never, create a website by copying and pasting graphical objects from other websites. This is not due to “technological limitations” — it’s a consequence of thoughtless system design.

Many people revere HyperCard for initiating them into programming. Any user can remix their software with copy and paste, thereby subtly transitioning from user to creator, and often eventually from creator to programmer.

Processing‘s lack of modularity is a major barrier to recomposition. The programmer cannot simply grab a friend’s bouncing ball and place it alongside her own bouncing ball — variables must be renamed or manually encapsulated; the “draw” and mouse functions must be woven together, and so on. One can easily start from an existing Processing program and modify it, but the language does not encourage combining two programs.

Worse, Processing’s dependence on global state hinders even the simplest forms of recomposition. As an analogy, imagine you’re writing an email. You copy some red text from a website, paste it into your email, and everything else in your email turns red:

This is exactly what can happen when copying and pasting lines of Processing code, because Processing’s way of handling color is inherently leaky:

Experienced programmers might look at this example and consider this a programmer’s error, because this is “just how code works.” But this error is not intrinsic to programming; it’s a consequence of specific design decisions — mutable state, global variables, no encapsulation.

Worse yet, Processing has global modes which alter the meaning of function arguments. The following line of code sets a fill color. Do you know what color it is?

Trick question — it’s impossible to know what color it is, because the meaning of “255” depends on the global “color mode”. It could be any of these colors:

If two Processing programs specify their colors in different color modes, then combining the two programs is almost hopeless.

Designing a system that supports recomposition demands long and careful thought, and design decisions that make programming more convenient for individuals may be detrimental to social creation.

Readability

A learner must be able to look at a line of code and know what it means.

Syntax matters. Here are two statements in HyperCard’s scripting language, and their equivalents in a more conventional syntax:

Write “hello” to file “greetings”.
Drag from “0,0” to “100,100” with optionKey.

writeFile(“hello”, “greetings”);
dragMouse(0, 0, 100, 100, OPTION_KEY);

See also Inform 7. English-like languages like these are sometimes accused of being difficult to write (since the syntax is more restrictive than actual English), but that’s a fault of the environment. Programmers shouldn’t be typing this stuff.

HyperTalk happens to use an English-like syntax, but that’s not the point here. What matters is that every argument can be understood in context. It’s clear that “hello” is a string and “greetings” is a filename, and that “0,0” and “100,100” are start and end points. In the conventional syntax, both are ambiguous.

As another example, here’s how a programmer might draw an ellipse in three languages:

canvas drawEllipseCenteredAtX:50 y:50 width:100 height:100.

ellipse(50,50,100,100);

push 100; push 100; push 50; push 50; call _ellipse

In Smalltalk, arguments have context. Processing’s “ellipse” is exactly as cryptic as assembly language. The reader must look up or memorize every argument, a significant barrier to reading.

Names matter. Below are four array methods from Apple’s Cocoa framework, and the equivalent JavaScript methods:


mutate array and return nothing
mutate nothing and return new array

addObject
addObjectsFromArray
arrayByAddingObject
arrayByAddingObjectsFromArray

push
splice
concat
concat

To see the amount of thought that is put into Cocoa naming, I highly recommend the Cocoa coding guidelines.

Cocoa follows strong grammatical conventions which immediately convey the meanings of methods. Verb phrases (“addObject”) perform an action and return nothing. Noun phrases (“arrayByAddingObject”) return the noun so named, and generally do not have stateful effects unless the name suggests so. Expected arguments are clearly indicated by the name, in Smalltalk style. (“addObject” takes an object; “addObjectsFromArray” takes an array.) Most Cocoa code can thus be read and at least vaguely understood without documentation.

* Processing’s “fill” and “stroke” read as verbs, although it’s possible they were intended as shorthands to nouns (“fill color” and “stroke color”). In most other libraries (HTML canvas, Quartz, cairo), “fill” and “stroke” are unambiguously verbs, and act accordingly.

By contrast, many of Processing’s function names are grammatically ambiguous or misleading. Many nouns, such as “ellipse” and “triangle”, perform actions. Many verbs, such as “fill” and “stroke”, do not.* The programmer constructs a color using a noun (“color”), and constructs an image using a verb (“createImage”). This sort of linguistic sloppiness is inappropriate, especially in a system for learning. A language must be parsed by people, not just compilers.

The design principles presented in this essay can be used as a checklist to evaluate a programming system for learning.

Does the environment allow the learner to…

  • read the vocabulary? — Is meaning transparent? Is meaning explained in context, by showing and telling?
  • follow the flow? — Is time visible and tangible? At all meaningful granularities?
  • see the state? — Does the environment show the data? Show comparisons? Is hidden state eliminated?
  • create by reacting? — Is something on screen as soon as possible? Is the parts bucket on the floor?
  • create by abstracting? — Can the programmer start concrete, then generalize?

Does the language provide…

  • identity and metaphor? — Is the computer’s world connected to the programmer’s world?
  • decomposition? — Can the programmer break down her thoughts into mind-sized pieces?
  • recomposition? — Can the programmer put diverse pieces together?
  • readability? — Is meaning transparent?

This essay suggested some features and references that address these questions, but the questions matter more than my answers.

Some people will defend poorly-designed systems by pointing out all the creativity that they have enabled. For example, if novices are creating lots of programs in the Khan Academy and Processing systems, doesn’t that mean the systems are worthwhile and valuable?

Not necessarily. People are inherently creative, and some will manage to create in even the most hostile of environments. That doesn’t justify bad design. Ian Bogost has a particularly memorable response to that line of thinking.

If you are designing a system and you can’t answer these questions, it’s time to reopen your sketchbook, because your design’s not done yet.

These are not training wheels

These design principles were presented in the context of systems for learning, but they apply universally. An experienced programmer may not need to know what an “if” statement means, but she does need to understand the runtime behavior of her program, and she needs to understand it while she’s programming.

A frequent question about the sort of techniques presented here is, “How does this scale to real-world programming?” This is a reasonable question, but it’s somewhat like asking how the internal combustion engine will benefit horses. The question assumes the wrong kind of change.

Here is a more useful attitude: Programming has to work like this. Programmers must be able to read the vocabulary, follow the flow, and see the state. Programmers have to create by reacting and create by abstracting. Assume that these are requirements. Given these requirements, how do we redesign programming?

Here’s an example. In many styles of programming today, when an application launches, it creates a large set of interconnected stateful objects. To see the effect of a code change, the application must be “relaunched” — that is, its entire world is destroyed, and rebuilt again from scratch. How can we “create by reacting”, continuously changing the code and seeing continuous effects in the flow and data, when there is no continuity between the application’s state before and after the change?

* In the Smalltalk model, for instance, state is persistent and code changes don’t affect data. In the Clojure model, code is “mostly functional”, with a small amount of carefully-managed state. Either model could be a starting point for a system where continuous code changes can be seen as continuous effects. But there is no future in destroy-the-world programming.

We can’t. That’s the wrong question. A better question is: How do we design a new programming model that does allow for continuous change? We already have clear hints.*

Another example. Most programs today manipulate abstract data structures and opaque objects, not pictures. How can we visualize the state of these programs?

Again, wrong question. A better attitude is to assert that we have to be able to understand the state of our programs. We can then ask: How do we design data structures that can be visualized? Can we invent data structures that are intended to be visualized? How do we move towards a culture where only visually-understandable data is considered sound? Where opaque data is regarded in the same way that “goto” is today?** Forward reference: Some work that I’ve done in automatic visualization of ad-hoc data structures will be published later this year, in collaboration with Viewpoints Research.

In his influental essay No Silver Bullet, Fred Brooks makes the case that software is inherently “invisible and unvisualizable”, and points out the universal failure of so-called “visual programming” environments. I don’t fault Fred Brooks for his mistake — visual programming is indeed worthless. But that’s because it visualizes the wrong thing.

Traditional visual environments visualize the code. They visualize static structure. But that’s not what we need to understand. We need to understand what the code is doing.

Visualize data, not code. Dynamic behavior, not static structure.

Maybe we don’t need a silver bullet. We just need to take off our blindfolds to see where we’re firing.

Much thanks to Star Simpson, Dan Amelang, Dave Cerf, Patrick Collison, Christina Cacioppo, and Oliver Steele for their feedback on this essay.

This essay was an immune response, triggered by hearing too many times that Inventing on Principle was “about live coding”, and seeing too many attempts to “teach programming” by adorning a JavaScript editor with badges and mascots.

Please read Mindstorms. Okay?


from Hacker News 200: http://feedproxy.google.com/~r/hacker-news-feed-200/~3/z9czrwuRF_A/

Advertisements

Written by cwyalpha

九月 28, 2012 at 10:08 上午

发表在 Uncategorized

Thought this was cool: 12306订票系统设计关键点

leave a comment »

12306全国火车票网上售票网站的情况大家都见到了,如果让你来设计该订票网站,你会如何设计才能应对如此大规模以及高并发的情况呢?以下是百度前技术总监邵辉给出的设计:

列车在线订票系统的业务逻辑比较简单,不用多说。可能的瓶颈有两个,一个是车次和剩余票量的查询,一个是下单。在设计软件架构之前,需要先研究产品需求、软硬件条件、网络环境以及关联系统的接口,但这些资料无从获得,所以只能做几点分析和假设,做为设计的前提条件。

1.2012年铁路春运是2.35亿人次,去程售票的那几天应该是订票的最高峰点,假设是3天内订出1.2亿张票,那么每天是4000万张。由于还有车站窗口、电话、代售点等渠道,所以每天通过网站售出的票应该小于4000万张,这里假设2000万张是由网站售出的。

2.如果2000万张票是在一天内均匀地订出,那么每秒钟大约是230张。中国排名前100位的网站应该都会超过这个事务量,不会有什么难题。问题是,订票网站是在一个固定时刻(早上6:00)开始放票,考虑一个极端的情况:早已守候在电脑前的2000万用户在准点开始按下按钮下单,并且都在1分钟时间内订到了票,那么系统需要每秒33万的事务处理能力,这至少需要上千台服务器的集群才能按时处理完。(按照网上有关12306建设资金的报道来看,服务器投入肯定远远不到这个数。)实际情况当然不会这么极端,但必须保证整个系统有非常好的横向扩展能力,以便在必要的时候添加设备扩展服务能力。车站窗口、代售点和电话售票之所以不会产生这样的峰值,原因是这些渠道都是有人工受理,效率足够低,低到用户需要排几个小时的队来等候,自然就把峰值给抹平了。

3.前面还不是最大的问题。铁道部应该还有个核心数据库,保存最权威的票务数据,网络订票系统、电话订票系统和代售点必须与这个数据库对接以提交订单记录和获得准确的车票余量信息。至于这个接口有多少条连接,每秒允许多少次事务,那就不得而知了。这里我们假定接口要么足够宽,宽到不会成为瓶颈,要么在事先已有固定数量的车票分配给了网络订票系统,这样网络订票系统就可以根据这个固定数量自主地接受订单,然后在后台慢慢地把订单数据传给核心数据库。否则,就好像8车道的马路一下变成了2车道,无论如何也不可能让用户畅通无阻地订到票。

有了上面的分析和假设,可以考虑以下设计方案:

  • 车次和剩余票量的查询。考虑到车次查询量可能是订单数量的数倍至数十倍,不能让用户提交查询请求时直接去主数据库检索数据,而应该采用前端+缓存+检索+数据库的多层逻辑结构。数据库存放持久化的权威数据并保证数据的一致性;缓存层负责把车次、余量等数据放到内存中以保证最好的查询性能,并有比较好的横向扩展性;检索机负责定时(例如每5秒一次)去数据库检索所有车次信息并主动更新缓存机上的数据;前端负责响应来自用户的web请求。这个架构无法保证用户看到的车票余量是实时准确的(比如有数秒的滞后),但由于用户从看到车票余量到完成订单之间肯定是有时间间隔的,在订票高峰期和票量较少时本来就无法保证“在看到有票的情况下一定能订到票”(技术上能够实现这一点,但代价非常大),所以这个缺陷并不明显,是个很划得来的折中。注意是检索机负责将车票数据抓出来并更新到缓存机上,这是保护数据库并使缓存层能够线性扩展的关键方法。另外查询页面需要采取防频繁刷新的措施,这个在前端机上设置web
    server策略即可。
  • 下单部分由于要更新车票余量,必须保证数据的一致性,扩展性不可能很好,因此是整个系统中最为脆弱的一环。实现的方法分同步处理和异步处理两种。同步处理就是用户选择完车次正式下单订票后,立即锁住车票记录并检查车票真实余量,如果大于1,那么余量减1,解除锁定并回复用户订票成功进入支付流程,否则解除锁定回复订票失败请用户选择其它车次。这是订票系统的标准流程,无论用户量大还是小,处理流程都是一样的。为了支撑春运这种极端情况下的高访问量,需要提高订单处理的并发吞吐量和单个事务的处理速度。提高吞吐量可以将不同车次的车票数据分拆到不同的物理服务器上,提高订单处理速度可以考虑取消关系数据库,将车次数据放到内存中并用原生语言实现订单处理逻辑。有一个比较值得考虑的措施是在用户下单前用图片或者短信的方式要求用户二次验证,这既可以防止刷页面,也可以使峰值变得更平缓。异步处理就是在用户提交订单时并不立即告诉用户订票成功或者失败,只是将订票请求放入队列里排队,订单成功处理后再通知用户。处理优先级上采用时间排序或者抽签都可以,不过抽签适合在非常时期采用,并不适合作为一个标准策略,这多少增加了系统开发的复杂度。采用异步的方式将会在最大程度上避免用户下单高峰造成的冲击,缺点是用户不知道什么时候能有结果,是否应该尝试其它车次,这对用户体验有一定程度的损伤。
  • 硬件架构方面,负载均衡设备是必须采用的,除了扩展负载能力,也需要扛住DoS攻击。服务器用普通PC服务器就可以了。网络架构方面,内网应该设计成无阻塞的,外网引入三大运营商的BGP带宽,不要用静态带宽。

最后说一句,几千万用户同时下订单,即使是三大互联网巨头的系统,也不一定撑得住,12306网站崩溃并不算太丢人,但需要好好考虑架构优化方案,明年春运不能再倒了。

 

全文转载自:http://iecspace.ecjtu.org/posts/12306-website-architecture

 青春就应该这样绽放  游戏测试:三国时期谁是你最好的兄弟!!  你不得不信的星座秘密
from 互联网旁观者: http://blog.sina.com.cn/s/blog_72995dcc01017dt4.html

Written by cwyalpha

九月 28, 2012 at 10:08 上午

发表在 Uncategorized

Thought this was cool: 由12306.cn谈谈网站性能技术

leave a comment »

12306.cn网站挂了,被全国人民骂了。我这两天也在思考这个事,我想以这个事来粗略地和大家讨论一下网站性能的问题。因为仓促,而且完全基于本人有限的经验和了解,所以,如果有什么问题还请大家一起讨论和指正。(这又是一篇长文,只讨论性能问题,不讨论那些UI,用户体验,或是是否把支付和购票下单环节分开的功能性的东西)

业务

任何技术都离不开业务需求,所以,要说明性能问题,首先还是想先说说业务问题。

  • 其一有人可能把这个东西和QQ或是网游相比。但我觉得这两者是不一样的,网游和QQ在线或是登录时访问的更多的是用户自己的数据,而订票系统访问的是中心的票量数据,这是不一样的。不要觉得网游或是QQ能行你就以为这是一样的。网游和QQ
    的后端负载相对于电子商务的系统还是简单。
  • 其二有人说春节期间订火车的这个事好像网站的秒杀活动。的确很相似,但是如果你的思考不在表面的话,你会发现这也有些不一样。火车票这个事,还有很多查询操作,查时间,查座位,查铺位,一个车次不
    行,又查另一个车次,其伴随着大量的查询操作,下单的时候需要对数据库操作。而秒杀,直接杀就好了。另外,关于秒杀,完全可以做成只接受前N个用户的请求(完全不操作后端的任何数据,
    仅仅只是对用户的下单操作log),这种业务,只需要在内存cache中放好可秒杀的数量,还可以把数据分布开来放,100商品,10台服务器一台放10个,无需在当时操作任何数据库。可以订单数够后,停止秒杀,然后批量写数据库。而且秒杀的商品不多。火车票这个不是像秒杀那么简单的,春运时间,几乎所有的票都是热门票,而且几乎是全国人民都来了。(淘宝的双十一也就3百万用户,而火车票瞬时有千万级别甚至是亿级别的)
  • 其三有人拿这个系统和奥运会的票务系统比较。我觉得还是不一样。虽然奥运会的票务系统当年也一上线就废了。但是奥运会用的是抽奖的方式,也就是说不存在先来先得的抢的方式,而且,是事后抽奖,事前只需要收信息,事前不需要保证数据一致性,没有锁,很容易水平扩展。
  • 其四订票系统应该和电子商务的订单系统很相似,都是需要对库存进行:1)占住库存,2)支付(可选),3)扣除库存的操作。这个是需要有一致性的检查的,也就是在并发时需要对数据加锁的。B2C的电商基本上都会把这个事干成异步的,也就是说,你下的订单并不是马上处理的,而是延时处理的,只有成功处理了,系统才会给你一封确认邮件说是订单成功。我相信有很多朋友都收到认单不成功的邮件。这就是说,数据一致性在并发下是一个瓶颈
  • 其五铁路的票务业务很变态,其采用的是突然放票,而有的票又远远不够大家分,所以,大家才会有抢票这种有中国特色的业务的做法。于是当票放出来的时候,就会有几百万人甚至上千万人杀上去,查询,下单。几十分钟内,一个网站能接受几千万的访问量,这个是很恐怖的事情。据说12306的高峰访问是10亿PV,集中在早8点到10点,每秒PV在高峰时上千万。

多说几句:

  • 库存是B2C的恶梦,库存管理相当的复杂。不信,你可以问问所有传统和电务零售业的企业,看看他们管理库存是多么难的一件事。不然,就不会有那么多人在问凡客的库存问题了。(你还可以看看《乔布斯传》,你就知道为什么Tim会接任Apple的CEO了,最主要的原因是他搞定了苹果的库存周期问题)
  • 对于一个网站来说,浏览网页的高负载很容易搞定,查询的负载有一定的难度去处理,不过还是可以通过缓存查询结果来搞定,最难的就是下单的负载。因为要访问库存啊,对于下单,基本上是用异步来搞定的。去年双11节,淘宝的每小时的订单数大约在60万左右,京东一天也才能支持40万(居然比12306还差),亚马逊5年前一小时可支持70万订单量。可见,下订单的操作并没有我们相像的那么性能高。
  • 淘宝要比B2C的网站要简单得多,因为没有仓库,所以,不存在像B2C这样有N个仓库对同一商品库存更新和查询的操作。下单的时候,B2C的
    网站要去找一个仓库,又要离用户近,又要有库存,这需要很多计算。试想,你在北京买了一本书,北京的仓库没货了,就要从周边的仓库调,那就要去看看沈阳或
    是西安的仓库有没有货,如果没有,又得看看江苏的仓库,等等。淘宝的就没有那么多事了,每个商户有自己的库存,库存就是一个数字,并且库存分到商户头上了,反而有利于性能扩展。
  • 数据一致性才是真正的性能瓶颈。有
    人说nginx可以搞定每秒10万的静态请求,我不怀疑。但这只是静态请求,理论值,只要带宽、I/O够强,服务器计算能力够,并支持的并发连接数顶得住10万TCP链接的建立
    的话,那没有问题。但在数据一致性面前,这10万就完完全全成了一个可望不可及的理论值了。

我说那么多,我只是想从业务上告诉大家,我们需要从业务上真正了解春运铁路订票这样业务的变态之处。

前端性能优化技术

要解决性能的问题,有很多种常用的方法,我在下面列举一下,我相信12306这个网站使用下面的这些技术会让其性能有质的飞跃。

一、前端负载均衡

通过DNS的负载均衡器(一般在路由器上根据路由的负载重定向)可以把用户的访问均匀地分散在多个Web服务器上。这样可以减少Web服务器的请求负载。因为http的请求都是短作业,所以,可以通过很简单的负载均衡器来完成这一功能。最好是有CDN网络让用户连接与其最近的服务器(CDN通常伴随着分布式存储)。(关于负载均衡更为详细的说明见“后端的负载均衡”)

二、减少前端链接数

我看了一下12306.cn,打开主页需要建60多个HTTP连接,车票预订页面则有70多个HTTP请求,现在的浏览器都是并发请求的(当然,浏览器的一个页面的并发数是有限的,但是你挡不住用户开多个页面,而且,后端服务器TCP链接在前端断开始,还不会马上释放或重要)。所以,只要有100万个用户,就有可能会有6000万个链接(访问第一次后有了浏览器端的cache,这个数会下来,就算只有20%也是百万级的链接数),太多了。一个登录查询页面就好了。把js打成一个文件,把css也打成一个文件,把图标也打成一个文件,用css分块展示。把链接数减到最低。

三、减少网页大小增加带宽

这个世界不是哪个公司都敢做图片服务的,因为图片太耗带宽了。现在宽带时代很难有人能体会到当拨号时代做个图页都不敢用图片的情形(现在在手机端浏览也是这个情形)。我查看了一下12306首页的需要下载的总文件大小大约在900KB左右,如果你访问过了,浏览器会帮你缓存很多,只需下载10K左右的文件。但是我们可以想像一个极端一点的案例,1百万用户同时访问,且都是第一次访问,每人下载量需要1M,如果需要在120秒内返回,那么就需要,1M
* 1M /120 * 8 =
66Gbps的带宽。很惊人吧。所以,我估计在当天,12306的阻塞基本上应该是网络带宽,所以,你可能看到的是没有响应。后面随着浏览器的缓存帮助12306减少很多带宽占用,于是负载一下就到了后端,后端的数据处理瓶颈一下就出来。于是你会看到很多http
500之类的错误。这说明后端服务器垮了。

四、前端页面静态化

静态化一些不常变的页面和数据,并gzip一下。还有一个变态的方法是把这些静态页面放在/dev/shm下,这个目录就是内存,直接从内存中把文件读出来返回,这样可以减少昂贵的磁盘I/O。使用nginx的sendfile功能可以让这些静态文件直接在内核心态交换,可以极大增加性能。

五、优化查询

很多人查询都是在查一样的,完全可以用反向代理合并这些并发的相同的查询。这样的技术主要用查询结果缓存来实现,第一次查询走数据库获得数据,并把数据放到缓存,后面的查询统统直接访问高速缓存。为每个查询做Hash,使用NoSQL的技术可以完成这个优化。(这个技术也可以用做静态页面)

对于火车票量的查询,个人觉得不要显示数字,就显示一个“有”或“无”就好了,这样可以大大简化系统复杂度,并提升性能。把查询对数据库的负载分出去,从而让数据库可以更好地为下单的人服务。

六、缓存的问题

缓存可以用来缓存动态页面,也可以用来缓存查询的数据。缓存通常有那么几个问题:

1)缓存的更新。也叫缓存和数据库的同步。有这么几种方法,一是缓存time
out,让缓存失效,重查,二是,由后端通知更新,一量后端发生变化,通知前端更新。前者实现起来比较简单,但实时性不高,后者实现起来比较复杂
,但实时性高。

2)缓存的换页。内存可能不够,所以,需要把一些不活跃的数据换出内存,这个和操作系统的内存换页和交换内存很相似。FIFO、LRU、LFU都是比较经典的换页算法。相关内容参看Wikipeida的缓存算法

3)缓存的重建和持久化。缓存在内存,系统总要维护,所以,缓存就会丢失,如果缓存没了,就需要重建,如果数据量很大,缓存重建的过程会很慢,这会影响生产环境,所以,缓存的持久化也是需要考虑的。

诸多强大的NoSQL都很好支持了上述三大缓存的问题。

后端性能优化技术

前面讨论了前端性能的优化技术,于是前端可能就不是瓶颈问题了。那么性能问题就会到后端数据上来了。下面说几个后端常见的性能优化技术。

一、数据冗余

关于数据冗余,也就是说,把我们的数据库的数据冗余处理,也就是减少表连接这样的开销比较大的操作,但这样会牺牲数据的一致性。风险比较大。很多人把NoSQL用做数据,快是快了,因为数据冗余了,但这对数据一致性有大的风险。这需要根据不同的业务进行分析和处理。(注意:用关系型数据库很容易移植到NoSQL上,但是反过来从NoSQL到关系型就难了)

二、数据镜像

几乎所有主流的数据库都支持镜像,也就是replication。数据库的镜像带来的好处就是可以做负载均衡。把一台数据库的负载均分到多台上,同时又保证了数据一致性(Oracle的SCN)。最重要的是,这样还可以有高可用性,一台废了,还有另一台在服务。

数据镜像的数据一致性可能是个复杂的问题,所以我们要在单条数据上进行数据分区,也就是说,把一个畅销商品的库存均分到不同的服务器上,如,一个畅销商品有1万的库存,我们可以设置10台服务器,每台服务器上有1000个库存,这就好像B2C的仓库一样。

三、数据分区

数据镜像不能解决的一个问题就是数据表里的记录太多,导致数据库操作太慢。所以,把数据分区。数据分区有很多种做法,一般来说有下面这几种:

1)把数据把某种逻辑来分类。比如火车票的订票系统可以按各铁路局来分,可按各种车型分,可以按始发站分,可以按目的地分……,反正就是把一张表拆成多张有一样的字段但是不同种类的表,这样,这些表就可以存在不同的机器上以达到分担负载的目的。

2)把数据按字段分,也就是竖着分表。比如把一些不经常改的数据放在一个表里,经常改的数据放在另外多个表里。把一张表变成1对1的关系,这样,你可以减少表的字段个数,同样可以提升一定的性能。另外,字段多会造成一条记录的存储会被放到不同的页表里,这对于读写性能都有问题。但这样一来会有很多复杂的控制。

3)平均分表。因为第一种方法是并不一定平均分均,可能某个种类的数据还是很多。所以,也有采用平均分配的方式,通过主键ID的范围来分表。

4)同一数据分区。这个在上面数据镜像提过。也就是把同一商品的库存值分到不同的服务器上,比如有10000个库存,可以分到10台服务器上,一台上有1000个库存。然后负载均衡。

这三种分区都有好有坏。最常用的还是第一种。数据一旦分区,你就需要有一个或是多个调度来让你的前端程序知道去哪里找数据。把火车票的数据分区,并放在各个省市,会对12306这个系统有非常有意义的质的性能的提高

四、后端系统负载均衡

前面说了数据分区,数据分区可以在一定程度上减轻负载,但是无法减轻热销商品的负载,对于火车票来说,可以认为是大城市的某些主干线上的车票。这就需要使用数据镜像来减轻负载。使用数据镜像,你必然要使用负载均衡,在后端,我们可能很难使用像路由器上的负载均衡器,因为那是均衡流量的,因为流量并不代表服务器的繁忙程度。因此,我们需要一个任务分配系统,其还能监控各个服务器的负载情况。

任务分配服务器有一些难点:

  • 负载情况比较复杂。什么叫忙?是CPU高?还是磁盘I/O高?还是内存使用高?还是并发高?还是内存换页率高?你可能需要全部都要考虑。这些信息要发送给那个任务分配器上,由任务分配器挑选一台负载最轻的服务器来处理。
  • 任务分配服务器上需要对任务队列,不能丢任务啊,所以还需要持久化。并且可以以批量的方式把任务分配给计算服务器。
  • 任务分配服务器死了怎么办?这里需要一些如Live-Standby或是failover等高可用性的技术。我们还需要注意那些持久化了的任务的队列如何转移到别的服务器上的问题。

我看到有很多系统都用静态的方式来分配,有的用hash,有的就简单地轮流分析。这些都不够好,一个是不能完美地负载均衡,另一个静态的方法的致命缺陷是,如果有一台计算服务器死机了,或是我们需要加入新的服务器,对于我们的分配器来说,都需要知道的。另外,还要重算哈希(一致性hash可以部分解决这个问题)。

还有一种方法是使用抢占式的方式进行负载均衡,由下游的计算服务器去任务服务器上拿任务。让这些计算服务器自己决定自己是否要任务。这样的好处是可以简化系统的复杂度,而且还可以任意实时地减少或增加计算服务器。但是唯一不好的就是,如果有一些任务只能在某种服务器上处理,这可能会引入一些复杂度。不过总体来说,这种方法可能是比较好的负载均衡。

五、异步、 throttle 和 批量处理

异步、throttle(节流阀) 和批量处理都需要对并发请求数做队列处理的。

  • 异步在业务上一般来说就是收集请求,然后延时处理。在技术上就是可以把各个处理程序做成并行的,也就可以水平扩展了。但是异步的技术问题大概有这些,a)被调用方的结果返回,会涉及进程线程间通信的问题。b)如果程序需要回滚,回滚会有点复杂。c)异步通常都会伴随多线程多进程,并发的控制也相对麻烦一些。d)很多异步系统都用消息机制,消息的丢失和乱序也会是比较复杂的问题。
  • throttle
    技术其实并不提升性能,这个技术主要是防止系统被超过自己不能处理的流量给搞垮了,这其实是个保护机制。使用throttle技术一般来说是对于一些自己无法控制的系统,比如,和你网站对接的银行系统。
  • 批量处理的技术,是把一堆基本相同的请求批量处理。比如,大家同时购买同一个商品,没有必要你买一个我就写一次数据库,完全可以收集到一定数量的请求,一次操作。这个技术可以用作很多方面。比如节省网络带宽,我们都知道网络上的MTU(最大传输单元),以态网是1500字节,光纤可以达到4000多个字节,如果你的一个网络包没有放满这个MTU,那就是在浪费网络带宽,因为网卡的驱动程序只有一块一块地读效率才会高。因此,网络发包时,我们需要收集到足够多的信息后再做网络I/O,这也是一种批量处理的方式。批量处理的敌人是流量低,所以,批量处理的系统一般都会设置上两个阀值,一个是作业量,另一个是timeout,只要有一个条件满足,就会开始提交处理。

所以,只要是异步,一般都会有throttle机制,一般都会有队列来排队,有队列,就会有持久化,而系统一般都会使用批量的方式来处理

云风同学设计的“排队系统”
就是这个技术。这和电子商务的订单系统很相似,就是说,我的系统收到了你的购票下单请求,但是我还没有真正处理,我的系统会跟据我自己的处理能力来throttle住这些大量的请求,并一点一点地处理。一旦处理完成,我就可以发邮件或短信告诉用户你来可以真正购票了。

在这里,我想通过业务和用户需求方面讨论一下云风同学的这个排队系统,因为其从技术上看似解决了这个问题,但是从业务和用户需求上来说可能还是有一些值得我们去深入思考的地方:

1)队列的DoS攻击。首先,我们思考一下,这个队是个单纯地排队的吗?这样做还不够好,因为这样我们不能杜绝黄牛,而且单纯的ticket_id很容易发生DoS攻击,比如,我发起N个
ticket_id,进入购票流程后,我不买,我就耗你半个小时,很容易我就可以让想买票的人几天都买不到票。有人说,用户应该要用身份证来排队,
这样在购买里就必需要用这个身份证来买,但这也还不能杜绝黄牛排队或是号贩子。因为他们可以注册N个帐号来排队,但就是不买。黄牛这些人这个时候只需要干一个事,把网站搞得正常人不能访问,让用户只能通过他们来买。

2)对列的一致性?对这个队列的操作是不是需要锁?只要有锁,性能一定上不去。试想,100万个人同时要求你来分配位置号,这个队列将会成为性能瓶颈。你一定没有数据库实现得性能好,所以,可能比现在还差。抢数据库和抢队列本质上是一样的

3)队列的等待时间。购票时间半小时够不够?多不多?要是那时用户正好不能上网呢?如果时间短了,用户不够时间操作也会抱怨,如果时间长了,后面在排队的那些人也会抱怨。这个方法可能在实际操作上会有很多问题。另外,半个小时太长了,这完全不现实,我们用15分钟来举例:有1千万用户,每一个时刻只能放进去1万个,这1万个用户需要15分钟完成所有操作,那么,这1千万用户全部处理完,需要1000*15m
= 250小时,10天半,火车早开了。(我并非信口开河,根据铁道部专家的说明:这几天,平均一天下单100万,所以,处理1000万的用户需要十天。这个计算可能有点简单了,我只是想说,在这样低负载的系统下用排队可能都不能解决业务问题

4)队列的分布式。这个排队系统只有一个队列好吗?还不足够好。因为,如果你放进去的可以购票的人如果在买同一个车次的同样的类型的票(比如某动车卧铺),还是等于在抢票,也就是说系统的负载还是会有可能集中到其中某台服务器上。因此,最好的方法是根据用户的需求——提供出发地和目的地,来对用户进行排队。而这样一来,队列也就可以是多个,只要是多个队列,就可以水平扩展了。这样可以解决性能问题,但是没有解决用户长时间排队的问题。

我觉得完全可以向网上购物学习。在排队(下单)的时候,收集好用户的信息和想要买的票,并允许用户设置购票的优先级,比如,A车次卧铺买
不到就买
B车次的卧铺,如果还买不到就买硬座等等,然后用户把所需的钱先充值好,接下来就是系统完全自动地异步处理订单
。成功不成功都发短信或邮件通知用户。这样,系统不仅可以省去那半个小时的用户交互时间,自动化加快处理,还可以合并相同购票请求的人,进行批处理(减少数据库的操作次数)。这种方法最妙的事是可以知道这些排队用户的需求,不但可以优化用户的队列,把用户分布到不同的队列,还可以像亚马逊的心愿单一样,通过一些计算就可以让铁道部做车次统筹安排和调整(最后,排队系统(下单系统)还是要保存在数据库里的或做持久化,不能只放在内存中,不然机器一down,就等着被骂吧)。

小结

写了那么多,我小结一下:

0)无论你怎么设计,你的系统一定要能容易地水平扩展。也就是说,你的整个数据流中,所有的环节都要能够水平扩展。这样,当你的系统有性能问题时,“加30倍的服务器”才不会被人讥笑。

1)上述的技术不是一朝一夕能搞定的,没有长期的积累,基本无望。我们可以看到,无论你用哪种都会引发一些复杂性,设计总是在做一种权衡。

2)集中式的卖票很难搞定,使用上述的技术可以让订票系统能有几佰倍的性能提升。而在各个省市建分站,分开卖票,是能让现有系统性能有质的提升的最好方法

3)春运前夕抢票且票量供远小于求这种业务模式是相当变态的,让几千万甚至上亿的人在某个早晨的8点钟同时登录同时抢票的这种业务模式是变态中的变态。业务形态的变态决定了无论他们怎么办干一定会被骂。

4)为了那么一两个星期而搞那么大的系统,而其它时间都在闲着,有些可惜了,这也就是铁路才干得出来这样的事了。

 

全文转载自:http://coolshell.cn/articles/6470.html

 青春就应该这样绽放  游戏测试:三国时期谁是你最好的兄弟!!  你不得不信的星座秘密
from 互联网旁观者: http://blog.sina.com.cn/s/blog_72995dcc01017dsx.html

Written by cwyalpha

九月 28, 2012 at 9:53 上午

发表在 Uncategorized

Thought this was cool: (地基工)常用正则表达式整理(一)

leave a comment »

     摘要: 常用正则表达式整理,权威整理如下:  阅读全文

点点滴滴 2012-09-20 10:42 发表评论


from C++博客_Kevin Lynx: http://www.cppblog.com/ming81/archive/2012/09/20/191359.html

Written by cwyalpha

九月 20, 2012 at 4:41 上午

发表在 Uncategorized

Thought this was cool: IPHONE的开发,从0学起

leave a comment »

      乔布斯虽故,但IPHONE依旧是他留给这个世界最伟大的财富。一直IPHONE以不可阻挡之势雄称智能手机之王。NOKIA、三星、索爱等手机巨头则渐渐淡出消费者的视线。在这IPHONE已经普及的社会,作为一名程序员,不学IPHONE开发,感觉就如2000年不学C#一样的可惜。
      虽说现在IPHONE普及,但IPHONE的开发资料缺少、搭建IPHONE的开发环境问题多多,部署、调试更是前所未及,更要命的是一门新的语言成为一道最大的入门坎,使得很多精打细算的程序员都抓不着头脑。
      而新运营的一个旅游网站-驴家网【http://www.lvjia.cn】,又刚刚好需要支持移动设备应用开发,硬着头皮,打开百度,输入了【IPHONE开发】,开始了为期一个月的学习。。。。
      如今关于IPHONE的应用开发,已经上手,用你手头的PC机,拿一台越狱的IPHONE实机调试,而不用付苹果一分钱,那种感觉我相信是许多和我一样从0做起的程序员所需要的。在接下来的博客中,我将把这一个月所经历的和大家一同分享。
      驴家网官方博客已经通过新浪认证,有兴趣的朋友也可以加粉。感觉该博客还是挺趣味的。至少,普及下旅游知识,呵呵。http://e.weibo.com/u/2757677831

OUR!!CPP 2012-09-20 10:12 发表评论


from C++博客_Kevin Lynx: http://www.cppblog.com/ourvc/archive/2012/09/20/191350.html

Written by cwyalpha

九月 20, 2012 at 4:41 上午

发表在 Uncategorized

Thought this was cool: 【段子】钓鱼岛是中国的!

leave a comment »

“钓鱼岛是中国的!我买肉。” “勿忘国耻!你要买多少?”“驱除鞑虏,恢复中华!我买一斤。”“犯我强汉者,虽远必猪!你要切哪种?”“理智爱国!就来点五花的吧。”(xie107)

打喷嚏链接:http://www.dapenti.com/blog/more.asp?name=xilei&id=66964

喷嚏图卦微信:penti_tugua

每天网络精华尽在【喷嚏图卦】       喷嚏网官方新浪围脖
from 喷嚏网—-阅读、发现和分享:8小时外的健康生活! 之 [铂程斋]: http://www.dapenti.com/blog/more.asp?name=xilei&id=66964

Written by cwyalpha

九月 19, 2012 at 3:28 下午

发表在 Uncategorized

Thought this was cool: 客户端的IP地址伪造、CDN、反向代理、获取的那些事儿

leave a comment »

20120917 @郑昀汇总

外界流传的JAVA/PHP服务器端获取客户端IP都是这么取的:
伪代码:
1)ip = request.getHeader(“X-FORWARDED-FOR
“)
    可伪造,参考附录A
2)如果该值为空或数组长度为0或等于”unknown”,那么:
ip = request.getHeader(“Proxy-Client-IP”)
3)如果该值为空或数组长度为0或等于”unknown”,那么:
ip = request.getHeader(“WL-Proxy-Client-IP”)
4)如果该值为空或数组长度为0或等于”unknown”,那么:
ip = request.getHeader(“HTTP_CLIENT_IP”)
    可伪造
5)如果该值为空或数组长度为0或等于”unknown”,那么:
ip = request.getRemoteAddr
()
    可对于匿名代理服务器,可隐匿原始ip,参考附录B
 
之所以搞这么麻烦,是因为存在很多种网络结构,如 Nginx+Resin、Apache+WebLogic、Squid+Nginx。下面挨个儿讲一下。
郑昀 :△
首先,明确一下,Nginx 配置一般如下所示:
              location / {

                       proxy_pass       http://yourdomain.com;

                       proxy_set_header   Host             $host;

                       proxy_set_header   X-Real-IP        $remote_addr;

                       proxy_set_header   X-Forwarded-For  $proxy_add_x_forwarded_for;

              }
注意看红色字体,这些配置与下面的闯关拿IP有关。
 
———————————————————————————————
——第一关|X-Forwarded-For :背景——
这是一个 Squid 开发的字段,并非 RFC 标准。
简称 XFF 头
,只有在通过了 HTTP 代理或者负载均衡服务器时才会添加该项。在 Squid 开发文档中可以找到该项的详细介绍。
XFF 格式如下:
X-Forwarded-For: client1, proxy1, proxy2
可以看出,XFF 头信息可以有多个,中间用逗号分隔,第一项为真实的客户端ip,剩下的就是曾经经过的代理或负载均衡服务器的ip地址。
 
——第一关|X-Forwarded-For :场景=客户端–CDN–Nginx——
当用户请求经过 CDN 后到达 Nginx 负载均衡服务器时,其 XFF 头信息应该为 “客户端IP,CDN的IP”。
一般情况下CDN服务商出于自身安全考虑会将屏蔽CDN的ip,只保留客户端ip。
那么请求头到达 Nginx 时:
  • 在默认情况下,Nginx 并不会对 XFF 头做任何处理
    • 此时 Nginx 后面的 Resin/Apache/Tomcat 通过 request.getHeader(“X-FORWARDED-FOR”) 获得的ip仍然是原始ip
  • 当 Nginx 设置 X-Forwarded-For 等于 $proxy_add_x_forwarded_for 时:
    • 如果从CDN过来的请求没有设置 XFF 头(通常这种事情不会发生),XFF 头为 CDN 的ip
      • 此时相对于 Nginx 来说,客户端就是 CDN 
    • 如果 CDN 设置了 XFF 头,我们这里又设置了一次,且值为$proxy_add_x_forwarded_for 的话:
      • XFF 头为“客户端IP,Nginx负载均衡服务器IP”
        ,这样取第一个值即可
      • 这也就是大家所常见的场景!
https://i1.wp.com/images.cnblogs.com/cnblogs_com/zhengyun_ustc/255879/o_client-cdn-nginx-resin.png
综上所述,XFF 头在上图的场景,Resin 通过 request.getHeader(“X-FORWARDED-FOR”) 获得的ip字符串,做一个split,第一个元素就是原始ip。
那么,XFF 头可以伪造吗?
 
——第一关|X-Forwarded-For :伪造——
可以伪造。
XFF 头仅仅是 HTTP Headers 中的一分子,自然是可以随意增删改的。如附录A所示。
很多投票系统都有此漏洞,它们简单地取 XFF 头中定义的ip地址设置为来源地址,因此第三方可以伪造任何ip投票。
 
———————————————————————————————
——第二和第三关|Proxy-Client-IP/WL-
Proxy-Client-IP
 :背景——
Proxy-Client-IP 字段和 WL-Proxy-Client-IP 字段只在 Apache(Weblogic Plug-In Enable)+WebLogic 搭配下出现,其中“WL” 就是 WebLogic 的缩写。
即访问路径是:
Client -> Apache WebServer + Weblogic http plugin -> Weblogic Instances
所以这两关对于我们来说仅仅是兼容而已,怕你突然把 Nginx+Resin 换成 Apache+WebLogic 。
也可以直接忽略这两个字段。
 
———————————————————————————————
——第四关|HTTP-Client-IP
 :背景——
HTTP_CLIENT_IP 是代理服务器发送的HTTP头。
很多时候 Nginx 配置中也并没有下面这项:
proxy_set_header HTTP_CLIENT_IP $remote_addr;
所以本关也可以忽略。
郑昀 :△
———————————————————————————————
——第五关| request.getRemoteAddr() :背景——
从 request.getRemoteAddr() 函数的定义看:
    Returns the Internet Protocol (IP) address of the client or last proxy that sent the request. 

实际上,REMOTE_ADDR 是客户端跟服务器“握手”时的IP,但如果使用了“匿名代理”,REMOTE_ADDR 将显示代理服务器的ip,或者最后一个代理服务器的ip。请参考附录B。 

 
综上,
java/php 里拿到的ip地址可能是伪造的或代理服务器的ip。
 
郑昀 :△
+++附录A XFF 与 Nginx 配置的测试用例+++
测试环境: nginx+resin

内网IP:172.16.100.10

客户端IP:123.123.123.123

测试页面: test.jsp

<%

out.println(“x-forwarded-for: ” + request.getHeader(“x-forwarded-for”));

out.println(“remote hosts: ” + request.getRemoteAddr());

%>

nginx 配置一
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
 
wget测试
wget -O aa –header=”X-Forwarded-For:192.168.0.1″ “http://test.com/test.jsp&#8221;
页面返回结果:
x-forwarded-for: 192.168.0.1, 123.123.123.123
remote hosts: 172.16.100.10
 
curl测试
curl -H “X-Forwarded-For:192.168.0.1” “http://test.com/test.jsp&#8221;
x-forwarded-for: 192.168.0.1, 123.123.123.123
remote hosts: 172.16.100.10

nginx 配置二

proxy_set_header X-Real-IP $remote_addr;

proxy_set_header X-Forwarded-For $remote_addr;

proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

wget测试:

wget -O aa –header=”X-Forwarded-For:192.168.0.1″ “http://test.com/test.jsp&#8221;

页面返回结果:

x-forwarded-for: 123.123.123.123

remote hosts: 172.16.100.10

curl测试

curl -H “X-Forwarded-For:192.168.0.1” “http://test.com/test.jsp&#8221;

x-forwarded-for: 123.123.123.123

remote hosts: 172.16.100.10

测试结果:

1、配置  

proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

增加了一个真实ip X-Forwarded-For,并且顺序是增加到了“后面”。

2、配置  

proxy_set_header X-Forwarded-For $remote_addr;

清空了客户端伪造传入的X-Forwarded-For,

保证了使用 request.getHeader(“x-forwarded-for”) 获取的ip为真实ip,

或者用“,”分隔,截取 X-Forwarded-For 最后的值。
 
+++附录B 搜狗浏览器高速模式的测试用例+++
访问路径:
搜狗浏览器“高速”模式(即使用代理)–>LVS–>Apache
获得的值为:
x-forwarded-for:180.70.92.43   (即真实ip)
Proxy-Client-IP:null
WL-Proxy-Client-IP:null 
getRemoteAddr:123.126.50.185  (即搜狗代理ip)
 
 
×××参考资源:×××
1,http://bbs.linuxtone.org/thread-9050-1-1.html
2,http://hi.baidu.com/thinkinginlamp/item/e2cf05263eb4d18e6e2cc3e6
3,http://bbs.chinaunix.net/thread-3659453-1-1.html
 
赠图2枚:
https://i2.wp.com/ww4.sinaimg.cn/bmiddle/64937c56jw1dwg7insv8fj.jpg
https://i2.wp.com/ww3.sinaimg.cn/bmiddle/4c80d8e7jw1dlt7ho0s4tj.jpg

本文链接


0

   

0

IT 牛人博客聚合网站(udpwork.com) 聚合
|
评论: 0
|
10000+ 本编程/Linux PDF/CHM 电子书下载

from IT牛人博客聚合网站: http://www.udpwork.com/item/8135.html

Written by cwyalpha

九月 19, 2012 at 3:28 下午

发表在 Uncategorized